author Scott Wittenburg <scott.wittenburg@kitware.com> 2020-01-21 23:35:18 -0700
committer Todd Gamblin <tgamblin@llnl.gov> 2020-01-21 22:35:18 -0800
commit 8283d87f6a1a7ea2e92e9adfb7ac42ce94a6e4d5
tree 68b19321b3676b1ab17bcf8ac67fac34bacde395 /lib/spack/docs/pipelines.rst
parent 4d794d63b5ac3c667446c74d367fe4eb7f1e2caf
pipelines: `spack ci` command with env-based workflow (#12854)

Rework Spack's continuous integration workflow to be environment-based.

- Add the `spack ci` command, which replaces the many scripts in `bin/`
- `spack ci` decouples the CI workflow from the spack instance:
  - CI is defined in a spack environment
  - environment is in its own (single) git repository, separate from Spack
  - spack instance used to run the pipeline is up to the user
- A new `gitlab-ci` section in environments allows users to configure how
  specs in the environment should be mapped to runners
- Compilers can be bootstrapped in the new pipeline workflow
- Add extensive documentation on pipelines (see `pipelines.rst` for further
  details)
- Add extensive tests for pipeline code

Diffstat (limited to 'lib/spack/docs/pipelines.rst')
-rw-r--r-- lib/spack/docs/pipelines.rst 439
1 files changed, 439 insertions, 0 deletions
diff --git a/lib/spack/docs/pipelines.rst b/lib/spack/docs/pipelines.rst
new file mode 100644
index 0000000000..f70b39a16d
--- /dev/null
+++ b/lib/spack/docs/pipelines.rst
@@ -0,0 +1,439 @@
+.. Copyright 2013-2019 Lawrence Livermore National Security, LLC and other
+ Spack Project Developers. See the top-level COPYRIGHT file for details.
+
+ SPDX-License-Identifier: (Apache-2.0 OR MIT)
+
+.. _pipelines:
+
+=========
+Pipelines
+=========
+
+Spack provides commands that support generating and running automated build
+pipelines designed for Gitlab CI. At the highest level it works like this:
+provide a spack environment describing the set of packages you care about,
+and include within that environment file a description of how those packages
+should be mapped to Gitlab runners. Spack can then generate a ``.gitlab-ci.yml``
+file containing job descriptions for all your packages that can be run by a
+properly configured Gitlab CI instance. When run, the generated pipeline will
+build and deploy binaries, and it can optionally report to a CDash instance
+regarding the health of the builds as they evolve over time.
+
+------------------------------
+Getting started with pipelines
+------------------------------
+
+It is fairly straightforward to get started with automated build pipelines. At
+a minimum, you'll need to set up a Gitlab instance (more about Gitlab CI
+`here <https://about.gitlab.com/product/continuous-integration/>`_) and configure
+at least one `runner <https://docs.gitlab.com/runner/>`_. Then the basic steps
+for setting up a build pipeline are as follows:
+
+#. Create a repository on your gitlab instance
+#. Add a ``spack.yaml`` at the root containing your pipeline environment (see
+   below for details)
+#. Add a ``.gitlab-ci.yml`` at the root containing a single job, similar to
+   this one:
+
+   .. code-block:: yaml
+
+      pipeline-job:
+        tags:
+          - <custom-tag>
+          ...
+        script:
+          - spack ci start
+
+#. Add any secrets required by the CI process to environment variables using the
+   CI web ui
+#. Push a commit containing the ``spack.yaml`` and ``.gitlab-ci.yml`` mentioned above
+   to the gitlab repository
+
+The ``<custom-tag>``, above, is used to pick one of your configured runners,
+while the use of the ``spack ci start`` command implies that runner has an
+appropriate version of spack installed and configured for use. Of course, there
+are myriad ways to customize the process. You can configure CDash reporting
+on the progress of your builds, set up S3 buckets to mirror binaries built by
+the pipeline, clone a custom spack repository/ref for use by the pipeline, and
+more.
+
+While it is possible to set up pipelines on gitlab.com, the builds there are
+limited to 60 minutes and generic hardware. It is also possible to
+`hook up <https://about.gitlab.com/blog/2018/04/24/getting-started-gitlab-ci-gcp>`_
+Gitlab to Google Kubernetes Engine (`GKE <https://cloud.google.com/kubernetes-engine/>`_)
+or Amazon Elastic Kubernetes Service (`EKS <https://aws.amazon.com/eks>`_), though those
+topics are outside the scope of this document.
+
+-----------------------------------
+Spack commands supporting pipelines
+-----------------------------------
+
+Spack provides a ``ci`` command with sub-commands for doing various things related
+to automated build pipelines. All of the ``spack ci ...`` commands must be run
+from within an environment, as each one makes use of the environment for different
+purposes. Additionally, some options to the commands (or conditions present in
+the spack environment file) may require particular environment variables to be
+set in order to function properly. Examples of these are typically secrets
+needed for pipeline operation that should not be visible in a spack environment
+file. These environment variables are described in more detail in
+:ref:`ci_environment_variables`.
+
+.. _cmd_spack_ci:
+
+^^^^^^^^^^^^^^^^^^
+``spack ci``
+^^^^^^^^^^^^^^^^^^
+
+Super-command for functionality related to generating pipelines and executing
+pipeline jobs.
+
+.. _cmd_spack_ci_start:
+
+^^^^^^^^^^^^^^^^^^
+``spack ci start``
+^^^^^^^^^^^^^^^^^^
+
+Currently this command is a shortcut that first runs ``spack ci generate`` and
+then ``spack ci pushyaml``.
+
+.. _cmd_spack_ci_generate:
+
+^^^^^^^^^^^^^^^^^^^^^
+``spack ci generate``
+^^^^^^^^^^^^^^^^^^^^^
+
+Concretizes the specs in the active environment, stages them (as described in
+:ref:`staging_algorithm`), and writes the resulting ``.gitlab-ci.yml`` to disk.
+
+.. _cmd_spack_ci_pushyaml:
+
+^^^^^^^^^^^^^^^^^^^^^
+``spack ci pushyaml``
+^^^^^^^^^^^^^^^^^^^^^
+
+Generates a commit containing the generated ``.gitlab-ci.yml`` and pushes it to a
+``DOWNSTREAM_CI_REPO``, which is frequently the same repository. The branch
+created has the same name as the current branch being tested, but has ``multi-ci-``
+prepended to the branch name. Once Gitlab CI has full support for dynamically
+defined workloads, this command will be deprecated.
+
+.. _cmd_spack_ci_rebuild:
+
+^^^^^^^^^^^^^^^^^^^^
+``spack ci rebuild``
+^^^^^^^^^^^^^^^^^^^^
+
+This sub-command is responsible for ensuring a single spec from the release
+environment is up to date on the remote mirror configured in the environment,
+and as such, corresponds to a single job in the ``.gitlab-ci.yml`` file.
+
+------------------------------------
+A pipeline-enabled spack environment
+------------------------------------
+
+Here's an example of a spack environment file that has been enhanced with
+sections describing a build pipeline:
+
+.. code-block:: yaml
+
+   spack:
+     definitions:
+     - pkgs:
+       - readline@7.0
+     - compilers:
+       - '%gcc@5.5.0'
+     - oses:
+       - os=ubuntu18.04
+       - os=centos7
+     specs:
+     - matrix:
+       - [$pkgs]
+       - [$compilers]
+       - [$oses]
+     mirrors:
+       cloud_gitlab: https://mirror.spack.io
+     gitlab-ci:
+       mappings:
+         - match:
+             - os=ubuntu18.04
+           runner-attributes:
+             tags:
+               - spack-k8s
+             image: spack/spack_builder_ubuntu_18.04
+         - match:
+             - os=centos7
+           runner-attributes:
+             tags:
+               - spack-k8s
+             image: spack/spack_builder_centos_7
+     cdash:
+       build-group: Release Testing
+       url: https://cdash.spack.io
+       project: Spack
+       site: Spack AWS Gitlab Instance
+
+Hopefully, the ``definitions``, ``specs``, ``mirrors``, etc. sections are already
+familiar, as they are part of spack :ref:`environments`. So let's take a more
+in-depth look at some of the pipeline-related sections in that environment file
+that might not be as familiar.
+
+The ``gitlab-ci`` section is used to configure how the pipeline workload should be
+generated, mainly how the jobs for building specs should be assigned to the
+configured runners on your instance. Each entry within the list of ``mappings``
+corresponds to a known gitlab runner, where the ``match`` section is used
+in assigning a release spec to one of the runners, and the ``runner-attributes``
+section is used to configure the spec/job for that particular runner.
+
+There are other pipeline options you can configure within the ``gitlab-ci`` section
+as well. The ``bootstrap`` section allows you to specify lists of specs from
+your ``definitions`` that should be staged ahead of the environment's ``specs`` (this
+section is described in more detail below). The ``enable-artifacts-buildcache`` key
+takes a boolean and determines whether the pipeline uses artifacts to store and
+pass along the buildcaches from one stage to the next (the default if you don't
+provide this option is ``False``). The ``enable-debug-messages`` key takes a boolean
+and allows you to choose whether the pipeline build jobs are run as ``spack -d ci rebuild``
+or just ``spack ci rebuild`` (the default is not to enable debug messages). The
+``final-stage-rebuild-index`` section controls whether an extra job is added to the
+end of your pipeline (in a stage by itself) which will regenerate the mirror's
+buildcache index. Under normal operation, each pipeline job that rebuilds a package
+will re-generate the mirror's buildcache index after the buildcache entry for that
+job has been created and pushed to the mirror. Since jobs in the same stage can run in
+parallel, there is the possibility that at the end of some stage, the index may not
+reflect all the binaries in the buildcache. Adding the ``final-stage-rebuild-index``
+section ensures that at the end of the pipeline, the index will be in sync with the
+binaries on the mirror. If the mirror lives in an S3 bucket, this job will need to
+run on a machine with the Python ``boto3`` module installed, and consequently the
+``final-stage-rebuild-index`` needs to specify a list of ``tags`` to pick a runner
+satisfying that condition. It can also take an ``image`` key so Docker executor type
+runners can pick the right image for the index regeneration job.
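+
+As a minimal sketch of how these options might be combined (the ``index-runner``
+tag and the ``example/index-builder`` image are hypothetical placeholders, and
+the exact nesting should be checked against your version of spack):
+
+.. code-block:: yaml
+
+   spack:
+     gitlab-ci:
+       enable-artifacts-buildcache: true
+       enable-debug-messages: true
+       final-stage-rebuild-index:
+         # hypothetical tag selecting a runner that has boto3 installed
+         tags:
+           - index-runner
+         # hypothetical image, only relevant for Docker executor type runners
+         image: example/index-builder
+       mappings:
+         # mappings as in the example above
+         ...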
+
+The optional ``cdash`` section provides information that will be used by the
+``spack ci generate`` command (invoked by ``spack ci start``) for reporting
+to CDash. All the jobs generated from this environment will belong to a
+"build group" within CDash that can be tracked over time. As the release
+progresses, this build group may have jobs added or removed. The url, project,
+and site are used to specify the CDash instance to which build results should
+be reported.
+
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Assignment of specs to runners
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``mappings`` section corresponds to a list of runners. During assignment
+of specs to runners, the list is traversed in order looking for matches, and the
+first runner that matches a release spec is assigned to build that spec. The
+``match`` section within each runner mapping section is a list of specs, and
+if any of those specs match the release spec (the ``spec.satisfies()`` method
+is used), then that runner is considered a match.
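+
+Because the first match wins, more specific entries should come before more
+general ones. In the sketch below (the ``big-memory`` tag is a hypothetical
+placeholder), ``readline`` specs on ``centos7`` would be expected to go to the
+first runner, and every other ``centos7`` spec to the second:
+
+.. code-block:: yaml
+
+   gitlab-ci:
+     mappings:
+       # checked first: only readline on centos7 matches here
+       - match:
+           - readline os=centos7
+         runner-attributes:
+           tags:
+             - big-memory
+       # checked second: any remaining centos7 spec
+       - match:
+           - os=centos7
+         runner-attributes:
+           tags:
+             - spack-k8s
+           image: spack/spack_builder_centos_7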
+
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Configuration of specs/jobs for a runner
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Once a runner has been chosen to build a release spec, the ``runner-attributes``
+section provides information determining details of the job in the context of
+the runner. The ``runner-attributes`` section must have a ``tags`` key, which
+is a list containing at least one tag used to select the runner from among the
+runners known to the gitlab instance. For Docker executor type runners, the
+``image`` key is used to specify the Docker image used to build the release spec
+(and could also appear as a dictionary with a ``name`` specifying the image name,
+as well as an ``entrypoint`` to override whatever the default for that image is).
+For other types of runners the ``variables`` key will be useful to pass any
+information on to the runner that it needs to do its work (e.g. scheduler
+parameters, etc.).
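+
+For illustration, a mapping entry using the dictionary form of ``image``
+together with a ``variables`` key might look like the following sketch. The
+entrypoint value and the variable name and value are hypothetical
+placeholders, not settings required by spack:
+
+.. code-block:: yaml
+
+   gitlab-ci:
+     mappings:
+       - match:
+           - os=centos7
+         runner-attributes:
+           tags:
+             - spack-k8s
+           image:
+             name: spack/spack_builder_centos_7
+             # hypothetical override of the image's default entrypoint
+             entrypoint: ['']
+           variables:
+             # hypothetical scheduler parameter passed along to the runner
+             EXAMPLE_SCHEDULER_ARGS: '--nodes=1'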
+
+.. _staging_algorithm:
+
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Summary of ``.gitlab-ci.yml`` generation algorithm
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+All specs yielded by the matrix (or all the specs in the environment) have their
+dependencies computed, and the entire resulting set of specs is staged together
+before being run through the ``gitlab-ci/mappings`` entries, where each staged
+spec is assigned a runner. "Staging" is the name we have given to the process
+of figuring out in what order the specs should be built, taking into consideration
+Gitlab CI rules about jobs/stages. In the staging process the goal is to maximize
+the number of jobs in any stage of the pipeline, while ensuring that the jobs in
+any stage only depend on jobs in previous stages (since those jobs are guaranteed
+to have completed already). As a runner is determined for a job, the information
+in the ``runner-attributes`` is used to populate various parts of the job
+description that will be used by Gitlab CI. Once all the jobs have been assigned
+a runner, the ``.gitlab-ci.yml`` is written to disk.
+
+The short example provided above would result in the ``readline``, ``ncurses``,
+and ``pkgconf`` packages getting staged and built on the runner chosen by the
+``spack-k8s`` tag. In this example, we assume the runner is a Docker executor
+type runner, and thus certain jobs will be run in the ``centos7`` container,
+and others in the ``ubuntu-18.04`` container. The resulting ``.gitlab-ci.yml``
+will contain 6 jobs in three stages. Once the jobs have been generated, the
+presence of a ``SPACK_CDASH_AUTH_TOKEN`` environment variable during the
+``spack ci generate`` command would result in all of the jobs being put in a
+build group on CDash called "Release Testing" (that group will be created if
+it didn't already exist).
+
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Optional compiler bootstrapping
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Spack pipelines also have support for bootstrapping compilers on systems that
+may not already have the desired compilers installed. The idea here is that
+you can specify a list of things to bootstrap in your ``definitions``, and
+spack will guarantee those will be installed in a phase of the pipeline before
+your release specs, so that you can rely on those packages being available in
+the binary mirror when you need them later on in the pipeline. At the moment
+the only viable use-case for bootstrapping is to install compilers.
+
+Here's an example of what bootstrapping some compilers might look like:
+
+.. code-block:: yaml
+
+   spack:
+     definitions:
+     - compiler-pkgs:
+       - 'llvm+clang@6.0.1 os=centos7'
+       - 'gcc@6.5.0 os=centos7'
+       - 'llvm+clang@6.0.1 os=ubuntu18.04'
+       - 'gcc@6.5.0 os=ubuntu18.04'
+     - pkgs:
+       - readline@7.0
+     - compilers:
+       - '%gcc@5.5.0'
+       - '%gcc@6.5.0'
+       - '%gcc@7.3.0'
+       - '%clang@6.0.0'
+       - '%clang@6.0.1'
+     - oses:
+       - os=ubuntu18.04
+       - os=centos7
+     specs:
+     - matrix:
+       - [$pkgs]
+       - [$compilers]
+       - [$oses]
+       exclude:
+         - '%gcc@7.3.0 os=centos7'
+         - '%gcc@5.5.0 os=ubuntu18.04'
+     gitlab-ci:
+       bootstrap:
+         - name: compiler-pkgs
+           compiler-agnostic: true
+       mappings:
+         # mappings similar to the example higher up in this description
+         ...
+
+In the example above, we have added a list to the ``definitions`` called
+``compiler-pkgs`` (you can add any number of these), which lists compiler packages
+we want to be staged ahead of the full matrix of release specs (which consists
+only of readline in our example). Then within the ``gitlab-ci`` section, we
+have added a ``bootstrap`` section, which can contain a list of items, each
+referring to a list in the ``definitions`` section. These items can either
+be a dictionary or a string. If you supply a dictionary, it must have a ``name``
+key whose value must match one of the lists in ``definitions`` and it can have a
+``compiler-agnostic`` key whose value is a boolean. If you supply a string,
+then it needs to match one of the lists provided in ``definitions``. You can
+think of the bootstrap list as an ordered list of pipeline "phases" that will
+be staged before your actual release specs. While this introduces another
+layer of bottleneck in the pipeline (all jobs in all stages of one phase must
+complete before any jobs in the next phase can begin), it also means you are
+guaranteed your bootstrapped compilers will be available when you need them.
+
+The ``compiler-agnostic`` key can be provided with each item in the
+bootstrap list. It tells the ``spack ci generate`` command that any jobs staged
+from that particular list should have the compiler removed from the spec, so
+that any compiler available on the runner where the job is run can be used to
+build the package.
+
+When including a bootstrapping phase as in the example above, the result is that
+the bootstrapped compiler packages will be pushed to the binary mirror (and the
+local artifacts mirror) before the actual release specs are built. In this case,
+the jobs corresponding to subsequent release specs are configured to
+``install_missing_compilers``, so that if spack is asked to install a package
+with a compiler it doesn't know about, it can be quickly installed from the
+binary mirror first.
+
+Since bootstrapping compilers is optional, those items can be left out of the
+environment/stack file, and in that case no bootstrapping will be done (only the
+specs will be staged for building) and the runners will be expected to already
+have all needed compilers installed and configured for spack to use.
+
+-------------------------------------
+Using a custom spack in your pipeline
+-------------------------------------
+
+If your runners will not have a version of spack ready to invoke, or if for some
+other reason you want to use a custom version of spack to run your pipelines,
+this can be accomplished fairly simply. First, create CI environment variables
+containing the url and branch/tag you want to clone (calling them, for example,
+``SPACK_REPO`` and ``SPACK_REF``), use them to clone spack in your pre-ci
+``before_script``, and finally pass those same values along to the workload
+generation process via the ``--spack-repo`` and ``--spack-ref`` command-line
+arguments. Here's an example:
+
+.. code-block:: yaml
+
+   pipeline-job:
+     tags:
+       - <some-other-tag>
+     before_script:
+       - git clone ${SPACK_REPO} --branch ${SPACK_REF}
+       - . ./spack/share/spack/setup-env.sh
+     script:
+       - spack ci start --spack-repo ${SPACK_REPO} --spack-ref ${SPACK_REF} <...args>
+     after_script:
+       - rm -rf ./spack
+
+If the ``spack ci start`` command receives those extra command line arguments,
+then it adds similar ``before_script`` and ``after_script`` sections for each of
+the ``spack ci rebuild`` jobs it generates (cloning and sourcing a custom
+spack in the ``before_script`` and removing it again in the ``after_script``).
+This gives you control over the version of spack used when the rebuild jobs
+are actually run on the gitlab runner.
+
+.. _ci_environment_variables:
+
+--------------------------------------------------
+Environment variables affecting pipeline operation
+--------------------------------------------------
+
+Certain secrets and some other information should be provided to the pipeline
+infrastructure via environment variables, usually for reasons of security, but
+in some cases to support other pipeline use cases such as PR testing. The
+environment variables used by the pipeline infrastructure are described here.
+
+^^^^^^^^^^^^^^^^^
+AWS_ACCESS_KEY_ID
+^^^^^^^^^^^^^^^^^
+
+Needed when the binary mirror is an S3 bucket.
+
+^^^^^^^^^^^^^^^^^^^^^
+AWS_SECRET_ACCESS_KEY
+^^^^^^^^^^^^^^^^^^^^^
+
+Needed when the binary mirror is an S3 bucket.
+
+^^^^^^^^^^^^^^^
+S3_ENDPOINT_URL
+^^^^^^^^^^^^^^^
+
+Needed when the binary mirror is an S3 bucket that is *not* on AWS.
+
+^^^^^^^^^^^^^^^^^^^^^^
+SPACK_CDASH_AUTH_TOKEN
+^^^^^^^^^^^^^^^^^^^^^^
+
+Needed in order to report build groups to CDash.
+
+^^^^^^^^^^^^^^^^^
+SPACK_SIGNING_KEY
+^^^^^^^^^^^^^^^^^
+
+Needed to sign/verify binary packages from the remote binary mirror.
+
+^^^^^^^^^^^^^^^^^^
+DOWNSTREAM_CI_REPO
+^^^^^^^^^^^^^^^^^^
+
+Needed until Gitlab CI supports dynamic job generation. Can contain connection
+credentials, and could be the same repository or a different one.