| Field | Value | Date |
|---|---|---|
| author | Tamara Dahlgren <35777542+tldahlgren@users.noreply.github.com> | 2020-02-19 00:04:22 -0800 |
| committer | GitHub <noreply@github.com> | 2020-02-19 00:04:22 -0800 |
| commit | f2aca86502eded1500489cd13799d8826e6fc9d2 | |
| tree | b6420e0db1471646d0f2cb94d138d1056478fd50 /etc | |
| parent | 2f4881d582181b275d13ad2098a3c89b563e9f97 | |
Distributed builds (#13100)
Fixes #9394
Closes #13217.
## Background
Spack provides the ability to enable/disable parallel builds through two options: the package `parallel` property and the `build_jobs` configuration setting. This PR changes the installation algorithm so that multiple simultaneous processes can coordinate the installation of the same spec (and of specs with overlapping dependencies).
The `parallel` (boolean) property sets the default for its package, though the value can be overridden in the `install` method.
Spack's current parallel builds are limited to build tools supporting `jobs` arguments (e.g., `Makefiles`). The number of jobs actually used is calculated as `min(config:build_jobs, # cores, 16)`, which can be overridden in the package or on the command line (i.e., `spack install -j <# jobs>`).
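The job-count rule above can be sketched as a small function. This is a minimal illustration, not Spack's code: the name `effective_jobs`, its parameters, and the assumptions that `parallel = False` forces a single job and that an explicit `-j` value bypasses the cap of 16 are all hypothetical.

```python
import os
from typing import Optional

# Cap taken from the formula min(config:build_jobs, # cores, 16).
MAX_DEFAULT_JOBS = 16


def effective_jobs(parallel: bool,
                   build_jobs: int,
                   ncores: Optional[int] = None,
                   cli_jobs: Optional[int] = None) -> int:
    """Hypothetical helper: number of build jobs under the rules described above.

    Assumptions (not taken from Spack's source):
    - a package with parallel=False always builds with one job;
    - an explicit command-line value (spack install -j N) wins otherwise;
    - the default is min(config:build_jobs, # cores, 16).
    """
    if not parallel:
        return 1
    if cli_jobs is not None:
        return max(1, cli_jobs)
    if ncores is None:
        ncores = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(build_jobs, ncores, MAX_DEFAULT_JOBS))
```

For example, with `build_jobs: 32` on an 8-core machine this yields 8, while on a 64-core machine the cap of 16 applies.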
This PR adds support for distributed (single- and multi-node) parallel builds. The goals of this work include improving the efficiency of installing packages with many dependencies and reducing the repetition associated with concurrent installations of (dependency) packages.
## Approach
### File System Locks
Coordination between concurrent installs of overlapping packages to a Spack instance is accomplished through bottom-up dependency DAG processing and file system locks. The runs can be a combination of interactive and batch processes affecting the same file system. Exclusive prefix locks are required to install a package, while shared prefix locks are required to check whether the package is installed.
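The exclusive/shared prefix-lock protocol described above can be sketched with POSIX `fcntl` locks. This is a simplified, hypothetical illustration: it uses one lock file per prefix and invented helpers (`prefix_lock`, `is_installed`, `install`), and it does not reproduce how Spack actually lays out or acquires its lock files.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def prefix_lock(lock_path: str, exclusive: bool):
    """Hold a POSIX (fcntl) lock on lock_path for the duration of the block.

    Exclusive locks model "installing this prefix"; shared locks model
    "checking whether this prefix is installed". Blocks until granted.
    """
    # O_RDWR: shared locks need a readable fd, exclusive locks a writable one.
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        yield
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        os.close(fd)


def is_installed(prefix: str, lock_path: str) -> bool:
    # Readers take a shared lock so they never see a half-finished install.
    with prefix_lock(lock_path, exclusive=False):
        return os.path.isdir(prefix) and bool(os.listdir(prefix))


def install(prefix: str, lock_path: str) -> bool:
    """Install into prefix unless it is already installed; True if we installed."""
    with prefix_lock(lock_path, exclusive=True):
        if os.path.isdir(prefix) and os.listdir(prefix):
            return False  # another process finished first
        os.makedirs(prefix, exist_ok=True)
        with open(os.path.join(prefix, "installed.txt"), "w") as f:
            f.write("ok\n")
        return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        prefix = os.path.join(root, "zlib-1.2.11")
        lock = prefix + ".lock"  # kept outside the prefix itself
        print(install(prefix, lock), install(prefix, lock), is_installed(prefix, lock))
        # True False True
```

Re-checking for an existing install *after* taking the exclusive lock is what lets a second process skip work that a concurrent process already completed.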
Failures are communicated through a separate exclusive prefix failure lock (for concurrent processes) combined with a persistent store (for separate, related build processes). The resulting failure file contains the failing spec to facilitate manual debugging.
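The persistent-store half of this scheme might look roughly like the following sketch. The directory layout, the per-hash marker file, and the helpers `mark_failed`, `failed_spec`, and `clear_failure` are hypothetical; the lock half is omitted. The marker's content is the failing spec, matching the description above.

```python
import os
import tempfile
from typing import Optional


def _marker(failure_dir: str, spec_hash: str) -> str:
    # Hypothetical layout: one marker file per failed spec, named by its hash.
    return os.path.join(failure_dir, spec_hash)


def mark_failed(failure_dir: str, spec_hash: str, spec_text: str) -> None:
    """Record a failed install; the marker's contents are the failing spec."""
    os.makedirs(failure_dir, exist_ok=True)
    # Write to a temporary file in the same directory, then rename it into
    # place: os.replace is atomic on POSIX, so other processes see either no
    # marker or a complete one, never a partially written file.
    fd, tmp = tempfile.mkstemp(dir=failure_dir)
    with os.fdopen(fd, "w") as f:
        f.write(spec_text + "\n")
    os.replace(tmp, _marker(failure_dir, spec_hash))


def failed_spec(failure_dir: str, spec_hash: str) -> Optional[str]:
    """Return the recorded failing spec, or None if no failure was recorded."""
    try:
        with open(_marker(failure_dir, spec_hash)) as f:
            return f.read().strip()
    except FileNotFoundError:
        return None


def clear_failure(failure_dir: str, spec_hash: str) -> None:
    """Forget a recorded failure, e.g. before retrying the install."""
    try:
        os.remove(_marker(failure_dir, spec_hash))
    except FileNotFoundError:
        pass
```

Because the marker outlives the process that wrote it, a later, separate `spack install` run can detect the earlier failure and a person can read the failing spec directly from the file.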
### Priority Queue
Management of dependency builds changed from recursion to a priority queue, where the priority of a spec is based on the number of its remaining uninstalled dependencies.
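A single-process sketch of that queue, assuming priority equals the count of uninstalled dependencies (lower pops first). The function `install_order` and its input format are hypothetical; the real installer also interleaves this with the locking described earlier. Python's `heapq` has no decrease-key, so updated priorities are pushed as new entries and stale ones are skipped when popped.

```python
import heapq
from typing import Dict, List, Set


def install_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return a bottom-up install order using a priority queue.

    deps maps each spec name to the names it depends on; every dependency
    must also appear as a key. Ties are broken by name for determinism.
    """
    remaining = {name: len(ds) for name, ds in deps.items()}
    dependents: Dict[str, Set[str]] = {name: set() for name in deps}
    for name, ds in deps.items():
        for dep in ds:
            if dep not in deps:
                raise KeyError(f"{dep!r} (needed by {name!r}) is not in deps")
            dependents[dep].add(name)

    heap = [(count, name) for name, count in remaining.items()]
    heapq.heapify(heap)
    installed: Set[str] = set()
    order: List[str] = []

    while heap:
        count, name = heapq.heappop(heap)
        if name in installed or count != remaining[name]:
            continue  # stale entry left over from an earlier priority
        if count > 0:
            # The smallest live priority is nonzero: nothing is ready.
            raise ValueError(f"dependency cycle involving {name!r}")
        installed.add(name)  # stand-in for actually building the spec
        order.append(name)
        for parent in dependents[name]:
            remaining[parent] -= 1
            heapq.heappush(heap, (remaining[parent], parent))
    return order


if __name__ == "__main__":
    print(install_order({
        "app": {"libA", "libB"},
        "libA": {"zlib"},
        "libB": {"zlib"},
        "zlib": set(),
    }))
    # ['zlib', 'libA', 'libB', 'app']
```

Compared with recursion, the queue makes every ready spec visible at once, which is what lets independent dependencies be picked up by different processes.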
Using a queue required a change to dependency build exception handling, the most visible effect being that the `install` method *must* install something in the prefix. Consequently, packages can no longer get away with an install method consisting of `pass`, for example.
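The post-install check behind this rule can be sketched as follows. This is a hypothetical stand-in for Spack's `sanity_check_prefix`: the `.spack` metadata directory name, the `InstallError` class, and the message text are assumptions for illustration. It contrasts an install method that does nothing with one that creates a file, as the task list describes.

```python
import os
import tempfile

# Assumption: per-install metadata lives in a hidden ".spack" directory that
# does not count as "something installed".
METADATA_DIR = ".spack"


class InstallError(Exception):
    pass


def sanity_check_prefix(prefix: str) -> None:
    """Raise InstallError if prefix holds nothing but metadata."""
    if os.path.isdir(prefix):
        installed = [e for e in os.listdir(prefix) if e != METADATA_DIR]
    else:
        installed = []
    if not installed:
        raise InstallError(f"Install failed for {prefix}: nothing was installed!")


def install_pass(prefix: str) -> None:
    pass  # the old pattern: installs nothing, so the check now fails


def install_touch(prefix: str) -> None:
    # The replacement pattern: create at least one file in the prefix.
    os.makedirs(prefix, exist_ok=True)
    with open(os.path.join(prefix, "README"), "w") as f:
        f.write("placeholder\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        prefix = os.path.join(root, "pkg")
        os.makedirs(os.path.join(prefix, METADATA_DIR))
        install_pass(prefix)
        try:
            sanity_check_prefix(prefix)
        except InstallError as e:
            print(e)
        install_touch(prefix)
        sanity_check_prefix(prefix)  # passes now
```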
## Caveats
- This still only parallelizes a single-rooted build. Multi-rooted installs (e.g., for environments) are TBD in a future PR.
Tasks:
- [x] Adjust package lock timeout to correspond to the value used in the demo
- [x] Adjust database lock timeout to reduce contention on startup of concurrent `spack install <spec>` calls
- [x] Replace (test) packages' `install: pass` methods with file creation, since post-install `sanity_check_prefix` will otherwise error out with `Install failed .. Nothing was installed!`
- [x] Resolve remaining existing test failures
- [x] Respond to alalazo's initial feedback
- [x] Remove `bin/demo-locks.py`
- [x] Add new tests to address new coverage issues
- [x] Replace built-in packages' `def install(..): pass` with methods that "install" something (i.e., only `apple-libunwind`)
- [x] Increase code coverage
Diffstat (limited to 'etc')
-rw-r--r-- | etc/spack/defaults/config.yaml | 2 |
1 file changed, 1 insertion, 1 deletion
```diff
diff --git a/etc/spack/defaults/config.yaml b/etc/spack/defaults/config.yaml
index 3aadccfda1..32745a309a 100644
--- a/etc/spack/defaults/config.yaml
+++ b/etc/spack/defaults/config.yaml
@@ -137,7 +137,7 @@ config:
   # when Spack needs to manage its own package metadata and all operations are
   # expected to complete within the default time limit. The timeout should
   # therefore generally be left untouched.
-  db_lock_timeout: 120
+  db_lock_timeout: 3
   # How long to wait when attempting to modify a package (e.g. to install it).
```
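With `db_lock_timeout` lowered from 120 to 3 seconds, a process that cannot get the database lock gives up much sooner. A timeout-bounded acquisition can be sketched with non-blocking `fcntl` attempts and retries. The helper `acquire_with_timeout` and its fixed `poll_interval` are hypothetical; this does not claim to match Spack's own lock retry strategy.

```python
import errno
import fcntl
import os
import tempfile
import time


def acquire_with_timeout(fd: int, exclusive: bool, timeout: float,
                         poll_interval: float = 0.1) -> int:
    """Try to lock fd until timeout seconds elapse; return the attempts used.

    Each attempt is non-blocking (LOCK_NB). On contention, sleep and retry;
    raise TimeoutError once the deadline has passed.
    """
    op = (fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH) | fcntl.LOCK_NB
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        try:
            fcntl.lockf(fd, op)
            return attempts
        except OSError as e:
            # lockf reports a held lock as EACCES or EAGAIN; re-raise anything else.
            if e.errno not in (errno.EACCES, errno.EAGAIN):
                raise
        if time.monotonic() >= deadline:
            raise TimeoutError(f"could not acquire lock within {timeout} s")
        time.sleep(poll_interval)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as lock_file:
        fd = os.open(lock_file.name, os.O_RDWR)
        try:
            # Mirrors db_lock_timeout: 3 (give up after ~3 s of contention).
            print("acquired after", acquire_with_timeout(fd, True, timeout=3), "attempt(s)")
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
            os.close(fd)
```

A short timeout suits a lock that is only held briefly for metadata updates: a long wait more likely signals a stuck holder than normal contention.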