diff options
author | Massimiliano Culpo <massimiliano.culpo@gmail.com> | 2022-12-07 23:56:53 +0100 |
---|---|---|
committer | GitHub <noreply@github.com> | 2022-12-07 14:56:53 -0800 |
commit | ab6499ce1e4c8c6f043c52226b9eb50dc96ddfc9 (patch) | |
tree | 14c064f6d008e17d29af898233a7ef5f12eb17fb /LICENSE-MIT | |
parent | 412bec45aa2457aa181271afbafd912f9551ee20 (diff) | |
download | spack-ab6499ce1e4c8c6f043c52226b9eb50dc96ddfc9.tar.gz spack-ab6499ce1e4c8c6f043c52226b9eb50dc96ddfc9.tar.bz2 spack-ab6499ce1e4c8c6f043c52226b9eb50dc96ddfc9.tar.xz spack-ab6499ce1e4c8c6f043c52226b9eb50dc96ddfc9.zip |
parser: refactor with coarser token granularity (#34151)
## Motivation
Our parser grew to be quite complex, with a 2-state lexer and logic in the parser
that has up to 5 levels of nested conditionals. In the future, to turn compilers into
proper dependencies, we'll have to increase the complexity further as we foresee
the need to add:
1. Edge attributes
2. Spec nesting
to the spec syntax (see https://github.com/spack/seps/pull/5 for an initial discussion of
those changes). The main attempt here is thus to _simplify the existing code_ before
we start extending it later. We try to do that by adopting a different token granularity,
and by using more complex regexes for tokenization. This allow us to a have a "flatter"
encoding for the parser. i.e., it has fewer nested conditionals and a near-trivial lexer.
There are places, namely in `VERSION`, where we have to use negative lookahead judiciously
to avoid ambiguity. Specifically, this parse is ambiguous without `(?!\s*=)` in `VERSION_RANGE`
and an extra final `\b` in `VERSION`:
```
@ 1.2.3 : develop # This is a version range 1.2.3:develop
@ 1.2.3 : develop=foo # This is a version range 1.2.3: followed by a key-value pair
```
## Differences with the previous parser
~There are currently 2 known differences with the previous parser, which have been added on purpose:~
- ~No spaces allowed after a sigil (e.g. `foo @ 1.2.3` is invalid while `foo @1.2.3` is valid)~
- ~`/<hash> @1.2.3` can be parsed as a concrete spec followed by an anonymous spec (before was invalid)~
~We can recover the previous behavior on both ones but, especially for the second one, it seems the current behavior in the PR is more consistent.~
The parser is currently 100% backward compatible.
## Error handling
Being based on more complex regexes, we can possibly improve error
handling by adding regexes for common issues and hint users on that.
I'll leave that for a following PR, but there's a stub for this approach in the PR.
## Performance
To be sure we don't add any performance penalty with this new encoding, I measured:
```console
$ spack python -m timeit -s "import spack.spec" -c "spack.spec.Spec(<spec>)"
```
for different specs on my machine:
* **Spack:** 0.20.0.dev0 (c9db4e50ba045f5697816187accaf2451cb1aae7)
* **Python:** 3.8.10
* **Platform:** linux-ubuntu20.04-icelake
* **Concretizer:** clingo
results are:
| Spec | develop | this PR |
| ------------- | ------------- | ------- |
| `trilinos` | 28.9 usec | 13.1 usec |
| `trilinos @1.2.10:1.4.20,2.0.1` | 131 usec | 120 usec |
| `trilinos %gcc` | 44.9 usec | 20.9 usec |
| `trilinos +foo` | 44.1 usec | 21.3 usec |
| `trilinos foo=bar` | 59.5 usec | 25.6 usec |
| `trilinos foo=bar ^ mpich foo=baz` | 120 usec | 82.1 usec |
so this new parser seems to be consistently faster than the previous one.
## Modifications
In this PR we just substituted the Spec parser, which means:
- [x] Deleted in `spec.py` the `SpecParser` and `SpecLexer` classes. deleted `spack/parse.py`
- [x] Added a new parser in `spack/parser.py`
- [x] Hooked the new parser in all the places the previous one was used
- [x] Adapted unit tests in `test/spec_syntax.py`
## Possible future improvements
Random thoughts while working on the PR:
- Currently we transform hashes and files into specs during parsing. I think
we might want to introduce an additional step and parse special objects like
a `FileSpec` etc. in-between parsing and concretization.
Diffstat (limited to 'LICENSE-MIT')
0 files changed, 0 insertions, 0 deletions