author | Adam J. Stewart <ajstewart426@gmail.com> | 2017-01-31 10:14:52 -0600 |
---|---|---|
committer | Todd Gamblin <tgamblin@llnl.gov> | 2017-01-31 11:14:52 -0500 |
commit | 123f057089547d79d6f308bc47698be936aa1cb5 (patch) | |
tree | e8a8e5b7da2e974a69d42126fb5a9bedb95c57d8 /lib | |
parent | 2e81fe4fb3a8982932f16c212f2a0732ee1766ea (diff) | |
Refactor Spack's URL parsing commands (#2938)
* Replace `spack urls` and `spack url-parse` with `spack url`
* Allow spack url list to only list incorrect parsings
* Add spack url test reporting
* Add unit tests for new URL commands
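
The consolidation described above, with one `spack url` command routing to `parse`, `list`, and `test`, can be sketched on its own with plain `argparse`. This is only an illustrative stand-in under stated assumptions: `build_parser` and `dispatch` are hypothetical names, the handlers are placeholders that return strings instead of calling into `spack.url`, and only a subset of the real flags is shown.

```python
import argparse


def build_parser():
    """Build a stand-in parser with the three subcommands named in the commit.

    Hypothetical helper: Spack itself passes a subparser into setup_parser().
    """
    parser = argparse.ArgumentParser(prog='url')
    sp = parser.add_subparsers(metavar='SUBCOMMAND', dest='subcommand')

    # parse: takes a single positional URL
    parse_parser = sp.add_parser('parse', help='attempt to parse a url')
    parse_parser.add_argument('url', help='url to parse')

    # list: the two "incorrect" filters cannot be combined
    list_parser = sp.add_parser('list', help='list urls in all packages')
    excl_args = list_parser.add_mutually_exclusive_group()
    excl_args.add_argument('-n', '--incorrect-name', action='store_true')
    excl_args.add_argument('-v', '--incorrect-version', action='store_true')

    # test: no arguments of its own
    sp.add_parser('test', help='print a summary of url parsing')
    return parser


def dispatch(args):
    """Route to a handler keyed by subcommand name (placeholder handlers)."""
    action = {
        'parse': lambda a: 'parse {0}'.format(a.url),
        'list': lambda a: 'list',
        'test': lambda a: 'test',
    }
    return action[args.subcommand](args)
```

Keeping the mapping from subcommand name to handler in a single dictionary means a new subcommand needs only one `add_parser` call and one dictionary entry, which is roughly the shape the new `url()` function in this commit takes.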
Diffstat (limited to 'lib')
-rw-r--r-- | lib/spack/docs/developer_guide.rst | 102 |
-rw-r--r-- | lib/spack/docs/packaging_guide.rst | 197 |
-rw-r--r-- | lib/spack/spack/cmd/url.py | 319 |
-rw-r--r-- | lib/spack/spack/cmd/url_parse.py | 79 |
-rw-r--r-- | lib/spack/spack/cmd/urls.py | 59 |
-rw-r--r-- | lib/spack/spack/test/cmd/url.py | 116 |
-rw-r--r-- | lib/spack/spack/url.py | 237 |
7 files changed, 797 insertions, 312 deletions
diff --git a/lib/spack/docs/developer_guide.rst b/lib/spack/docs/developer_guide.rst index 5ddbaf2478..dbb9a670b4 100644 --- a/lib/spack/docs/developer_guide.rst +++ b/lib/spack/docs/developer_guide.rst @@ -300,6 +300,42 @@ Stage objects Writing commands ---------------- +Adding a new command to Spack is easy. Simply add a ``<name>.py`` file to +``lib/spack/spack/cmd/``, where ``<name>`` is the name of the subcommand. +At the bare minimum, two functions are required in this file: + +^^^^^^^^^^^^^^^^^^ +``setup_parser()`` +^^^^^^^^^^^^^^^^^^ + +Unless your command doesn't accept any arguments, a ``setup_parser()`` +function is required to define what arguments and flags your command takes. +See the `Argparse documentation <https://docs.python.org/2.7/library/argparse.html>`_ +for more details on how to add arguments. + +Some commands have a set of subcommands, like ``spack compiler find`` or +``spack module refresh``. You can add subparsers to your parser to handle +this. Check out ``spack edit --command compiler`` for an example of this. + +A lot of commands take the same arguments and flags. These arguments should +be defined in ``lib/spack/spack/cmd/common/arguments.py`` so that they don't +need to be redefined in multiple commands. + +^^^^^^^^^^^^ +``<name>()`` +^^^^^^^^^^^^ + +In order to run your command, Spack searches for a function with the same +name as your command in ``<name>.py``. This is the main method for your +command, and can call other helper methods to handle common tasks. + +Remember, before adding a new command, think to yourself whether or not this +new command is actually necessary. Sometimes, the functionality you desire +can be added to an existing command. Also remember to add unit tests for +your command. If it isn't used very frequently, changes to the rest of +Spack can cause your command to break without sufficient unit tests to +prevent this from happening. 
+ ---------- Unit tests ---------- @@ -312,14 +348,80 @@ Unit testing Developer commands ------------------ +.. _cmd-spack-doc: + ^^^^^^^^^^^^^ ``spack doc`` ^^^^^^^^^^^^^ +.. _cmd-spack-test: + ^^^^^^^^^^^^^^ ``spack test`` ^^^^^^^^^^^^^^ +.. _cmd-spack-url: + +^^^^^^^^^^^^^ +``spack url`` +^^^^^^^^^^^^^ + +A package containing a single URL can be used to download several different +versions of the package. If you've ever wondered how this works, all of the +magic is in :mod:`spack.url`. This module contains methods for extracting +the name and version of a package from its URL. The name is used by +``spack create`` to guess the name of the package. By determining the version +from the URL, Spack can replace it with other versions to determine where to +download them from. + +The regular expressions in ``parse_name_offset`` and ``parse_version_offset`` +are used to extract the name and version, but they aren't perfect. In order +to debug Spack's URL parsing support, the ``spack url`` command can be used. + +""""""""""""""""""" +``spack url parse`` +""""""""""""""""""" + +If you need to debug a single URL, you can use the following command: + +.. command-output:: spack url parse http://cache.ruby-lang.org/pub/ruby/2.2/ruby-2.2.0.tar.gz + +You'll notice that the name and version of this URL are correctly detected, +and you can even see which regular expressions it was matched to. However, +you'll notice that when it substitutes the version number in, it doesn't +replace the ``2.2`` with ``9.9`` where we would expect ``9.9.9b`` to live. +This particular package may require a ``list_url`` or ``url_for_version`` +function. + +This command also accepts a ``--spider`` flag. If provided, Spack searches +for other versions of the package and prints the matching URLs. + +"""""""""""""""""" +``spack url list`` +"""""""""""""""""" + +This command lists every URL in every package in Spack. 
If given the +``--color`` and ``--extrapolation`` flags, it also colors the part of +the string that it detected to be the name and version. The +``--incorrect-name`` and ``--incorrect-version`` flags can be used to +print URLs that were not being parsed correctly. + +"""""""""""""""""" +``spack url test`` +"""""""""""""""""" + +This command attempts to parse every URL for every package in Spack +and prints a summary of how many of them are being correctly parsed. +It also prints a histogram showing which regular expressions are being +matched and how frequently: + +.. command-output:: spack url test + +This command is essential for anyone adding or changing the regular +expressions that parse names and versions. By running this command +before and after the change, you can make sure that your regular +expression fixes more packages than it breaks. + --------- Profiling --------- diff --git a/lib/spack/docs/packaging_guide.rst b/lib/spack/docs/packaging_guide.rst index 75546d943e..41d4289636 100644 --- a/lib/spack/docs/packaging_guide.rst +++ b/lib/spack/docs/packaging_guide.rst @@ -712,8 +712,8 @@ is at ``http://example.com/downloads/foo-1.0.tar.gz``, Spack will look in ``http://example.com/downloads/`` for links to additional versions. If you need to search another path for download links, you can supply some extra attributes that control how your package finds new -versions. See the documentation on `attribute_list_url`_ and -`attribute_list_depth`_. +versions. See the documentation on :ref:`attribute_list_url` and +:ref:`attribute_list_depth`. .. note:: @@ -728,6 +728,102 @@ versions. See the documentation on `attribute_list_url`_ and syntax errors, or the ``import`` will fail. Use this once you've got your package in working order. +-------------------- +Finding new versions +-------------------- + +You've already seen the ``homepage`` and ``url`` package attributes: + +.. 
code-block:: python + :linenos: + + from spack import * + + + class Mpich(Package): + """MPICH is a high performance and widely portable implementation of + the Message Passing Interface (MPI) standard.""" + homepage = "http://www.mpich.org" + url = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz" + +These are class-level attributes used by Spack to show users +information about the package, and to determine where to download its +source code. + +Spack uses the tarball URL to extrapolate where to find other tarballs +of the same package (e.g. in :ref:`cmd-spack-checksum`, but +this does not always work. This section covers ways you can tell +Spack to find tarballs elsewhere. + +.. _attribute_list_url: + +^^^^^^^^^^^^ +``list_url`` +^^^^^^^^^^^^ + +When spack tries to find available versions of packages (e.g. with +:ref:`cmd-spack-checksum`), it spiders the parent directory +of the tarball in the ``url`` attribute. For example, for libelf, the +url is: + +.. code-block:: python + + url = "http://www.mr511.de/software/libelf-0.8.13.tar.gz" + +Here, Spack spiders ``http://www.mr511.de/software/`` to find similar +tarball links and ultimately to make a list of available versions of +``libelf``. + +For many packages, the tarball's parent directory may be unlistable, +or it may not contain any links to source code archives. In fact, +many times additional package downloads aren't even available in the +same directory as the download URL. + +For these, you can specify a separate ``list_url`` indicating the page +to search for tarballs. For example, ``libdwarf`` has the homepage as +the ``list_url``, because that is where links to old versions are: + +.. code-block:: python + :linenos: + + class Libdwarf(Package): + homepage = "http://www.prevanders.net/dwarf.html" + url = "http://www.prevanders.net/libdwarf-20130729.tar.gz" + list_url = homepage + +.. 
_attribute_list_depth: + +^^^^^^^^^^^^^^ +``list_depth`` +^^^^^^^^^^^^^^ + +``libdwarf`` and many other packages have a listing of available +versions on a single webpage, but not all do. For example, ``mpich`` +has a tarball URL that looks like this: + +.. code-block:: python + + url = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz" + +But its downloads are in many different subdirectories of +``http://www.mpich.org/static/downloads/``. So, we need to add a +``list_url`` *and* a ``list_depth`` attribute: + +.. code-block:: python + :linenos: + + class Mpich(Package): + homepage = "http://www.mpich.org" + url = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz" + list_url = "http://www.mpich.org/static/downloads/" + list_depth = 2 + +By default, Spack only looks at the top-level page available at +``list_url``. ``list_depth`` tells it to follow up to 2 levels of +links from the top-level page. Note that here, this implies two +levels of subdirectories, as the ``mpich`` website is structured much +like a filesystem. But ``list_depth`` really refers to link depth +when spidering the page. .. _vcs-fetch: @@ -1241,103 +1337,6 @@ RPATHs in Spack are handled in one of three ways: links. You can see this how this is used in the :ref:`PySide example <pyside-patch>` above. --------------------- -Finding new versions --------------------- - -You've already seen the ``homepage`` and ``url`` package attributes: - -.. code-block:: python - :linenos: - - from spack import * - - - class Mpich(Package): - """MPICH is a high performance and widely portable implementation of - the Message Passing Interface (MPI) standard.""" - homepage = "http://www.mpich.org" - url = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz" - -These are class-level attributes used by Spack to show users -information about the package, and to determine where to download its -source code. 
- -Spack uses the tarball URL to extrapolate where to find other tarballs -of the same package (e.g. in :ref:`cmd-spack-checksum`, but -this does not always work. This section covers ways you can tell -Spack to find tarballs elsewhere. - -.. _attribute_list_url: - -^^^^^^^^^^^^ -``list_url`` -^^^^^^^^^^^^ - -When spack tries to find available versions of packages (e.g. with -:ref:`cmd-spack-checksum`), it spiders the parent directory -of the tarball in the ``url`` attribute. For example, for libelf, the -url is: - -.. code-block:: python - - url = "http://www.mr511.de/software/libelf-0.8.13.tar.gz" - -Here, Spack spiders ``http://www.mr511.de/software/`` to find similar -tarball links and ultimately to make a list of available versions of -``libelf``. - -For many packages, the tarball's parent directory may be unlistable, -or it may not contain any links to source code archives. In fact, -many times additional package downloads aren't even available in the -same directory as the download URL. - -For these, you can specify a separate ``list_url`` indicating the page -to search for tarballs. For example, ``libdwarf`` has the homepage as -the ``list_url``, because that is where links to old versions are: - -.. code-block:: python - :linenos: - - class Libdwarf(Package): - homepage = "http://www.prevanders.net/dwarf.html" - url = "http://www.prevanders.net/libdwarf-20130729.tar.gz" - list_url = homepage - -.. _attribute_list_depth: - -^^^^^^^^^^^^^^ -``list_depth`` -^^^^^^^^^^^^^^ - -``libdwarf`` and many other packages have a listing of available -versions on a single webpage, but not all do. For example, ``mpich`` -has a tarball URL that looks like this: - -.. code-block:: python - - url = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz" - -But its downloads are in many different subdirectories of -``http://www.mpich.org/static/downloads/``. So, we need to add a -``list_url`` *and* a ``list_depth`` attribute: - -.. 
code-block:: python - :linenos: - - class Mpich(Package): - homepage = "http://www.mpich.org" - url = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz" - list_url = "http://www.mpich.org/static/downloads/" - list_depth = 2 - -By default, Spack only looks at the top-level page available at -``list_url``. ``list_depth`` tells it to follow up to 2 levels of -links from the top-level page. Note that here, this implies two -levels of subdirectories, as the ``mpich`` website is structured much -like a filesystem. But ``list_depth`` really refers to link depth -when spidering the page. - .. _attribute_parallel: --------------- diff --git a/lib/spack/spack/cmd/url.py b/lib/spack/spack/cmd/url.py new file mode 100644 index 0000000000..6823f0febd --- /dev/null +++ b/lib/spack/spack/cmd/url.py @@ -0,0 +1,319 @@ +############################################################################## +# Copyright (c) 2013-2016, Lawrence Livermore National Security, LLC. +# Produced at the Lawrence Livermore National Laboratory. +# +# This file is part of Spack. +# Created by Todd Gamblin, tgamblin@llnl.gov, All rights reserved. +# LLNL-CODE-647188 +# +# For details, see https://github.com/llnl/spack +# Please also see the LICENSE file for our notice and the LGPL. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU Lesser General Public License (as +# published by the Free Software Foundation) version 2.1, February 1999. +# +# This program is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the IMPLIED WARRANTY OF +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the terms and +# conditions of the GNU Lesser General Public License for more details. 
+# +# You should have received a copy of the GNU Lesser General Public +# License along with this program; if not, write to the Free Software +# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA +############################################################################## +from __future__ import division, print_function + +from collections import defaultdict + +import spack + +from llnl.util import tty +from spack.url import * +from spack.util.web import find_versions_of_archive + +description = "debugging tool for url parsing" + + +def setup_parser(subparser): + sp = subparser.add_subparsers(metavar='SUBCOMMAND', dest='subcommand') + + # Parse + parse_parser = sp.add_parser('parse', help='attempt to parse a url') + + parse_parser.add_argument( + 'url', + help='url to parse') + parse_parser.add_argument( + '-s', '--spider', action='store_true', + help='spider the source page for versions') + + # List + list_parser = sp.add_parser('list', help='list urls in all packages') + + list_parser.add_argument( + '-c', '--color', action='store_true', + help='color the parsed version and name in the urls shown ' + '(versions will be cyan, name red)') + list_parser.add_argument( + '-e', '--extrapolation', action='store_true', + help='color the versions used for extrapolation as well ' + '(additional versions will be green, names magenta)') + + excl_args = list_parser.add_mutually_exclusive_group() + + excl_args.add_argument( + '-n', '--incorrect-name', action='store_true', + help='only list urls for which the name was incorrectly parsed') + excl_args.add_argument( + '-v', '--incorrect-version', action='store_true', + help='only list urls for which the version was incorrectly parsed') + + # Test + sp.add_parser( + 'test', help='print a summary of how well we are parsing package urls') + + +def url(parser, args): + action = { + 'parse': url_parse, + 'list': url_list, + 'test': url_test + } + + action[args.subcommand](args) + + +def url_parse(args): + url = 
args.url + + tty.msg('Parsing URL: {0}'.format(url)) + print() + + ver, vs, vl, vi, vregex = parse_version_offset(url) + tty.msg('Matched version regex {0:>2}: r{1!r}'.format(vi, vregex)) + + name, ns, nl, ni, nregex = parse_name_offset(url, ver) + tty.msg('Matched name regex {0:>2}: r{1!r}'.format(ni, nregex)) + + print() + tty.msg('Detected:') + try: + print_name_and_version(url) + except UrlParseError as e: + tty.error(str(e)) + + print(' name: {0}'.format(name)) + print(' version: {0}'.format(ver)) + print() + + tty.msg('Substituting version 9.9.9b:') + newurl = substitute_version(url, '9.9.9b') + print_name_and_version(newurl) + + if args.spider: + print() + tty.msg('Spidering for versions:') + versions = find_versions_of_archive(url) + + max_len = max(len(str(v)) for v in versions) + + for v in sorted(versions): + print('{0:{1}} {2}'.format(v, max_len, versions[v])) + + +def url_list(args): + urls = set() + + # Gather set of URLs from all packages + for pkg in spack.repo.all_packages(): + url = getattr(pkg.__class__, 'url', None) + urls = url_list_parsing(args, urls, url, pkg) + + for params in pkg.versions.values(): + url = params.get('url', None) + urls = url_list_parsing(args, urls, url, pkg) + + # Print URLs + for url in sorted(urls): + if args.color or args.extrapolation: + print(color_url(url, subs=args.extrapolation, errors=True)) + else: + print(url) + + # Return the number of URLs that were printed, only for testing purposes + return len(urls) + + +def url_test(args): + # Collect statistics on how many URLs were correctly parsed + total_urls = 0 + correct_names = 0 + correct_versions = 0 + + # Collect statistics on which regexes were matched and how often + name_regex_dict = dict() + name_count_dict = defaultdict(int) + version_regex_dict = dict() + version_count_dict = defaultdict(int) + + tty.msg('Generating a summary of URL parsing in Spack...') + + # Loop through all packages + for pkg in spack.repo.all_packages(): + urls = set() + + url = 
getattr(pkg.__class__, 'url', None) + if url: + urls.add(url) + + for params in pkg.versions.values(): + url = params.get('url', None) + if url: + urls.add(url) + + # Calculate statistics + for url in urls: + total_urls += 1 + + # Parse versions + version = None + try: + version, vs, vl, vi, vregex = parse_version_offset(url) + version_regex_dict[vi] = vregex + version_count_dict[vi] += 1 + if version_parsed_correctly(pkg, version): + correct_versions += 1 + except UndetectableVersionError: + pass + + # Parse names + try: + name, ns, nl, ni, nregex = parse_name_offset(url, version) + name_regex_dict[ni] = nregex + name_count_dict[ni] += 1 + if name_parsed_correctly(pkg, name): + correct_names += 1 + except UndetectableNameError: + pass + + print() + print(' Total URLs found: {0}'.format(total_urls)) + print(' Names correctly parsed: {0:>4}/{1:>4} ({2:>6.2%})'.format( + correct_names, total_urls, correct_names / total_urls)) + print(' Versions correctly parsed: {0:>4}/{1:>4} ({2:>6.2%})'.format( + correct_versions, total_urls, correct_versions / total_urls)) + print() + + tty.msg('Statistics on name regular expresions:') + + print() + print(' Index Count Regular Expresion') + for ni in name_regex_dict: + print(' {0:>3}: {1:>6} r{2!r}'.format( + ni, name_count_dict[ni], name_regex_dict[ni])) + print() + + tty.msg('Statistics on version regular expresions:') + + print() + print(' Index Count Regular Expresion') + for vi in version_regex_dict: + print(' {0:>3}: {1:>6} r{2!r}'.format( + vi, version_count_dict[vi], version_regex_dict[vi])) + print() + + # Return statistics, only for testing purposes + return (total_urls, correct_names, correct_versions, + name_count_dict, version_count_dict) + + +def print_name_and_version(url): + """Prints a URL. Underlines the detected name with dashes and + the detected version with tildes. 
+ + :param str url: The url to parse + """ + name, ns, nl, ntup, ver, vs, vl, vtup = substitution_offsets(url) + underlines = [' '] * max(ns + nl, vs + vl) + for i in range(ns, ns + nl): + underlines[i] = '-' + for i in range(vs, vs + vl): + underlines[i] = '~' + + print(' {0}'.format(url)) + print(' {0}'.format(''.join(underlines))) + + +def url_list_parsing(args, urls, url, pkg): + """Helper function for :func:`url_list`. + + :param argparse.Namespace args: The arguments given to ``spack url list`` + :param set urls: List of URLs that have already been added + :param url: A URL to potentially add to ``urls`` depending on ``args`` + :type url: str or None + :param spack.package.PackageBase pkg: The Spack package + :returns: The updated ``urls`` list + :rtype: set + """ + if url: + if args.incorrect_name: + # Only add URLs whose name was incorrectly parsed + try: + name = parse_name(url) + if not name_parsed_correctly(pkg, name): + urls.add(url) + except UndetectableNameError: + urls.add(url) + elif args.incorrect_version: + # Only add URLs whose version was incorrectly parsed + try: + version = parse_version(url) + if not version_parsed_correctly(pkg, version): + urls.add(url) + except UndetectableVersionError: + urls.add(url) + else: + urls.add(url) + + return urls + + +def name_parsed_correctly(pkg, name): + """Determine if the name of a package was correctly parsed. + + :param spack.package.PackageBase pkg: The Spack package + :param str name: The name that was extracted from the URL + :returns: True if the name was correctly parsed, else False + :rtype: bool + """ + pkg_name = pkg.name + + # After determining a name, `spack create` determines a build system. + # Some build systems prepend a special string to the front of the name. + # Since this can't be guessed from the URL, it would be unfair to say + # that these names are incorrectly parsed, so we remove them. 
+ if pkg_name.startswith('r-'): + pkg_name = pkg_name[2:] + elif pkg_name.startswith('py-'): + pkg_name = pkg_name[3:] + elif pkg_name.startswith('octave-'): + pkg_name = pkg_name[7:] + + return name == pkg_name + + +def version_parsed_correctly(pkg, version): + """Determine if the version of a package was correctly parsed. + + :param spack.package.PackageBase pkg: The Spack package + :param str version: The version that was extracted from the URL + :returns: True if the name was correctly parsed, else False + :rtype: bool + """ + # If the version parsed from the URL is listed in a version() + # directive, we assume it was correctly parsed + for pkg_version in pkg.versions: + if str(pkg_version) == str(version): + return True + return False diff --git a/lib/spack/spack/cmd/url_parse.py b/lib/spack/spack/cmd/url_parse.py deleted file mode 100644 index b33d96299f..0000000000 --- a/lib/spack/spack/cmd/url_parse.py +++ /dev/null @@ -1,79 +0,0 @@ -############################################################################## -# Copyright (c) 2013-2016, Lawrence Livermore National Security, LLC. -# Produced at the Lawrence Livermore National Laboratory. -# -# This file is part of Spack. -# Created by Todd Gamblin, tgamblin@llnl.gov, All rights reserved. -# LLNL-CODE-647188 -# -# For details, see https://github.com/llnl/spack -# Please also see the LICENSE file for our notice and the LGPL. -# -# This program is free software; you can redistribute it and/or modify -# it under the terms of the GNU Lesser General Public License (as -# published by the Free Software Foundation) version 2.1, February 1999. -# -# This program is distributed in the hope that it will be useful, but -# WITHOUT ANY WARRANTY; without even the IMPLIED WARRANTY OF -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the terms and -# conditions of the GNU Lesser General Public License for more details. 
-# -# You should have received a copy of the GNU Lesser General Public -# License along with this program; if not, write to the Free Software -# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA -############################################################################## -import llnl.util.tty as tty - -import spack -import spack.url -from spack.util.web import find_versions_of_archive - -description = "show parsing of a URL, optionally spider web for versions" - - -def setup_parser(subparser): - subparser.add_argument('url', help="url of a package archive") - subparser.add_argument( - '-s', '--spider', action='store_true', - help="spider the source page for versions") - - -def print_name_and_version(url): - name, ns, nl, ntup, ver, vs, vl, vtup = spack.url.substitution_offsets(url) - underlines = [" "] * max(ns + nl, vs + vl) - for i in range(ns, ns + nl): - underlines[i] = '-' - for i in range(vs, vs + vl): - underlines[i] = '~' - - print " %s" % url - print " %s" % ''.join(underlines) - - -def url_parse(parser, args): - url = args.url - - ver, vs, vl = spack.url.parse_version_offset(url, debug=True) - name, ns, nl = spack.url.parse_name_offset(url, ver, debug=True) - print - - tty.msg("Detected:") - try: - print_name_and_version(url) - except spack.url.UrlParseError as e: - tty.error(str(e)) - - print ' name: %s' % name - print ' version: %s' % ver - - print - tty.msg("Substituting version 9.9.9b:") - newurl = spack.url.substitute_version(url, '9.9.9b') - print_name_and_version(newurl) - - if args.spider: - print - tty.msg("Spidering for versions:") - versions = find_versions_of_archive(url) - for v in sorted(versions): - print "%-20s%s" % (v, versions[v]) diff --git a/lib/spack/spack/cmd/urls.py b/lib/spack/spack/cmd/urls.py deleted file mode 100644 index 4ff23e69c1..0000000000 --- a/lib/spack/spack/cmd/urls.py +++ /dev/null @@ -1,59 +0,0 @@ -############################################################################## -# Copyright (c) 
2013-2016, Lawrence Livermore National Security, LLC. -# Produced at the Lawrence Livermore National Laboratory. -# -# This file is part of Spack. -# Created by Todd Gamblin, tgamblin@llnl.gov, All rights reserved. -# LLNL-CODE-647188 -# -# For details, see https://github.com/llnl/spack -# Please also see the LICENSE file for our notice and the LGPL. -# -# This program is free software; you can redistribute it and/or modify -# it under the terms of the GNU Lesser General Public License (as -# published by the Free Software Foundation) version 2.1, February 1999. -# -# This program is distributed in the hope that it will be useful, but -# WITHOUT ANY WARRANTY; without even the IMPLIED WARRANTY OF -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the terms and -# conditions of the GNU Lesser General Public License for more details. -# -# You should have received a copy of the GNU Lesser General Public -# License along with this program; if not, write to the Free Software -# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA -############################################################################## -import spack -import spack.url - -description = "inspect urls used by packages in spack" - - -def setup_parser(subparser): - subparser.add_argument( - '-c', '--color', action='store_true', - help="color the parsed version and name in the urls shown. " - "version will be cyan, name red") - subparser.add_argument( - '-e', '--extrapolation', action='store_true', - help="color the versions used for extrapolation as well. 
" - "additional versions are green, names magenta") - - -def urls(parser, args): - urls = set() - for pkg in spack.repo.all_packages(): - url = getattr(pkg.__class__, 'url', None) - if url: - urls.add(url) - - for params in pkg.versions.values(): - url = params.get('url', None) - if url: - urls.add(url) - - for url in sorted(urls): - if args.color or args.extrapolation: - print spack.url.color_url( - url, subs=args.extrapolation, errors=True) - else: - print url diff --git a/lib/spack/spack/test/cmd/url.py b/lib/spack/spack/test/cmd/url.py new file mode 100644 index 0000000000..4c60d814ce --- /dev/null +++ b/lib/spack/spack/test/cmd/url.py @@ -0,0 +1,116 @@ +############################################################################## +# Copyright (c) 2013-2016, Lawrence Livermore National Security, LLC. +# Produced at the Lawrence Livermore National Laboratory. +# +# This file is part of Spack. +# Created by Todd Gamblin, tgamblin@llnl.gov, All rights reserved. +# LLNL-CODE-647188 +# +# For details, see https://github.com/llnl/spack +# Please also see the LICENSE file for our notice and the LGPL. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU Lesser General Public License (as +# published by the Free Software Foundation) version 2.1, February 1999. +# +# This program is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the IMPLIED WARRANTY OF +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the terms and +# conditions of the GNU Lesser General Public License for more details. 
+# +# You should have received a copy of the GNU Lesser General Public +# License along with this program; if not, write to the Free Software +# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA +############################################################################## +import argparse +import pytest + +from spack.cmd.url import * + + +@pytest.fixture(scope='module') +def parser(): + """Returns the parser for the ``url`` command""" + parser = argparse.ArgumentParser() + setup_parser(parser) + return parser + + +class MyPackage: + def __init__(self, name, versions): + self.name = name + self.versions = versions + + +def test_name_parsed_correctly(): + # Expected True + assert name_parsed_correctly(MyPackage('netcdf', []), 'netcdf') + assert name_parsed_correctly(MyPackage('r-devtools', []), 'devtools') + assert name_parsed_correctly(MyPackage('py-numpy', []), 'numpy') + assert name_parsed_correctly(MyPackage('octave-splines', []), 'splines') + + # Expected False + assert not name_parsed_correctly(MyPackage('', []), 'hdf5') + assert not name_parsed_correctly(MyPackage('hdf5', []), '') + assert not name_parsed_correctly(MyPackage('imagemagick', []), 'ImageMagick') # noqa + assert not name_parsed_correctly(MyPackage('yaml-cpp', []), 'yamlcpp') + assert not name_parsed_correctly(MyPackage('yamlcpp', []), 'yaml-cpp') + assert not name_parsed_correctly(MyPackage('r-py-parser', []), 'parser') + assert not name_parsed_correctly(MyPackage('oce', []), 'oce-0.18.0') # noqa + + +def test_version_parsed_correctly(): + # Expected True + assert version_parsed_correctly(MyPackage('', ['1.2.3']), '1.2.3') + assert version_parsed_correctly(MyPackage('', ['5.4a', '5.4b']), '5.4a') + assert version_parsed_correctly(MyPackage('', ['5.4a', '5.4b']), '5.4b') + + # Expected False + assert not version_parsed_correctly(MyPackage('', []), '1.2.3') + assert not version_parsed_correctly(MyPackage('', ['1.2.3']), '') + assert not version_parsed_correctly(MyPackage('', 
['1.2.3']), '1.2.4') + assert not version_parsed_correctly(MyPackage('', ['3.4a']), '3.4') + assert not version_parsed_correctly(MyPackage('', ['3.4']), '3.4b') + assert not version_parsed_correctly(MyPackage('', ['0.18.0']), 'oce-0.18.0') # noqa + + +def test_url_parse(parser): + args = parser.parse_args(['parse', 'http://zlib.net/fossils/zlib-1.2.10.tar.gz']) + url(parser, args) + + +@pytest.mark.xfail +def test_url_parse_xfail(parser): + # No version in URL + args = parser.parse_args(['parse', 'http://www.netlib.org/voronoi/triangle.zip']) + url(parser, args) + + +def test_url_list(parser): + args = parser.parse_args(['list']) + total_urls = url_list(args) + + # The following two options should not change the number of URLs printed. + args = parser.parse_args(['list', '--color', '--extrapolation']) + colored_urls = url_list(args) + assert colored_urls == total_urls + + # The following two options should print fewer URLs than the default. + # If they print the same number of URLs, something is horribly broken. + # If they say we missed 0 URLs, something is probably broken too. 
+    args = parser.parse_args(['list', '--incorrect-name'])
+    incorrect_name_urls = url_list(args)
+    assert 0 < incorrect_name_urls < total_urls
+
+    args = parser.parse_args(['list', '--incorrect-version'])
+    incorrect_version_urls = url_list(args)
+    assert 0 < incorrect_version_urls < total_urls
+
+
+def test_url_test(parser):
+    args = parser.parse_args(['test'])
+    (total_urls, correct_names, correct_versions,
+     name_count_dict, version_count_dict) = url_test(args)
+
+    assert 0 < correct_names <= sum(name_count_dict.values()) <= total_urls  # noqa
+    assert 0 < correct_versions <= sum(version_count_dict.values()) <= total_urls  # noqa
diff --git a/lib/spack/spack/url.py b/lib/spack/spack/url.py
index 93c443fde8..65f8e12e58 100644
--- a/lib/spack/spack/url.py
+++ b/lib/spack/spack/url.py
@@ -28,17 +28,17 @@
 The idea is to allow package creators to supply nothing more than the
 download location of the package, and figure out version and name information
 from there.

-Example: when spack is given the following URL:
+**Example:** when spack is given the following URL:

-    ftp://ftp.ruby-lang.org/pub/ruby/1.9/ruby-1.9.1-p243.tar.gz
+    https://www.hdfgroup.org/ftp/HDF/releases/HDF4.2.12/src/hdf-4.2.12.tar.gz

-It can figure out that the package name is ruby, and that it is at version
-1.9.1-p243. This is useful for making the creation of packages simple: a user
+It can figure out that the package name is ``hdf``, and that it is at version
+``4.2.12``. This is useful for making the creation of packages simple: a user
 just supplies a URL and skeleton code is generated automatically.
-Spack can also figure out that it can most likely download 1.8.1 at this URL:
+Spack can also figure out that it can most likely download 4.2.6 at this URL:

-    ftp://ftp.ruby-lang.org/pub/ruby/1.9/ruby-1.8.1.tar.gz
+    https://www.hdfgroup.org/ftp/HDF/releases/HDF4.2.6/src/hdf-4.2.6.tar.gz

 This is useful if a user asks for a package at a particular version number;
 spack doesn't need anyone to tell it where to get the tarball even though
@@ -104,24 +104,23 @@ def strip_query_and_fragment(path):
 def split_url_extension(path):
     """Some URLs have a query string, e.g.:

-          1. https://github.com/losalamos/CLAMR/blob/packages/PowerParser_v2.0.7.tgz?raw=true
-          2. http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.2.0/apache-cassandra-1.2.0-rc2-bin.tar.gz
-          3. https://gitlab.kitware.com/vtk/vtk/repository/archive.tar.bz2?ref=v7.0.0
+    1. https://github.com/losalamos/CLAMR/blob/packages/PowerParser_v2.0.7.tgz?raw=true
+    2. http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.2.0/apache-cassandra-1.2.0-rc2-bin.tar.gz
+    3. https://gitlab.kitware.com/vtk/vtk/repository/archive.tar.bz2?ref=v7.0.0

-       In (1), the query string needs to be stripped to get at the
-       extension, but in (2) & (3), the filename is IN a single final query
-       argument.
+    In (1), the query string needs to be stripped to get at the
+    extension, but in (2) & (3), the filename is IN a single final query
+    argument.

-       This strips the URL into three pieces: prefix, ext, and suffix.
-       The suffix contains anything that was stripped off the URL to
-       get at the file extension. In (1), it will be '?raw=true', but
-       in (2), it will be empty. In (3) the suffix is a parameter that follows
-       after the file extension, e.g.:
+    This strips the URL into three pieces: ``prefix``, ``ext``, and ``suffix``.
+    The suffix contains anything that was stripped off the URL to
+    get at the file extension. In (1), it will be ``'?raw=true'``, but
+    in (2), it will be empty. In (3) the suffix is a parameter that follows
+    after the file extension, e.g.:

-          1. ('https://github.com/losalamos/CLAMR/blob/packages/PowerParser_v2.0.7', '.tgz', '?raw=true')
-          2. ('http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.2.0/apache-cassandra-1.2.0-rc2-bin',
-               '.tar.gz', None)
-          3. ('https://gitlab.kitware.com/vtk/vtk/repository/archive', '.tar.bz2', '?ref=v7.0.0')
+    1. ``('https://github.com/losalamos/CLAMR/blob/packages/PowerParser_v2.0.7', '.tgz', '?raw=true')``
+    2. ``('http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.2.0/apache-cassandra-1.2.0-rc2-bin', '.tar.gz', None)``
+    3. ``('https://gitlab.kitware.com/vtk/vtk/repository/archive', '.tar.bz2', '?ref=v7.0.0')``
     """
     prefix, ext, suffix = path, '', ''
@@ -149,7 +148,7 @@ def determine_url_file_extension(path):
     """This returns the type of archive a URL refers to. This is
     sometimes confusing because of URLs like:

-        (1) https://github.com/petdance/ack/tarball/1.93_02
+    (1) https://github.com/petdance/ack/tarball/1.93_02

     Where the URL doesn't actually contain the filename. We need
     to know what type it is so that we can appropriately name files
@@ -166,19 +165,44 @@ def determine_url_file_extension(path):
     return ext


-def parse_version_offset(path, debug=False):
-    """Try to extract a version string from a filename or URL. This is taken
-    largely from Homebrew's Version class."""
+def parse_version_offset(path):
+    """Try to extract a version string from a filename or URL.
+
+    :param str path: The filename or URL for the package
+
+    :return: A tuple containing:
+        version of the package,
+        first index of version,
+        length of version string,
+        the index of the matching regex
+        the matching regex
+
+    :rtype: tuple
+
+    :raises UndetectableVersionError: If the URL does not match any regexes
+    """
     original_path = path

+    # path: The prefix of the URL, everything before the ext and suffix
+    # ext: The file extension
+    # suffix: Any kind of query string that begins with a '?'
     path, ext, suffix = split_url_extension(path)

-    # Allow matches against the basename, to avoid including parent
-    # dirs in version name Remember the offset of the stem in the path
+    # stem: Everything from path after the final '/'
     stem = os.path.basename(path)
     offset = len(path) - len(stem)

-    version_types = [
+    # List of the following format:
+    #
+    # [
+    #     (regex, string),
+    #     ...
+    # ]
+    #
+    # The first regex that matches string will be used to determine
+    # the version of the package. Therefore, hyperspecific regexes should
+    # come first while generic, catch-all regexes should come last.
+    version_regexes = [
         # GitHub tarballs, e.g. v1.2.3
         (r'github.com/.+/(?:zip|tar)ball/v?((\d+\.)+\d+)$', path),

@@ -258,16 +282,13 @@ def parse_version_offset(path, debug=False):
         (r'\/(\d\.\d+)\/', path),

         # e.g. http://www.ijg.org/files/jpegsrc.v8d.tar.gz
-        (r'\.v(\d+[a-z]?)', stem)]
+        (r'\.v(\d+[a-z]?)', stem)
+    ]

-    for i, vtype in enumerate(version_types):
-        regex, match_string = vtype
+    for i, version_regex in enumerate(version_regexes):
+        regex, match_string = version_regex
         match = re.search(regex, match_string)
         if match and match.group(1) is not None:
-            if debug:
-                tty.msg("Parsing URL: %s" % path,
-                        " Matched regex %d: r'%s'" % (i, regex))
-
             version = match.group(1)
             start = match.start(1)

@@ -275,30 +296,74 @@ def parse_version_offset(path, debug=False):
             if match_string is stem:
                 start += offset

-            return version, start, len(version)
+            return version, start, len(version), i, regex

     raise UndetectableVersionError(original_path)


-def parse_version(path, debug=False):
-    """Given a URL or archive name, extract a version from it and return
-       a version object.
+def parse_version(path):
+    """Try to extract a version string from a filename or URL.
+
+    :param str path: The filename or URL for the package
+
+    :return: The version of the package
+    :rtype: spack.version.Version
+
+    :raises UndetectableVersionError: If the URL does not match any regexes
     """
-    ver, start, l = parse_version_offset(path, debug=debug)
-    return Version(ver)
+    version, start, length, i, regex = parse_version_offset(path)
+    return Version(version)


-def parse_name_offset(path, v=None, debug=False):
-    if v is None:
-        v = parse_version(path, debug=debug)
+def parse_name_offset(path, v=None):
+    """Try to determine the name of a package from its filename or URL.
+
+    :param str path: The filename or URL for the package
+    :param str v: The version of the package
+
+    :return: A tuple containing:
+        name of the package,
+        first index of name,
+        length of name,
+        the index of the matching regex
+        the matching regex
+
+    :rtype: tuple
+
+    :raises UndetectableNameError: If the URL does not match any regexes
+    """
+    original_path = path
+    # We really need to know the version of the package
+    # This helps us prevent collisions between the name and version
+    if v is None:
+        try:
+            v = parse_version(path)
+        except UndetectableVersionError:
+            # Not all URLs contain a version. We still want to be able
+            # to determine a name if possible.
+            v = ''
+
+    # path: The prefix of the URL, everything before the ext and suffix
+    # ext: The file extension
+    # suffix: Any kind of query string that begins with a '?'
     path, ext, suffix = split_url_extension(path)

-    # Allow matching with either path or stem, as with the version.
+    # stem: Everything from path after the final '/'
     stem = os.path.basename(path)
     offset = len(path) - len(stem)

-    name_types = [
+    # List of the following format:
+    #
+    # [
+    #     (regex, string),
+    #     ...
+    # ]
+    #
+    # The first regex that matches string will be used to determine
+    # the name of the package. Therefore, hyperspecific regexes should
+    # come first while generic, catch-all regexes should come last.
+    name_regexes = [
         (r'/sourceforge/([^/]+)/', path),
         (r'github.com/[^/]+/[^/]+/releases/download/%s/(.*)-%s$' % (v, v), path),
@@ -316,10 +381,11 @@ def parse_name_offset(path, v=None, debug=False):
         (r'/([^/]+)%s' % v, path),

         (r'^([^/]+)[_.-]v?%s' % v, path),
-        (r'^([^/]+)%s' % v, path)]
+        (r'^([^/]+)%s' % v, path)
+    ]

-    for i, name_type in enumerate(name_types):
-        regex, match_string = name_type
+    for i, name_regex in enumerate(name_regexes):
+        regex, match_string = name_regex
         match = re.search(regex, match_string)
         if match:
             name = match.group(1)
@@ -333,17 +399,38 @@ def parse_name_offset(path, v=None, debug=False):
             name = name.lower()
             name = re.sub('[_.]', '-', name)

-            return name, start, len(name)
+            return name, start, len(name), i, regex

-    raise UndetectableNameError(path)
+    raise UndetectableNameError(original_path)


 def parse_name(path, ver=None):
-    name, start, l = parse_name_offset(path, ver)
+    """Try to determine the name of a package from its filename or URL.
+
+    :param str path: The filename or URL for the package
+    :param str ver: The version of the package
+
+    :return: The name of the package
+    :rtype: str
+
+    :raises UndetectableNameError: If the URL does not match any regexes
+    """
+    name, start, length, i, regex = parse_name_offset(path, ver)
     return name


 def parse_name_and_version(path):
+    """Try to determine the name of a package and extract its version
+    from its filename or URL.
+
+    :param str path: The filename or URL for the package
+
+    :return: A tuple containing:
+        The name of the package
+        The version of the package
+
+    :rtype: tuple
+    """
     ver = parse_version(path)
     name = parse_name(path, ver)
     return (name, ver)
@@ -371,12 +458,12 @@ def cumsum(elts, init=0, fn=lambda x: x):

 def substitution_offsets(path):
     """This returns offsets for substituting versions and names in the
-    provided path. It is a helper for substitute_version().
+    provided path. It is a helper for :func:`substitute_version`.
     """
     # Get name and version offsets
     try:
-        ver, vs, vl = parse_version_offset(path)
-        name, ns, nl = parse_name_offset(path, ver)
+        ver, vs, vl, vi, vregex = parse_version_offset(path)
+        name, ns, nl, ni, nregex = parse_name_offset(path, ver)
     except UndetectableNameError:
         return (None, -1, -1, (), ver, vs, vl, (vs,))
     except UndetectableVersionError:
@@ -444,21 +531,22 @@ def wildcard_version(path):

 def substitute_version(path, new_version):
     """Given a URL or archive name, find the version in the path and
-       substitute the new version for it. Replace all occurrences of
-       the version *if* they don't overlap with the package name.
+    substitute the new version for it. Replace all occurrences of
+    the version *if* they don't overlap with the package name.

-       Simple example::
-         substitute_version('http://www.mr511.de/software/libelf-0.8.13.tar.gz', '2.9.3')
-         ->'http://www.mr511.de/software/libelf-2.9.3.tar.gz'
+    Simple example:

-       Complex examples::
-         substitute_version('http://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.0.tar.gz', 2.1)
-         -> 'http://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.1.tar.gz'
+    .. code-block:: python

-         # In this string, the "2" in mvapich2 is NOT replaced.
-         substitute_version('http://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.tar.gz', 2.1)
-         -> 'http://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.1.tar.gz'
+       substitute_version('http://www.mr511.de/software/libelf-0.8.13.tar.gz', '2.9.3')
+       >>> 'http://www.mr511.de/software/libelf-2.9.3.tar.gz'

+    Complex example:
+
+    .. code-block:: python
+
+       substitute_version('https://www.hdfgroup.org/ftp/HDF/releases/HDF4.2.12/src/hdf-4.2.12.tar.gz', '2.3')
+       >>> 'https://www.hdfgroup.org/ftp/HDF/releases/HDF2.3/src/hdf-2.3.tar.gz'
     """
     (name, ns, nl, noffs,
      ver, vs, vl, voffs) = substitution_offsets(path)
@@ -477,17 +565,16 @@ def substitute_version(path, new_version):
 def color_url(path, **kwargs):
     """Color the parts of the url according to Spack's parsing.

-       Colors are:
-          Cyan: The version found by parse_version_offset().
-          Red: The name found by parse_name_offset().
-
-          Green: Instances of version string from substitute_version().
-          Magenta: Instances of the name (protected from substitution).
+    Colors are:
+       | Cyan: The version found by :func:`parse_version_offset`.
+       | Red: The name found by :func:`parse_name_offset`.

-       Optional args:
-          errors=True Append parse errors at end of string.
-          subs=True Color substitutions as well as parsed name/version.
+       | Green: Instances of version string from :func:`substitute_version`.
+       | Magenta: Instances of the name (protected from substitution).

+    :param str path: The filename or URL for the package
+    :keyword bool errors: Append parse errors at end of string.
+    :keyword bool subs: Color substitutions as well as parsed name/version.
     """
     errors = kwargs.get('errors', False)
     subs = kwargs.get('subs', False) |
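
The "first matching regex wins" approach that the new comments in `parse_version_offset` describe can be illustrated with a minimal standalone sketch. It is not Spack's implementation: the two-entry regex list is a deliberately reduced assumption, `strip_extension` and `ARCHIVE_EXTENSIONS` are hypothetical stand-ins for Spack's `split_url_extension()` (which also handles query strings), and the exception class is redefined locally so the snippet runs without Spack installed. Only the five-element return shape and the stem offset correction follow the diff above.

```python
import os
import re


class UndetectableVersionError(Exception):
    """Local stand-in for Spack's exception of the same name."""


# Hypothetical, simplified extension list (assumption, not Spack's list).
ARCHIVE_EXTENSIONS = ('.tar.gz', '.tar.bz2', '.tar.xz', '.tgz', '.zip')


def strip_extension(path):
    """Hypothetical helper: drop a known archive extension, if any."""
    for ext in ARCHIVE_EXTENSIONS:
        if path.endswith(ext):
            return path[:-len(ext)]
    return path


def parse_version_offset(path):
    """Return (version, start, length, regex index, regex) for a URL."""
    original_path = path
    path = strip_extension(path)

    # stem: everything from path after the final '/'
    stem = os.path.basename(path)
    offset = len(path) - len(stem)

    # Ordered (regex, string) pairs: specific patterns first,
    # generic catch-alls last. Both patterns are illustrative only.
    version_regexes = [
        # e.g. zlib-1.2.10 or foo_v1.2.3
        (r'[-_]v?((\d+\.)+\d+[a-z]?)$', stem),
        # e.g. a directory component such as /1.9/
        (r'/(\d\.\d+)/', path),
    ]

    for i, (regex, match_string) in enumerate(version_regexes):
        match = re.search(regex, match_string)
        if match and match.group(1) is not None:
            version = match.group(1)
            start = match.start(1)

            # Indices found in the stem are relative to the stem,
            # so shift them to be relative to the full path.
            if match_string is stem:
                start += offset

            return version, start, len(version), i, regex

    raise UndetectableVersionError(original_path)


url = 'http://zlib.net/fossils/zlib-1.2.10.tar.gz'
version, start, length, i, regex = parse_version_offset(url)
# The offsets point back into the original URL.
print(version, url[start:start + length], i)
```

Returning the regex index and pattern alongside the offsets lets a caller report which rule matched, which is how the diff drops the `debug` parameter from the parsing functions themselves.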