.. _rpackage:

--------
RPackage
--------

Like Python, R has its own built-in build system.

The R build system is remarkably uniform and well-tested.
This makes it one of the easiest build systems to create
new Spack packages for.

^^^^^^
Phases
^^^^^^

The ``RPackage`` base class has a single phase:

#. ``install`` - install the package

By default, this phase runs the following command:

.. code-block:: console

   $ R CMD INSTALL --library=/path/to/installation/prefix/rlib/R/library .

^^^^^^^^^^^^^^^^^^
Finding R packages
^^^^^^^^^^^^^^^^^^

The vast majority of R packages are hosted on CRAN - The Comprehensive
R Archive Network. If you are looking for a particular R package, search
for "CRAN <package name>" and you should quickly find what you want.
If it isn't on CRAN, try Bioconductor, another common R repository.

For the purposes of this tutorial, we will be walking through
``r-caret`` as an example. If you search for "CRAN caret", you will
quickly find what you are looking for at
https://cran.r-project.org/web/packages/caret/index.html.
If you search that page for "Package source", you will find the
download URL for the latest release. Use this URL with ``spack create``
to create a new package.

^^^^^^^^^^^^
Package name
^^^^^^^^^^^^

The first thing you'll notice is that Spack prepends ``r-`` to the front
of the package name. This is how Spack separates R package extensions
from the rest of the packages in Spack. Without this, we would end up
with package name collisions more frequently than we would like. For
instance, there are already packages for both:

* ``ape`` and ``r-ape``
* ``curl`` and ``r-curl``
* ``gmp`` and ``r-gmp``
* ``jpeg`` and ``r-jpeg``
* ``openssl`` and ``r-openssl``
* ``uuid`` and ``r-uuid``
* ``xts`` and ``r-xts``

Many popular programs written in C/C++ are later ported to R as a
separate project.

^^^^^^^^^^^
Description
^^^^^^^^^^^

The first thing you'll need to add to your new package is a description.
The top of the homepage for ``caret`` lists the following description:

   caret: Classification and Regression Training

   Misc functions for training and plotting classification and regression models.

You can use either the short description (first line), the long
description (second line), or both, depending on what you feel is most
appropriate.

^^^^^^^^
Homepage
^^^^^^^^

If you look at the bottom of the page, you'll see:

   Linking:

   Please use the canonical form https://CRAN.R-project.org/package=caret to link to this page.

Please uphold the wishes of the CRAN admins and use
https://CRAN.R-project.org/package=caret as the homepage instead of
https://cran.r-project.org/web/packages/caret/index.html. The latter may
change without notice.

^^^
URL
^^^

As previously mentioned, the download URL for the latest release can be
found by searching "Package source" on the homepage.

^^^^^^^^
List URL
^^^^^^^^

CRAN maintains a single webpage containing the latest release of every
single package: https://cran.r-project.org/src/contrib/

Of course, as soon as a new release comes out, the version you were using
in your package is no longer available at that URL. It is moved to an
archive directory. If you search for "Old sources", you will find:
https://cran.r-project.org/src/contrib/Archive/caret

If you only specify the URL for the latest release, your package will
no longer be able to fetch that version as soon as a new release comes
out. To get around this, add the archive directory as a ``list_url``.

^^^^^^^^^^^^^^^^^^^^^^^^^
Build system dependencies
^^^^^^^^^^^^^^^^^^^^^^^^^

As an extension of the R ecosystem, your package will obviously depend
on R to build and run. Normally, we would use ``depends_on`` to express
this, but for R packages, we use ``extends``. ``extends`` is similar to
``depends_on``, but adds an additional feature: the ability to "activate"
the package by symlinking it to the R installation directory.
Since every R package needs this, the ``RPackage`` base class contains:

.. code-block:: python

   extends('r')
   depends_on('r', type=('build', 'run'))

Take a close look at the homepage for ``caret``. If you look at the
"Depends" section, you'll notice that ``caret`` depends on "R (≥ 2.10)".
You should add this to your package like so:

.. code-block:: python

   depends_on('r@2.10:', type=('build', 'run'))

^^^^^^^^^^^^^^
R dependencies
^^^^^^^^^^^^^^

R packages are often small and follow the classic Unix philosophy
of doing one thing well. They are modular and usually depend on
several other packages. You may find a single package with over a
hundred dependencies. Luckily, CRAN packages are well-documented
and list all of their dependencies in the following sections:

* Depends
* Imports
* LinkingTo

As far as Spack is concerned, all 3 of these dependency types
correspond to ``type=('build', 'run')``, so you don't have to worry
about them. If you are curious what they mean,
https://github.com/spack/spack/issues/2951 has a pretty good summary:

   ``Depends`` is required and will cause those R packages to be *attached*,
   that is, their APIs are exposed to the user. ``Imports`` *loads* packages
   so that *the package* importing these packages can access their APIs,
   while *not* being exposed to the user. When a user calls ``library(foo)``
   s/he *attaches* package ``foo`` and all of the packages under ``Depends``.
   Any function in one of these package can be called directly as ``bar()``.
   If there are conflicts, user can also specify ``pkgA::bar()`` and
   ``pkgB::bar()`` to distinguish between them. Historically, there was only
   ``Depends`` and ``Suggests``, hence the confusing names. Today, maybe
   ``Depends`` would have been named ``Attaches``.


   The ``LinkingTo`` is not perfect and there was recently an extensive
   discussion about API/ABI among other things on the R-devel mailing list
   among very skilled R developers:

   * https://stat.ethz.ch/pipermail/r-devel/2016-December/073505.html
   * https://stat.ethz.ch/pipermail/r-devel/2017-January/073647.html

Some packages also have a fourth section:

* Suggests

These are optional, rarely-used dependencies that a user might find
useful. You should **NOT** add these dependencies to your package.
R packages already have enough dependencies as it is, and adding
optional dependencies can really slow down the concretization
process. They can also introduce circular dependencies.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Core, recommended, and non-core packages
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you look at "Depends", "Imports", and "LinkingTo", you will notice
3 different types of packages:

"""""""""""""
Core packages
"""""""""""""

If you look at the ``caret`` homepage, you'll notice a few dependencies
that don't have a link to the package, like ``methods``, ``stats``, and
``utils``. These packages are part of the core R distribution and are
tied to the R version installed. You can basically consider these to be
"R itself". They are so essential to R that it would not make sense for
them to be updated via CRAN. If they were, you would basically get a
different version of R. Thus, they're updated when R is updated.

You can find a list of these core libraries at:
https://github.com/wch/r-source/tree/trunk/src/library

""""""""""""""""""""
Recommended packages
""""""""""""""""""""

When you install R, there is an option called
``--with-recommended-packages``. This flag causes the R installation to
include a few "Recommended" packages (legacy term). For historical
reasons, they are quite closely tied to the core R distribution and are
developed by the R core team or people closely related to it. The R core
distribution "knows" about these packages, but they are indeed
distributed via CRAN.
Because they're distributed via CRAN, they can also be updated between
R version releases.

Spack explicitly adds the ``--without-recommended-packages`` flag to
prevent the installation of these packages. Due to the way Spack handles
package activation (symlinking packages to the R installation
directory), pre-existing recommended packages will cause conflicts for
already-existing files. We could either not include these recommended
packages in Spack and require them to be installed through
``--with-recommended-packages``, or we could not install them with R and
let users choose the version of the package they want to install. We
chose the latter.

Since these packages are so commonly distributed with the R system, many
developers may assume these packages exist and fail to list them as
dependencies. Watch out for this.

You can find a list of these recommended packages at:
https://github.com/wch/r-source/blob/trunk/share/make/vars.mk

"""""""""""""""""
Non-core packages
"""""""""""""""""

These are packages that are neither "core" nor "recommended". There are
more than 10,000 of these packages hosted on CRAN alone.

For each of these package types, if you see that a specific version is
required, for example, "lattice (≥ 0.20)", please add this information
to the dependency:

.. code-block:: python

   depends_on('r-lattice@0.20:', type=('build', 'run'))

^^^^^^^^^^^^^^^^^^
Non-R dependencies
^^^^^^^^^^^^^^^^^^

Some packages depend on non-R libraries for linking. Check out the
``r-stringi`` package for an example:
https://CRAN.R-project.org/package=stringi.
If you search for the text "SystemRequirements", you will see:

   ICU4C (>= 52, optional)

This is how non-R dependencies are listed. Make sure to add these
dependencies. The default dependency type should suffice.

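
The ICU4C requirement listed above could be expressed along the
following lines. This is only a minimal sketch: it assumes that the
library is provided by a Spack package named ``icu4c``, and that the
"optional" requirement is treated as a regular dependency.

.. code-block:: python

   # Assumption: ICU4C is provided by a Spack package named ``icu4c``.
   # "ICU4C (>= 52)" becomes an open-ended version range starting at 52.
   # No ``type`` argument is given, so the default dependency type applies.
   depends_on('icu4c@52:')
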
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Passing arguments to the installation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Some R packages provide additional flags that can be passed to
``R CMD INSTALL``, often to locate non-R dependencies.
``r-rmpi`` is an example of this: it provides flags for linking to an
MPI library. To pass these to the installation command, you can
override ``configure_args`` like so:

.. code-block:: python

   def configure_args(self, spec, prefix):
       mpi_name = spec['mpi'].name

       # The type of MPI. Supported values are:
       # OPENMPI, LAM, MPICH, MPICH2, or CRAY
       if mpi_name == 'openmpi':
           Rmpi_type = 'OPENMPI'
       elif mpi_name == 'mpich':
           Rmpi_type = 'MPICH2'
       else:
           raise InstallError('Unsupported MPI type')

       return [
           '--with-Rmpi-type={0}'.format(Rmpi_type),
           '--with-mpi={0}'.format(spec['mpi'].prefix),
       ]

There is a similar ``configure_vars`` function that can be overridden
to pass variables to the build.

^^^^^^^^^^^^^^^^^^^^^
Alternatives to Spack
^^^^^^^^^^^^^^^^^^^^^

CRAN hosts over 10,000 R packages, most of which are not in Spack. Many
users may not need the advanced features of Spack, and may prefer to
install R packages the normal way:

.. code-block:: console

   $ R
   > install.packages("ggplot2")

R will search CRAN for the ``ggplot2`` package and install all necessary
dependencies for you. If you want to update all installed R packages to
the latest release, you can use:

.. code-block:: console

   > update.packages(ask = FALSE)

This works great for users who have internet access, but those on an
air-gapped cluster will find it easier to let Spack build a download
mirror and install these packages for you. Where Spack really shines is
its ability to install non-R dependencies and link to them properly,
something the R installation mechanism cannot handle.

^^^^^^^^^^^^^^^^^^^^^^
External documentation
^^^^^^^^^^^^^^^^^^^^^^

For more information on installing R packages, see:
https://stat.ethz.ch/R-manual/R-devel/library/utils/html/INSTALL.html