 lib/spack/docs/developer_guide.rst | 442
 1 file changed, 437 insertions(+), 5 deletions(-)
diff --git a/lib/spack/docs/developer_guide.rst b/lib/spack/docs/developer_guide.rst
index 9964bb6e51..4e349cff45 100644
--- a/lib/spack/docs/developer_guide.rst
+++ b/lib/spack/docs/developer_guide.rst
@@ -106,11 +106,21 @@ with a high level view of Spack's directory structure:
external/ <- external libs included in Spack distro
llnl/ <- some general-use libraries
- spack/ <- spack module; contains Python code
- cmd/ <- each file in here is a spack subcommand
- compilers/ <- compiler description files
- test/ <- unit test modules
- util/ <- common code
+ spack/ <- spack module; contains Python code
+ analyzers/ <- modules to run analysis on installed packages
+ build_systems/ <- modules for different build systems
+ cmd/ <- each file in here is a spack subcommand
+ compilers/ <- compiler description files
+ container/ <- module for spack containerize
+ hooks/ <- hook modules to run at different points
+ modules/ <- modules for lmod, tcl, etc.
+ operating_systems/ <- operating system modules
+ platforms/ <- different spack platforms
+ reporters/ <- reporters like cdash, junit
+ schema/ <- schemas to validate data structures
+ solver/ <- the spack solver
+ test/ <- unit test modules
+ util/ <- common code
Spack is designed so that it could live within a `standard UNIX
directory hierarchy <http://linux.die.net/man/7/hier>`_, so ``lib``,
@@ -251,6 +261,22 @@ Unit tests
This is a fake package hierarchy used to mock up packages for
Spack's test suite.
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Research and Monitoring Modules
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:mod:`spack.monitor`
+ Contains :class:`SpackMonitor <spack.monitor.SpackMonitor>`. This is accessed
+ from the ``spack install`` and ``spack analyze`` commands to send build
+ and package metadata up to a `Spack Monitor <https://github.com/spack/spack-monitor>`_ server.
+
+
+:mod:`spack.analyzers`
+ A module folder with a :class:`AnalyzerBase <spack.analyzers.analyzer_base.AnalyzerBase>`
+ that provides base functions to run, save, and (optionally) upload analysis
+ results to a `Spack Monitor <https://github.com/spack/spack-monitor>`_ server.
+
+
^^^^^^^^^^^^^
Other Modules
^^^^^^^^^^^^^
@@ -299,6 +325,235 @@ Conceptually, packages are overloaded. They contain:
Stage objects
-------------
+
+.. _writing-analyzers:
+
+-----------------
+Writing analyzers
+-----------------
+
+To write an analyzer, add a new Python file to the
+analyzers module directory at ``lib/spack/spack/analyzers``.
+Your analyzer should be a subclass of
+:class:`AnalyzerBase <spack.analyzers.analyzer_base.AnalyzerBase>`. For example, if you want
+to add an analyzer class ``Myanalyzer``, you would write it to
+``spack/analyzers/myanalyzer.py`` and import and use the base class as follows:
+
+.. code-block:: python
+
+ from .analyzer_base import AnalyzerBase
+
+ class Myanalyzer(AnalyzerBase):
+     ...  # your analyzer's attributes and functions go here
+
+
+Note that the class name matches your module file name, with only the first
+letter capitalized. You can look at other analyzers in
+the analyzers directory for examples. This guide covers the basic functions needed.
+
+^^^^^^^^^^^^^^^^^^^^^^^^^
+Analyzer Output Directory
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+By default, when you run ``spack analyze run``, an analyzer output directory is
+created in your spack user directory under ``$HOME``. We write output here
+because the install directory might not always be writable.
+
+.. code-block:: console
+
+ ~/.spack/
+ analyzers
+
+Result files will be written here, organized in subfolders in the same structure
+as the package, with each analyzer owning its own subfolder. For example:
+
+
+.. code-block:: console
+
+ $ tree ~/.spack/analyzers/
+ /home/spackuser/.spack/analyzers/
+ └── linux-ubuntu20.04-skylake
+ └── gcc-9.3.0
+ └── zlib-1.2.11-sl7m27mzkbejtkrajigj3a3m37ygv4u2
+ ├── environment_variables
+ │   └── spack-analyzer-environment-variables.json
+ ├── install_files
+ │   └── spack-analyzer-install-files.json
+ └── libabigail
+ └── lib
+ └── spack-analyzer-libabigail-libz.so.1.2.11.xml
+
+
+Notice that because the libabigail analyzer generates results per object, we
+preserve the object's folder in case there are equivalently named files in
+different folders. The result files are typically written as JSON so they can
+be easily read and uploaded in a future interaction with a monitor.
+
+
+^^^^^^^^^^^^^^^^^
+Analyzer Metadata
+^^^^^^^^^^^^^^^^^
+
+Your analyzer is required to have the class attributes ``name``, ``outfile``,
+and ``description``. These are printed to the user when they use the subcommand
+``spack analyze list-analyzers``. Here is an example.
+As mentioned above, this analyzer would live in a module named
+``libabigail.py`` in the analyzers folder so that the class can be discovered.
+
+
+.. code-block:: python
+
+ class Libabigail(AnalyzerBase):
+
+ name = "libabigail"
+ outfile = "spack-analyzer-libabigail.json"
+ description = "Application Binary Interface (ABI) features for objects"
+
+
+The name and output file should be unique to your analyzer.
+Note that "all" cannot be the name of an analyzer, as this key is used to indicate
+that the user wants to run all analyzers.
+
+.. _analyzer_run_function:
+
+
+^^^^^^^^^^^^^^^^^^^^^^^^
+An analyzer run Function
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The core of an analyzer is its ``run()`` function, which should accept no
+arguments. You can assume the package spec of interest is available at
+``self.spec``. It's up to the run function to generate whatever analysis data
+you need and then return a dictionary keyed by the analyzer name. The result
+data should be a list of objects, each with a ``name``, ``analyzer_name``,
+``install_file``, and one of ``value`` or ``binary_value``. The install file
+should be a relative path, not an absolute path. For example, let's say we
+extract a metric called ``metric`` for ``bin/wget`` using our analyzer
+``thebest-analyzer``. We might have data that looks like this:
+
+.. code-block:: python
+
+ result = {"name": "metric", "analyzer_name": "thebest-analyzer", "value": "1", "install_file": "bin/wget"}
+
+
+We'd then return it as follows - note that the key is the analyzer name at ``self.name``.
+
+.. code-block:: python
+
+ return {self.name: result}
+
+This will save the complete result to the analyzer metadata folder, as described
+previously. If you want support for adding a different kind of metadata (e.g.,
+not associated with an install file) then the monitor server would need to be updated
+to support this first.
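+
+Putting these pieces together, a minimal ``run()`` for the hypothetical
+``Myanalyzer`` above might look like the sketch below; the metric name, value,
+and ``bin/wget`` path are made up for illustration:
+
+.. code-block:: python
+
+    def run(self):
+        # Generate whatever analysis data you need for self.spec here.
+        # Each result has a name, the analyzer name, a relative install
+        # file path, and a value (or binary_value).
+        result = {
+            "name": "metric",
+            "analyzer_name": self.name,
+            "install_file": "bin/wget",
+            "value": "1",
+        }
+
+        # Return the data keyed by the analyzer name
+        return {self.name: result}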
+
+
+^^^^^^^^^^^^^^^^^^^^^^^^^
+An analyzer init Function
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If you don't need any extra dependencies or checks, you can skip defining an
+analyzer init function, as the base class will handle it. Typically, it accepts
+a spec and an optional output directory (if the user does not want the default
+metadata folder for analyzer results). The analyzer init function should call
+its parent init, and then do any extra checks or validation required for it to
+work. For example:
+
+.. code-block:: python
+
+ def __init__(self, spec, dirname=None):
+ super(Myanalyzer, self).__init__(spec, dirname)
+
+ # install extra dependencies, do extra preparation and checks here
+
+
+At the end of the init, you will have available to you:
+
+ - **self.spec**: the spec object
+ - **self.dirname**: an optional directory name the user has provided at init for saving results
+ - **self.output_dir**: the analyzer metadata directory, where we save by default
+ - **self.meta_dir**: the path to the package metadata directory (.spack) if you need it
+
+You can then proceed to write your analyzer.
+
+
+^^^^^^^^^^^^^^^^^^^^^^^
+Saving Analyzer Results
+^^^^^^^^^^^^^^^^^^^^^^^
+
+The analyzer's ``save_result`` function is called with the result object
+generated by ``run()`` to save it to the filesystem, and, if the user has added
+the ``--monitor`` flag, to upload it to a monitor server. If your result follows
+an accepted result format and you don't need to parse it further, you don't
+need to add this function to your class. However, if your result data is large
+or otherwise needs additional parsing, you can define it. If you define the
+function, it is useful to know about the ``output_dir`` property, which you can
+join with your output file relative path of choice:
+
+.. code-block:: python
+
+ outfile = os.path.join(self.output_dir, "my-output-file.txt")
+
+
+The directory will be provided by the ``output_dir`` property but it won't exist,
+so you should create it:
+
+
+.. code-block:: python
+
+    # Create the output directory if it does not exist
+    if not os.path.exists(self.output_dir):
+        os.makedirs(self.output_dir)
+
+
+If you are generating results that correspond to specific files in the package
+install directory, you should try to maintain those paths in case there are
+equivalently named files in different directories that would otherwise
+overwrite one another. As an example of an analyzer with a custom save,
+the Libabigail analyzer saves ``*.xml`` files to the analyzer metadata
+folder in ``run()``, because they are either binaries or XML (text) that would
+usually be too big to pass in one request. For this reason, the files
+are saved during ``run()`` and the filenames added to the result object,
+and then when the result object is passed back into ``save_result()``,
+we skip saving to the filesystem, and instead read each file and send
+it (separately) to the monitor:
+
+
+.. code-block:: python
+
+ def save_result(self, result, monitor=None, overwrite=False):
+ """ABI results are saved to individual files, so each one needs to be
+ read and uploaded. Result here should be the lookup generated in run(),
+ the key is the analyzer name, and each value is the result file.
+ We currently upload the entire xml as text because libabigail can't
+ easily read gzipped xml, but this will be updated when it can.
+ """
+ if not monitor:
+ return
+
+ name = self.spec.package.name
+
+ for obj, filename in result.get(self.name, {}).items():
+
+ # Don't include the prefix
+ rel_path = obj.replace(self.spec.prefix + os.path.sep, "")
+
+ # We've already saved the results to file during run
+ content = spack.monitor.read_file(filename)
+
+ # A result needs an analyzer, value or binary_value, and name
+ data = {"value": content, "install_file": rel_path, "name": "abidw-xml"}
+ tty.info("Sending result for %s %s to monitor." % (name, rel_path))
+ monitor.send_analyze_metadata(self.spec.package, {"libabigail": [data]})
+
+
+
+Notice that this function, if you define it, requires a result object (generated
+by ``run()``), a monitor (if you want to send results), and a boolean ``overwrite``
+used to check if a result exists first, and to not write to it if it exists and
+overwrite is False. Also notice that since we already saved these files to the
+analyzer metadata folder during ``run()``, we return early if a monitor isn't
+defined, because this function then only serves to send results to the monitor.
+If you haven't saved anything to the analyzer metadata folder yet, you might
+want to do that here. You should also use ``tty.info`` to give the user a
+message of "Writing result to $DIRNAME."
+
+
.. _writing-commands:
----------------
@@ -345,6 +600,183 @@ Whenever you add/remove/rename a command or flags for an existing command,
make sure to update Spack's `Bash tab completion script
<https://github.com/adamjstewart/spack/blob/develop/share/spack/spack-completion.bash>`_.
+
+-------------
+Writing Hooks
+-------------
+
+A hook is a callback that makes it easy to run a function whenever a
+particular event occurs. We do this by defining hook types, and then inserting
+calls to them at different places in the spack code base. Whenever a hook type
+is triggered by way of a function call, we find all the hooks of that type,
+and run them.
+
+Spack defines hooks by way of a module at ``lib/spack/spack/hooks`` where we
+can define types of hooks in the ``__init__.py``, and then Python files in that
+folder can define hook functions. The files are automatically parsed, so if you
+write a new file for some integration (e.g., ``lib/spack/spack/hooks/myintegration.py``),
+you can then write hook functions in that file that will be automatically
+detected and run whenever your hook is called. This section covers the basic
+kinds of hooks and how to write them.
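+
+As a minimal, hypothetical sketch, a file ``lib/spack/spack/hooks/myintegration.py``
+could define a function named after an existing hook type, and it would be
+picked up automatically:
+
+.. code-block:: python
+
+    # lib/spack/spack/hooks/myintegration.py (hypothetical integration file)
+
+    def post_install(spec):
+        """Run automatically after each successful package install."""
+        print("myintegration: installed %s" % spec.name)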
+
+^^^^^^^^^^^^^^
+Types of Hooks
+^^^^^^^^^^^^^^
+
+The following hooks are currently implemented to make it easy for you,
+the developer, to add hooks at different stages of a spack install or similar.
+If there is a hook that you would like and is missing, you can propose to add a new one.
+
+"""""""""""""""""""""
+``pre_install(spec)``
+"""""""""""""""""""""
+
+A ``pre_install`` hook is run within an install subprocess, directly before
+the install starts. It expects a single argument, a spec, and is run in
+a multiprocessing subprocess. Note that if you see ``pre_install`` functions
+associated with packages, these are not hooks as we have defined them here,
+but rather callback functions associated with a package install.
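+
+For example, a minimal ``pre_install`` hook might look like the following sketch:
+
+.. code-block:: python
+
+    def pre_install(spec):
+        """Run in the install subprocess, right before the install starts."""
+        print("About to install %s" % spec.name)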
+
+
+""""""""""""""""""""""
+``post_install(spec)``
+""""""""""""""""""""""
+
+A ``post_install`` hook is run within an install subprocess, directly after
+the install finishes, but before the build stage is removed. If you
+write one of these hooks, you should expect it to accept a spec as the only
+argument. This is run in a multiprocessing subprocess. A ``post_install``
+function is also seen in packages, but in that context it is not related to
+the hooks described here.
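+
+A short sketch of a ``post_install`` hook that reports where the spec was
+installed (``spec.prefix`` is the install prefix):
+
+.. code-block:: python
+
+    def post_install(spec):
+        """Run in the install subprocess, right after the install finishes."""
+        print("%s was installed to %s" % (spec.name, spec.prefix))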
+
+
+""""""""""""""""""""""""""
+``on_install_start(spec)``
+""""""""""""""""""""""""""
+
+This hook is run at the beginning of the install function of a
+``PackageInstaller`` in ``lib/spack/spack/installer.py``, and importantly it is
+not part of a build process, but runs before it. This is when we have just
+newly grabbed the task and are preparing to install. If you write a hook of
+this type, it should expect the spec as its only argument.
+
+.. code-block:: python
+
+ def on_install_start(spec):
+ """On start of an install, we want to...
+ """
+ print('on_install_start')
+
+
+""""""""""""""""""""""""""""
+``on_install_success(spec)``
+""""""""""""""""""""""""""""
+
+This hook is run on a successful install, and is also run inside the build
+process, akin to ``post_install``. The main difference is that this hook
+is run outside of the context of the stage directory, meaning after the
+build stage has been removed and the user is alerted that the install was
+successful. If you need to write a hook that is run on success of a particular
+phase, you should use ``on_phase_success``.
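+
+A minimal ``on_install_success`` sketch:
+
+.. code-block:: python
+
+    def on_install_success(spec):
+        """Run after a spec has been successfully installed."""
+        print("Successfully installed %s" % spec.name)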
+
+""""""""""""""""""""""""""""
+``on_install_failure(spec)``
+""""""""""""""""""""""""""""
+
+This hook is run when an install failure occurs outside of the build
+subprocess, somewhere in ``installer.py``, when something else goes wrong.
+If you need to write a hook that is relevant to a failure within a build
+process, you would want to instead use ``on_phase_error``.
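+
+A minimal ``on_install_failure`` sketch:
+
+.. code-block:: python
+
+    def on_install_failure(spec):
+        """Run when an install fails outside of the build subprocess."""
+        print("Install of %s failed" % spec.name)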
+
+
+"""""""""""""""""""""""""""""""""""""""""""""""
+``on_phase_success(pkg, phase_name, log_file)``
+"""""""""""""""""""""""""""""""""""""""""""""""
+
+This hook is run within the install subprocess, and specifically when a phase
+successfully finishes. Since we are interested in the package, the name of
+the phase, and any output from it, we require:
+
+ - **pkg**: the package variable, which also has the attached spec at ``pkg.spec``
+ - **phase_name**: the name of the phase that was successful (e.g., configure)
+ - **log_file**: the path to the file with output, in case you need to inspect or otherwise interact with it.
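+
+A sketch of an ``on_phase_success`` hook using all three arguments:
+
+.. code-block:: python
+
+    def on_phase_success(pkg, phase_name, log_file):
+        """Run inside the install subprocess when a phase succeeds."""
+        print("%s: phase %s succeeded (log at %s)" % (pkg.name, phase_name, log_file))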
+
+"""""""""""""""""""""""""""""""""""""""""""""
+``on_phase_error(pkg, phase_name, log_file)``
+"""""""""""""""""""""""""""""""""""""""""""""
+
+In the case of an error during a phase, we might want to trigger some event
+with a hook, and this is the purpose of this particular hook. Akin to
+``on_phase_success``, we require the same variables: the package that failed,
+the name of the phase, and the log file where we might find errors.
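+
+And a corresponding ``on_phase_error`` sketch:
+
+.. code-block:: python
+
+    def on_phase_error(pkg, phase_name, log_file):
+        """Run inside the install subprocess when a phase fails."""
+        print("%s: phase %s failed, see %s" % (pkg.name, phase_name, log_file))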
+
+"""""""""""""""""""""""""""""""""
+``on_analyzer_save(pkg, result)``
+"""""""""""""""""""""""""""""""""
+
+After an analyzer has saved some result for a package, this hook is called,
+and it provides the package that we just ran the analysis for, along with
+the loaded result. Typically, a result is structured to have the name
+of the analyzer as key, and the result object that is defined in detail in
+:ref:`analyzer_run_function`.
+
+.. code-block:: python
+
+ def on_analyzer_save(pkg, result):
+ """given a package and a result...
+ """
+ print('Do something extra with a package analysis result here')
+
+
+^^^^^^^^^^^^^^^^^^^^^^
+Adding a New Hook Type
+^^^^^^^^^^^^^^^^^^^^^^
+
+Adding a new hook type is very simple! In ``lib/spack/spack/hooks/__init__.py``
+you can simply create a new ``HookRunner`` that is named to match your new hook.
+For example, let's say you want to add a new hook called ``post_log_write``
+to trigger after anything is written to a logger. You would add it as follows:
+
+.. code-block:: python
+
+ # pre/post install and run by the install subprocess
+ pre_install = HookRunner('pre_install')
+ post_install = HookRunner('post_install')
+
+ # hooks related to logging
+ post_log_write = HookRunner('post_log_write') # <- here is my new hook!
+
+
+You then need to decide what arguments your hook would expect. Since this is
+related to logging, let's say that you want a message and level. You can then
+add a Python file to the ``lib/spack/spack/hooks``
+folder with one or more callbacks intended to be triggered by this hook. You might
+use the new hook as follows:
+
+.. code-block:: python
+
+ def post_log_write(message, level):
+ """Do something custom with the messsage and level every time we write
+ to the log
+ """
+ print('running post_log_write!')
+
+
+To use the hook, we would call it as follows somewhere in the logic to do logging.
+In this example, we use it outside of a logger that is already defined:
+
+.. code-block:: python
+
+ import spack.hooks
+
+ # We do something here to generate a logger and message
+ spack.hooks.post_log_write(message, logger.level)
+
+
+This is not to say that this would be the best way to implement an integration
+with the logger (you'd probably want to write a custom logger, or you could
+have the hook defined within the logger) but serves as an example of writing a hook.
+
----------
Unit tests
----------