
LOFAR PILOT

This is a small pipeline runner script that wraps Common Workflow Language (CWL) pipelines with toil. It is compatible with the LINC and VLBI pipelines. This is a work in progress. Issues should be reported to Matthijs van der Wild.

Assumptions

This script assumes the following:

  • All relevant input data is available in either the $HOME directory or a directory henceforth called $BINDDIR. Targets of any links in these directories should be accessible to the compute directories, as these will be mounted during relevant jobs.
  • This script will be used with the SLURM queuing system on COSMA5 with the following options: -p cosma5 -A durham -t 72:00:00. If these options are not appropriate, or if this script is to be run on another SLURM-run cluster, $TOIL_SLURM_ARGS must be set prior to running.
  • $CWL_SINGULARITY_CACHE is set and the corresponding path contains (a link to) a singularity container vlbi-cwl.sif. If it isn't set, a suitable container can be specified with the -c option, as detailed below.
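
The container assumption can be checked before submitting anything. The following is a minimal sketch, not part of the script itself: check_container is a hypothetical helper name, and the messages it prints are illustrative only.

```shell
# Hypothetical preflight helper (not provided by pilot): verify that the
# directory given as $1 contains vlbi-cwl.sif, or a link to it.
check_container() {
    cache_dir="$1"
    if [ -z "$cache_dir" ]; then
        echo "CWL_SINGULARITY_CACHE is not set" >&2
        return 1
    fi
    # -e follows symbolic links, so a link passes only if its target exists.
    if [ ! -e "$cache_dir/vlbi-cwl.sif" ]; then
        echo "vlbi-cwl.sif not found in $cache_dir" >&2
        return 1
    fi
    echo "found $cache_dir/vlbi-cwl.sif"
}

# Report the state of the current environment without aborting the shell.
check_container "${CWL_SINGULARITY_CACHE:-}" || true
```

If the check fails for the VLBI pipeline, the -c option described under Usage is the documented alternative.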

Installation

The script can be installed to /usr/bin via

make install

while uninstalling the script is as straightforward as running

make uninstall

A custom installation location can be specified by setting DESTDIR.
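
As a sketch of a custom installation, DESTDIR can be passed on the make command line. The location below is a placeholder, and whether DESTDIR names the final directory or a prefix placed in front of /usr/bin depends on the Makefile, so a dry run with make -n (which prints the commands without executing them) is a cautious first step.

```shell
# Placeholder location; replace with a directory you can write to.
dest="$HOME/.local"

# Only attempt this from the repository root, where the Makefile lives.
if [ -f Makefile ]; then
    # Dry run: show what "make install" would do with this DESTDIR.
    make -n install DESTDIR="$dest"
    # If the printed destination looks right, run the real installation:
    #   make install DESTDIR="$dest"
fi
```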

Usage

Once installed the script can be run as follows:

pilot [options] <workflow name> $BINDDIR

Options can be the following:

  • -h prints the script usage with all available options (optional).
  • -r restarts a pipeline that failed during a previous run of this script.
  • -c allows the pipeline to use the specified container (optional, VLBI pipeline only).
  • -i points to your input JSON file. This can be any appropriate JSON file, as long as it is located in either $HOME or $BINDDIR.
  • -p is a path to the pipeline repository (LINC and VLBI pipelines only).
  • --scratch is a path to local scratch storage where temporary data can be written (optional). --scratch must be local to the compute node. Nonlocal scratch storage will likely cause the pipeline to fail.
  • --outdir is a path relative to which intermediate files and final data products will be written. It will be created if it does not exist. If not specified, $BINDDIR will be used instead.
  • --batch_system specifies the queuing system to be used. Defaults to slurm. Use single_machine to run on the local node.
  • <workflow name> is the workflow file name without extension, e.g. delay-calibration or concatenate-flag for the VLBI pipeline or HBA_calibrator or HBA_target for LINC.
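
Putting the options together, a run of the VLBI delay-calibration workflow might look like the sketch below. All paths are placeholders, and passing each option value as a separate word after the option is an assumption; pilot -h shows the authoritative usage.

```shell
# Placeholder paths; adjust to your own setup.
BINDDIR="/path/to/data"              # directory holding the input data
input_json="$BINDDIR/input.json"     # must be located in $HOME or $BINDDIR
pipeline_repo="$HOME/VLBI-cwl"       # checkout of the pipeline repository

# Assemble the arguments in the order of the usage line:
# options first, then the workflow name, then $BINDDIR.
set -- -i "$input_json" -p "$pipeline_repo" \
       --outdir "$BINDDIR/results" \
       delay-calibration "$BINDDIR"

# Show the command that would be run.
echo "pilot $*"

# Run it only where pilot is actually installed.
if command -v pilot >/dev/null 2>&1; then
    pilot "$@"
fi
```

To resume after a failure, the same command would be repeated with -r added to the options.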

Environment variables

pilot.sh recognises and accepts the following environment variables if set by the user:

  • APPTAINERENV_PREPEND_PATH; defaults to the scripts directory of the pipeline set by -p.
  • APPTAINERENV_PYTHONPATH; defaults to the scripts directory of the pipeline set by -p.
  • APPTAINER_BIND; defaults to $HOME,$BINDDIR,$OUTPUT_DIR, where $OUTPUT_DIR is set with --outdir.
  • CWL_SINGULARITY_CACHE; specifies the directory where toil can expect the software containers used by the pipeline.
  • TOIL_SLURM_ARGS; defaults to -p cosma5 -A durham -t 72:00:00.
  • TOIL_BATCH_LOGS_DIR; defaults to $OUTDIR/toil/logs, where $OUTDIR is set with --outdir.
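
The "user value if set, otherwise the default" behaviour can be pictured with the shell's ${VAR:-default} expansion. This is an illustration only and is not taken from pilot.sh; effective_slurm_args is a hypothetical helper, and the partition and account in the override are made-up values.

```shell
# Hypothetical helper: print the SLURM arguments that would apply, i.e. the
# user's value if set and non-empty, otherwise the documented default.
effective_slurm_args() {
    printf '%s\n' "${TOIL_SLURM_ARGS:--p cosma5 -A durham -t 72:00:00}"
}

unset TOIL_SLURM_ARGS
effective_slurm_args    # prints the documented default

# Made-up values for a different cluster; export before running pilot.
export TOIL_SLURM_ARGS="-p mypartition -A myaccount -t 24:00:00"
effective_slurm_args    # prints the override
```

The other variables in the list can be exported in the same way before the script is started.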

Notes

  • Upon successful pipeline completion the results directory contains the following:
    • the pipeline data products,
    • the statistics gathered by toil.
  • Jobstore files and intermediate pipeline data products are stored in a toil directory in $BINDDIR.
  • Jobstore files can be removed by running toil clean $BINDDIR/toil/<workflow>_job.
  • Toil may not clear temporary files after the pipeline has finished. These have to be removed by hand.
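
The jobstore cleanup described above can be sketched as follows. The data directory is a placeholder, and the guard simply skips the call when toil is not on the PATH or the jobstore does not exist.

```shell
# Placeholder values; use the $BINDDIR and workflow name from your own run.
BINDDIR="/path/to/data"
workflow="delay-calibration"

# Jobstore location as given in the notes: $BINDDIR/toil/<workflow>_job
jobstore="$BINDDIR/toil/${workflow}_job"
echo "jobstore: $jobstore"

# Clean only if toil is available and the jobstore is actually there.
if command -v toil >/dev/null 2>&1 && [ -d "$jobstore" ]; then
    toil clean "$jobstore"
fi
```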