LOFAR PILOT
This is a small pipeline runner script that wraps Common Workflow Language (CWL) pipelines with toil. It is compatible with LINC and the VLBI pipelines. This is a work in progress. Issues should be reported to Matthijs van der Wild.
Assumptions
This script assumes the following:
- All relevant input data is available either in the $HOME directory or in a directory henceforth called $BINDDIR. Targets of any links in these directories should be accessible to the compute nodes, as these directories will be mounted during the relevant jobs.
- This script will be used with the SLURM queuing system on COSMA5 with the following options: -p cosma5 -A durham -t 72:00:00. If these options are not appropriate, or if this script is to be run on another SLURM-managed cluster, $TOIL_SLURM_ARGS must be set prior to running.
- $CWL_SINGULARITY_CACHE is set and the corresponding path contains (a link to) a Singularity container vlbi-cwl.sif. If it isn't set, a suitable container can be specified with the -c option described below.
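You can check the container assumption before launching a run. The shell function below is a minimal sketch and is not part of pilot. The name check_container_cache is hypothetical. The function only tests two things: that $CWL_SINGULARITY_CACHE is set, and that vlbi-cwl.sif (or a link to it) exists in that directory.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of pilot itself.
# Returns 0 if $CWL_SINGULARITY_CACHE is set and contains vlbi-cwl.sif.
check_container_cache() {
    # Unset or empty: pilot would need a container passed with -c instead.
    if [[ -z "${CWL_SINGULARITY_CACHE:-}" ]]; then
        echo "CWL_SINGULARITY_CACHE is not set; specify a container with -c" >&2
        return 1
    fi
    # -e follows symbolic links, so a link to the container also passes
    # as long as its target exists.
    if [[ ! -e "$CWL_SINGULARITY_CACHE/vlbi-cwl.sif" ]]; then
        echo "vlbi-cwl.sif not found in $CWL_SINGULARITY_CACHE" >&2
        return 1
    fi
    return 0
}
```

For example, after sourcing the function, run check_container_cache && echo "container found". On a SLURM cluster other than COSMA5, set $TOIL_SLURM_ARGS in the same shell. For instance: export TOIL_SLURM_ARGS="-p <partition> -A <account> -t 72:00:00". Here <partition> and <account> are placeholders for your own allocation.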
Installation
The script can be installed to /usr/bin via
make install
while uninstalling the script is as straightforward as running
make uninstall
A custom installation location can be specified by setting DESTDIR.
Usage
Once installed the script can be run as follows:
pilot [options] <workflow name> $BINDDIR
Options can be the following:
- -h prints the script usage with all available options (optional).
- -r restarts a failed pipeline, if this script was run before but the pipeline failed.
- -c allows the pipeline to use the specified container (optional, VLBI pipeline only).
- -i points to your input JSON file. This can be any appropriate JSON file, as long as it is located in either $HOME or $BINDDIR.
- -p is a path to the pipeline repository (LINC and VLBI pipelines only).
- --scratch is a path to local scratch storage where temporary data can be written (optional). --scratch must be local to the compute node; nonlocal scratch storage will likely cause the pipeline to fail.
- --outdir is a path relative to which intermediate files and final data products will be written. It will be created if it does not exist. If not specified, $BINDDIR will be used instead.
- --batch_system specifies the queuing system to be used. Defaults to slurm. Use single_machine to run on the local node.
- <workflow name> is the workflow file name without extension, e.g. delay-calibration or concatenate-flag for the VLBI pipeline, or HBA_calibrator or HBA_target for LINC.
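The usage line expects options first, then the workflow name, then $BINDDIR. A small helper that assembles a pilot command line without running it can illustrate this ordering. The helper build_pilot_cmd is hypothetical and not part of pilot. It only covers -i, -p and, optionally, --outdir. It also assumes that --outdir takes its value as the next argument, which this README does not state explicitly.

```bash
#!/usr/bin/env bash
# Hypothetical helper; not part of pilot. Prints a pilot command line.
# Arguments: workflow name, input JSON, pipeline repository, $BINDDIR,
# and an optional output directory.
build_pilot_cmd() {
    local workflow=$1 input_json=$2 repo=$3 binddir=$4 outdir=${5:-}
    local -a cmd=(pilot -i "$input_json" -p "$repo")
    # --outdir is optional; pilot falls back to $BINDDIR when it is omitted.
    # ASSUMPTION: its value is passed as a separate argument.
    if [[ -n "$outdir" ]]; then
        cmd+=(--outdir "$outdir")
    fi
    # The workflow name and $BINDDIR always come last.
    cmd+=("$workflow" "$binddir")
    # Shell-quote each argument so the result can be reviewed or copied.
    local out
    out=$(printf '%q ' "${cmd[@]}")
    printf '%s\n' "${out% }"
}
```

For example, build_pilot_cmd delay-calibration "$BINDDIR/input.json" "$HOME/VLBI_cwl" "$BINDDIR" prints the corresponding pilot invocation, so you can check it before running it. The input file name and repository path are placeholders.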
Environment variables
pilot.sh recognises and accepts the following environment variables if set by the user:
- APPTAINERENV_PREPEND_PATH: defaults to the scripts directory of the pipeline set by -p.
- APPTAINERENV_PYTHONPATH: defaults to the scripts directory of the pipeline set by -p.
- APPTAINER_BIND: defaults to $HOME,$BINDDIR,$OUTPUT_DIR, where $OUTPUT_DIR is set with --outdir.
- CWL_SINGULARITY_CACHE: specifies the directory where toil can expect the software containers used by the pipeline.
- TOIL_SLURM_ARGS: defaults to -p cosma5 -A durham -t 72:00:00.
- TOIL_BATCH_LOGS_DIR: defaults to $OUTDIR/toil/logs, where $OUTDIR is set with --outdir.
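The defaults listed above follow the usual shell pattern: keep a value the user has already supplied, and fall back to a default otherwise. The function below sketches that behaviour under stated assumptions; it is not pilot.sh's actual implementation. The name resolve_env_defaults is hypothetical. The sketch also assumes that the pipeline's scripts directory is <repository>/scripts. CWL_SINGULARITY_CACHE is left alone because it has no documented default.

```bash
#!/usr/bin/env bash
# Sketch only; mirrors the documented defaults, not pilot.sh's own code.
# Arguments: pipeline repository (as given to -p), $BINDDIR, and the
# output directory (as given to --outdir).
# ASSUMPTION: the pipeline's scripts directory is <repository>/scripts.
resolve_env_defaults() {
    local repo=$1 binddir=$2 outdir=$3
    local scripts_dir="$repo/scripts"

    # ${VAR:-default} keeps a non-empty user value and falls back otherwise.
    export APPTAINERENV_PREPEND_PATH="${APPTAINERENV_PREPEND_PATH:-$scripts_dir}"
    export APPTAINERENV_PYTHONPATH="${APPTAINERENV_PYTHONPATH:-$scripts_dir}"
    export APPTAINER_BIND="${APPTAINER_BIND:-$HOME,$binddir,$outdir}"
    # The default value itself begins with "-p", hence the ":--" sequence.
    export TOIL_SLURM_ARGS="${TOIL_SLURM_ARGS:--p cosma5 -A durham -t 72:00:00}"
    export TOIL_BATCH_LOGS_DIR="${TOIL_BATCH_LOGS_DIR:-$outdir/toil/logs}"
}
```

In this sketch, a variable that is already exported with a non-empty value keeps that value. Only unset or empty variables receive the default.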
Notes
- Upon successful pipeline completion the results directory contains the following:
  - the pipeline data products,
  - the statistics gathered by toil.
- Jobstore files and intermediate pipeline data products are stored in a toil directory in $BINDDIR.
- Jobstore files can be removed by running toil clean $BINDDIR/toil/<workflow>_job.
- Toil may not clear temporary files after the pipeline has finished. These have to be removed by hand.
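You can wrap the jobstore clean-up step in a small helper. The helper derives the jobstore path from $BINDDIR and the workflow name, following the $BINDDIR/toil/<workflow>_job pattern above. The names jobstore_path and clean_jobstore are hypothetical and not part of pilot. clean_jobstore only calls toil clean when the jobstore exists. It does not touch the temporary files that have to be removed by hand.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; not part of pilot.

# Print the jobstore location for a workflow: $BINDDIR/toil/<workflow>_job.
jobstore_path() {
    local binddir=$1 workflow=$2
    # Strip one trailing slash so "/data/" and "/data" give the same path.
    printf '%s/toil/%s_job\n' "${binddir%/}" "$workflow"
}

# Remove a workflow's jobstore with toil clean, if it exists.
clean_jobstore() {
    local jobstore
    jobstore=$(jobstore_path "$1" "$2")
    if [[ ! -e "$jobstore" ]]; then
        echo "No jobstore found at $jobstore" >&2
        return 1
    fi
    toil clean "$jobstore"
}
```

For example, clean_jobstore "$BINDDIR" delay-calibration removes the jobstore of a delay-calibration run.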