# LOFAR PILOT

This is a small pipeline runner script that wraps Common Workflow Language ([CWL](https://www.commonwl.org/)) pipelines with [toil](https://toil.readthedocs.io).
It is compatible with the [LINC](https://git.astron.nl/RD/LINC) and [VLBI](https://git.astron.nl/RD/VLBI-cwl/) pipelines.

*This is a work in progress. Issues should be reported to [Matthijs van der Wild](mailto:matthijs.van-der-wild@durham.ac.uk).*

## Assumptions

This script assumes the following:

* All relevant input data is available in either the `$HOME` directory or a directory henceforth called `$BINDDIR`. Targets of any links in these directories should be accessible to the compute directories, as these will be mounted during relevant jobs.
* This script will be used with the SLURM queuing system on COSMA5 with the following options: `-p cosma5 -A durham -t 72:00:00`. If these options are not appropriate, or if this script is to be run on other SLURM-run clusters, one must set `$TOIL_SLURM_ARGS` prior to running.
* `$CWL_SINGULARITY_CACHE` is set and the corresponding path contains (a link to) a singularity container `vlbi-cwl.sif`. If it isn't set, a suitable container can be specified as detailed below.

## Installation

The script can be installed to `/usr/bin` via
```
make install
```
while uninstalling the script is as straightforward as running
```
make uninstall
```
A custom installation location can be specified by setting `DESTDIR`.

## Usage

Once installed, the script can be run as follows:
```
pilot [options] $BINDDIR
```
Options can be the following:

* `-h` prints the script usage with all available options (optional).
* `-r` restarts a pipeline that failed during a previous run of this script.
* `-c` allows the pipeline to use the specified container (optional, VLBI pipeline only).
* `-i` points to your input JSON file. This can be any appropriate JSON file, as long as it is located in either `$HOME` or `$BINDDIR`.
* `-p` is a path to the pipeline repository (LINC and VLBI pipeline only).
* `--scratch` is a path to local scratch storage where temporary data can be written (optional). **`--scratch` must be local to the compute node. Nonlocal scratch storage will likely cause the pipeline to fail.**
* `--outdir` is a path relative to which intermediate files and final data products will be written. It will be created if it does not exist. If not specified, `$BINDDIR` will be used instead.
* `--batch_system` specifies the queuing system to be used. Defaults to `slurm`. Use `single_machine` to run on the local node.
* The workflow is specified by its file name without extension, e.g. `delay-calibration` or `concatenate-flag` for the VLBI pipeline, or `HBA_calibrator` or `HBA_target` for LINC.

## Environment variables

`pilot.sh` recognises and accepts the following environment variables if set by the user:

* [`APPTAINERENV_PREPEND_PATH`](https://apptainer.org/docs/user/main/environment_and_metadata.html#manipulating-path); defaults to the scripts directory of the pipeline set by `-p`.
* [`APPTAINERENV_PYTHONPATH`](https://apptainer.org/docs/user/main/environment_and_metadata.html#apptainerenv-prefix); defaults to the scripts directory of the pipeline set by `-p`.
* [`APPTAINER_BIND`](https://apptainer.org/docs/user/main/environment_and_metadata.html#environment-from-the-apptainer-runtime); defaults to `$HOME,$BINDDIR,$OUTPUT_DIR`, where `$OUTPUT_DIR` is set with `--outdir`.
* `CWL_SINGULARITY_CACHE`; specifies the directory where toil can expect the software containers used by the pipeline.
* [`TOIL_SLURM_ARGS`](https://toil.readthedocs.io/en/latest/python/toilAPIBatchsystem.html#batch-system-environment-variables); defaults to `-p cosma5 -A durham -t 72:00:00`.
* [`TOIL_BATCH_LOGS_DIR`](https://toil.readthedocs.io/en/latest/appendices/environment_vars.html); defaults to `$OUTDIR/toil/logs`, where `$OUTDIR` is set with `--outdir`.

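
As an illustration of the variables listed above, the following sketch overrides two of them before the script is called. The partition, account, time limit and container directory are hypothetical placeholders, not settings of any real cluster; substitute values valid for your system.

```shell
#!/bin/sh
# Minimal sketch with placeholder values (assumptions, not real cluster settings).

# Replace the COSMA5 defaults with options for another SLURM cluster.
export TOIL_SLURM_ARGS="-p mypartition -A myaccount -t 24:00:00"

# Directory expected to contain (a link to) vlbi-cwl.sif.
export CWL_SINGULARITY_CACHE="$HOME/containers"

# Print the settings so they can be checked before running the script.
echo "TOIL_SLURM_ARGS=$TOIL_SLURM_ARGS"
echo "CWL_SINGULARITY_CACHE=$CWL_SINGULARITY_CACHE"
```

With these exported in the current shell, the script can then be run as described under Usage.
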
## Notes

* Upon successful pipeline completion the results directory contains the following:
  * The pipeline data products,
  * the statistics gathered by toil.
* Jobstore files and intermediate pipeline data products are stored in a `toil` directory in `$BINDDIR`.
* Jobstore files can be removed by running `toil clean $BINDDIR/toil/_job`.
* Toil may not clear temporary files after the pipeline has finished. These have to be removed by hand.
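
The jobstore cleanup mentioned above could be wrapped in a small guard, sketched below. The fallback value for `BINDDIR` is a hypothetical placeholder; the jobstore path and the `toil clean` command are taken from the note above.

```shell
#!/bin/sh
# Minimal sketch: remove the toil jobstore only if it exists and toil is available.

BINDDIR="${BINDDIR:-$HOME/data}"   # hypothetical fallback; set BINDDIR to your own data directory
JOBSTORE="$BINDDIR/toil/_job"      # jobstore location as given in the notes

if [ -d "$JOBSTORE" ] && command -v toil >/dev/null 2>&1; then
    toil clean "$JOBSTORE"
else
    echo "Nothing cleaned: $JOBSTORE is missing or toil is not on PATH"
fi
```

Temporary files that toil leaves elsewhere are not covered by this sketch and still have to be removed by hand.
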