| author | Matthijs van der Wild <matthijs.van-der-wild@durham.ac.uk> | 2024-09-30 16:19:51 +0100 |
|---|---|---|
| committer | Matthijs van der Wild <matthijs.van-der-wild@durham.ac.uk> | 2024-09-30 16:19:51 +0100 |
| commit | 9246d90121fb9beb87796ca5dc9b8758daaaeb45 (patch) | |
| tree | d8ac9bdcf3fc527150bd0b008e453be4da6b2a84 /README.md | |
Initialise repositories
Diffstat (limited to 'README.md')
| -rw-r--r-- | README.md | 44 |
1 file changed, 44 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..ba7adb8
--- /dev/null
+++ b/README.md
@@ -0,0 +1,44 @@
+# LOFAR PILOT
+
+This is a small pipeline runner script that wraps Common Workflow Language ([CWL](https://www.commonwl.org/)) pipelines with [toil](https://toil.readthedocs.io).
+It is compatible with the [LINC](https://git.astron.nl/RD/LINC) and [VLBI](https://git.astron.nl/RD/VLBI-cwl/) pipelines.
+*This is a work in progress.
+Issues should be reported to [Matthijs van der Wild](mailto:matthijs.van-der-wild@durham.ac.uk).*
+
+## Assumptions
+
+This script assumes the following:
+* All relevant input data is available either in the `$HOME` directory or in a directory henceforth called `$BINDDIR`.
+  Targets of any links in these directories should be accessible from the compute nodes, as these directories will be mounted during the relevant jobs.
+* The output will be written to a results directory in `$BINDDIR`.
+* This script will be used with the SLURM queuing system on COSMA5 with the following options: `-p cosma5 -A durham -t 72:00:00`.
+  If these options are not appropriate, or if this script is to be run on another SLURM cluster, `$TOIL_SLURM_ARGS` must be set prior to running.
+* `$CWL_SINGULARITY_CACHE` is set and the corresponding path contains (a link to) a Singularity container `vlbi-cwl.sif`.
+  If it is not set, a suitable container can be specified with the `-c` option described below.
+
+## Execution
+
+The script can be run as follows:
+```
+sh pilot.sh [options] <workflow name> $BINDDIR
+```
+The following options are available:
+* `-h` prints the script usage with all available options (optional).
+* `-r` restarts a pipeline that failed during a previous run of this script.
+* `-c` specifies the container the pipeline should use (optional, VLBI pipeline only).
+* `-i` is the path to the input JSON file. This can be any appropriate JSON file, as long as it is located in either `$HOME` or `$BINDDIR`.
+* `-p` is the path to the pipeline repository (LINC and VLBI pipelines only).
+* `--scratch` is a path to local scratch storage where temporary data can be written (optional).
+  **`--scratch` must be local to the compute node.
+  Non-local scratch storage will likely cause the pipeline to fail.**
+* `<workflow name>` is the workflow file name without extension, e.g. `delay-calibration` or `concatenate-flag` for the VLBI pipeline, and `HBA_calibrator` or `HBA_target` for LINC.
+
+## Notes
+
+* Upon successful pipeline completion, the results directory contains the following:
+  * the pipeline data products,
+  * the statistics gathered by toil.
+* Jobstore files and intermediate pipeline data products are stored in a `toil` directory in `$BINDDIR`.
+* Jobstore files can be removed by running `toil clean $BINDDIR/toil/<workflow>_job`.
+* Toil may not clear temporary files after the pipeline has finished.
+  These have to be removed by hand.
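The assumptions listed in this README can be checked before a job is submitted. The following is a minimal sketch of such a pre-flight check in POSIX shell; the function `pilot_preflight` is hypothetical and is not part of `pilot.sh`. It assumes that a missing container is only a warning (since `-c` can supply one) and that an unset `$TOIL_SLURM_ARGS` means the COSMA5 options apply, as described in the README.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the README's assumptions.
# NOT part of pilot.sh. Usage: pilot_preflight <BINDDIR>
# Returns 0 if BINDDIR is usable, 1 otherwise; messages go to stderr.
pilot_preflight() {
    _pf_binddir=${1:-}
    _pf_status=0

    # Input data and the results directory live in BINDDIR, so it must exist.
    if [ -z "$_pf_binddir" ] || [ ! -d "$_pf_binddir" ]; then
        echo "error: BINDDIR '$_pf_binddir' is not a directory" >&2
        _pf_status=1
    fi

    # The container is expected at $CWL_SINGULARITY_CACHE/vlbi-cwl.sif.
    # [ -e ] follows symlinks, so a dangling link is reported as missing.
    # Assumption: this is only a warning, because -c can supply a container.
    if [ -z "${CWL_SINGULARITY_CACHE:-}" ]; then
        echo "warning: CWL_SINGULARITY_CACHE is not set; pass a container with -c" >&2
    elif [ ! -e "$CWL_SINGULARITY_CACHE/vlbi-cwl.sif" ]; then
        echo "warning: $CWL_SINGULARITY_CACHE/vlbi-cwl.sif not found; pass a container with -c" >&2
    fi

    # Without TOIL_SLURM_ARGS the COSMA5 options are assumed.
    if [ -z "${TOIL_SLURM_ARGS:-}" ]; then
        echo "note: TOIL_SLURM_ARGS is not set; COSMA5 options (-p cosma5 -A durham -t 72:00:00) are assumed" >&2
    fi

    return "$_pf_status"
}
```

For example, `pilot_preflight "$BINDDIR" && sh pilot.sh -p "$HOME/VLBI-cwl" -i "$BINDDIR/input.json" delay-calibration "$BINDDIR"` would start the VLBI `delay-calibration` workflow only if the checks pass; the repository path and `input.json` are placeholders.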