Nextflow supports fetching nearly everything it needs to run a pipeline over the web automatically: pipeline code, software requirements, reference genomes, and even remote data sources.
Nextflow can run your analysis on a system that has no internet connection. There are just a few extra steps required to get everything you need available locally.
You will need to fetch everything on a system that does have an internet connection (typically your personal computer), then transfer the files to your offline system using whatever method you have available.
First of all, you need Nextflow installed on your offline system. Do this by installing it on a machine that does have an internet connection, and then transferring it to the offline system.
- Start by installing Nextflow locally. Do not use the `-all` package, as this does not allow the use of custom plugins.
- Kick off a pipeline locally so that Nextflow fetches the required plugins. It does not need to run to completion.
- Copy the Nextflow binary and the `$HOME/.nextflow` folder to your offline environment.
- In your Nextflow configuration file, specify each plugin that you downloaded, both name and version, including default plugins. This will prevent Nextflow from trying to download newer versions of plugins.
- Add the following environment variable to your `~/.bashrc` (or equivalent) on the offline system, so that Nextflow does not attempt to connect to the internet: `export NXF_OFFLINE='true'`
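The plugin-pinning step above might look like the following `nextflow.config` fragment. The plugin names and versions here are illustrative; list the exact plugins and versions your trial run fetched, which you can find under `$HOME/.nextflow/plugins`.

```groovy
// Illustrative nextflow.config fragment: pin every plugin (including
// defaults) to the exact versions you copied across, so Nextflow never
// tries to resolve newer ones online. Names and versions are examples.
plugins {
    id 'nf-amazon@2.1.4'
    id 'nf-tower@1.6.3'
}
```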
To run a pipeline offline, you need the pipeline code, the software dependencies, and the shared nf-core/configs configuration profiles. We have created a helper tool as part of the nf-core package to automate this for you.
On a computer with an internet connection, run `nf-core download <pipeline>` to download the pipeline and config profiles.
Add the argument `--container singularity` to also fetch the Singularity container(s).
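A full invocation might look like this; the pipeline name and revision below are examples only, and the `--revision` flag is how a specific release is selected.

```bash
# Run on the machine with internet access.
# Pipeline and revision are illustrative examples.
nf-core download nf-core/rnaseq \
    --revision 3.14.0 \
    --container singularity
```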
The pipeline and requirements will be downloaded, configured with their relative paths, and packaged into a `.tar.gz` file by default.
This can then be transferred to your offline system and unpacked.
Inside, you will see directories called `workflow` (the pipeline files), `config` (a copy of nf-core/configs), and (if you used `--container singularity`) a directory containing the Singularity images.
The pipeline code is adjusted by the download tool to expect these relative paths, so as long as you keep them together it should work as is.
To run the pipeline, simply use `nextflow run <download_directory>/workflow [pipeline flags]`.
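Putting these steps together, a run on the offline system might look like the sketch below. The archive name, profile, and pipeline parameters are all assumptions for illustration; use the ones for your actual download and pipeline.

```bash
# Unpack the transferred archive (name is an example), then launch
# the workflow directory with example nf-core-style parameters.
tar -xzf nf-core-rnaseq-3.14.0.tar.gz
nextflow run nf-core-rnaseq-3.14.0/workflow \
    -profile singularity \
    --input samplesheet.csv \
    --outdir results
```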
If you are downloading directly to the offline storage (e.g. a head node with internet access whilst compute nodes are offline), you can use the `--singularity-cache-only` option for `nf-core download` and set the `$NXF_SINGULARITY_CACHEDIR` environment variable.
This downloads the Singularity images to the `$NXF_SINGULARITY_CACHEDIR` folder and does not copy them into the target downloaded pipeline folder. This reduces total disk space usage and is faster.
For more information, see the documentation for `nf-core download`.
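On a head node, setting up the shared cache might look like this sketch. The cache path is an example; pick a location that your compute nodes can read.

```shell
# Create a shared Singularity image cache and tell Nextflow about it.
# The path below is illustrative.
export NXF_SINGULARITY_CACHEDIR="$HOME/singularity-cache"
mkdir -p "$NXF_SINGULARITY_CACHEDIR"

# Images then land in the cache rather than in each pipeline download:
#   nf-core download <pipeline> --container singularity --singularity-cache-only
```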
Some pipelines require reference genomes and have built-in integration with AWS-iGenomes. If you wish to use these references, you must download and transfer them to your offline cluster. Once transferred, follow the reference genomes documentation to configure the base path for the references.
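Once the references are transferred, the base path can typically be set in a custom config file. The sketch below assumes the `igenomes_base` parameter used by nf-core pipelines with iGenomes support; the path is an example for your own storage layout.

```groovy
// Custom config fragment: point the pipeline at the local iGenomes copy.
// The path is an example; use wherever you placed the references.
params {
    igenomes_base = '/data/igenomes'
}
```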
Here is a bytesize talk explaining the necessary steps to run pipelines offline.