NU_Genomics Configuration

The nf-core pipelines have been configured for use on the Genomics Compute Cluster (GCC) on Quest at Northwestern University. At present, this config has only been tested with nf-core/rnaseq, but it should work similarly for other nf-core pipelines. If you test other pipelines, please share your results on the genomics-rcs Slack; we would be very much obliged.

To use, run the pipeline with -profile nu_genomics. This will download and apply nu_genomics.config, which has been pre-configured with a setup suitable for the GCC. With this profile, a Docker image containing all of the required software is downloaded and converted to a Singularity image before the pipeline executes.
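As a rough illustration of the launch step, the sketch below assembles a command line and prints it for review rather than running it. Only -profile nu_genomics comes from this page; the pipeline name, the --input and --outdir parameters, and the file names are assumptions based on nf-core/rnaseq and should be adapted to your pipeline.

```shell
# Hypothetical inputs: replace with your own sample sheet and output directory.
SAMPLESHEET="samplesheet.csv"
OUTDIR="results"

# Assemble the launch command. Only "-profile nu_genomics" is taken from this page;
# the remaining arguments are assumed nf-core/rnaseq parameters.
LAUNCH_CMD="nextflow run nf-core/rnaseq -profile nu_genomics --input $SAMPLESHEET --outdir $OUTDIR"

# Print the command so it can be checked before launching.
echo "$LAUNCH_CMD"
```

Once the printed command looks right, it can be pasted into a shell where Nextflow is available.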

Before running the pipeline

For this pipeline to run successfully on Quest, you must be a member of the GCC. You can apply to join here.

We strongly recommend that GCC members also be members of an additional allocation for storage purposes. You can learn more about research allocations, and find the links to apply, here.

As of 2023, we now have a dedicated Nextflow module, which we recommend using. However, if you need an edge version of Nextflow, you will need to perform a local installation of Nextflow and add it to your path. Please follow the basic installation instructions shown here, or install in your home directory as shown below. If you already have a bin directory in your path, you do not need to create the directory or append to your path.

cd ~
mkdir -p bin
cd bin
curl -s https://get.nextflow.io | bash
export PATH=~/bin:$PATH
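The note about an existing bin directory can be handled automatically. The following is a minimal sketch, not part of the official instructions: path_contains is a hypothetical helper name, and the snippet only prepends ~/bin when it is not already on the path.

```shell
# Succeed (exit 0) if directory $1 is an entry of the colon-separated list $2.
# path_contains is a hypothetical helper, not part of Nextflow or Quest.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

BIN_DIR="$HOME/bin"

# Prepend ~/bin only when it is missing, so repeated runs do not grow PATH.
if ! path_contains "$BIN_DIR" "$PATH"; then
    export PATH="$BIN_DIR:$PATH"
fi
```

An export made this way lasts only for the current shell session; adding the same lines to a shell startup file such as ~/.bashrc is one way to make it persistent.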

Depending on which pipeline you use, you may need to install an “edge” version of Nextflow. Please read the pipeline documentation carefully to see if this is the case, or you may see an error when running the pipeline. If an edge version is required, you need to explicitly set the version when installing, e.g.:

curl -sL https://github.com/nextflow-io/nextflow/releases/download/v20.11.0-edge/nextflow-20.11.0-edge-all | bash
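Another way to pin a release is the NXF_VER environment variable, which Nextflow reads to decide which version to use. The sketch below reuses the version string from the example above; whether that release suits your pipeline is an assumption you should check against the pipeline documentation.

```shell
# Pin the Nextflow release for this shell session.
# The version string is the one used in the example above; adjust as needed.
export NXF_VER="20.11.0-edge"

# With NXF_VER set, the standard installer and launcher use the pinned release:
#   curl -s https://get.nextflow.io | bash
#   nextflow -version
echo "Nextflow version pinned to: $NXF_VER"
```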

If you are using the nextflow module, you can simply load the module as follows:

module purge
module load nextflow/23.04.3 # or newest version

If you are using your own installation, note that although the config explicitly loads the necessary modules, you will often need to load them manually as well. Please do so before each run as follows, or you may run into errors:

module purge
module load singularity/latest
module load graphviz/2.40.1
module load java/jdk11.0.10
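A quick pre-flight check can confirm that the modules actually put their executables on the path before a run is started. This is a sketch only: missing_tools is a hypothetical helper, and it assumes the three modules provide the singularity, dot (graphviz) and java commands.

```shell
# Print the name of every command in "$@" that cannot be found on PATH.
# missing_tools is a hypothetical helper, not a Quest or Nextflow command.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed executables: singularity, dot (from graphviz) and java.
MISSING=$(missing_tools singularity dot java)
if [ -n "$MISSING" ]; then
    # $MISSING is left unquoted so the names are printed on a single line.
    echo "Not on PATH (load the matching modules first):" $MISSING >&2
fi
```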

Use of iGenomes

A local copy of the iGenomes resource, containing all commonly used genomes, is available to all Quest users at /projects/genomicsshare/AWS_iGenomes. You should therefore be able to run the pipeline against any reference available in the igenomes.config specific to the nf-core pipeline, simply by using the --genome <GENOME_ID> parameter. While you can technically “stream” genomes from iGenomes directly using the pipeline, this is a substantial use of bandwidth and resources on both ends, and it potentially poses reproducibility issues later on. Please use the local copies unless absolutely necessary, and save your custom genomes to your personal allocation where necessary.
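Before launching, it can help to confirm that the local iGenomes copy is visible from the node you are on. The sketch below is illustrative: genome_arg is a hypothetical helper, and GRCh38 is only an example ID that must exist in the igenomes.config of the pipeline you run.

```shell
# Print the --genome argument if the iGenomes directory $1 exists; otherwise fail.
# genome_arg is a hypothetical helper, not part of nf-core.
genome_arg() {
    if [ -d "$1" ]; then
        printf '%s\n' "--genome $2"
    else
        echo "iGenomes directory not found: $1" >&2
        return 1
    fi
}

IGENOMES_DIR="/projects/genomicsshare/AWS_iGenomes"   # path given above
GENOME_ID="GRCh38"                                    # example ID, an assumption

if ARG=$(genome_arg "$IGENOMES_DIR" "$GENOME_ID"); then
    echo "Append to the nextflow run command: $ARG"
fi
```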

Config file

See config file on GitHub

nu_genomics.config
//Profile config names for nf-core/configs
params {
    config_profile_description = 'Northwestern University Quest HPC (Genomics Nodes) config provided by nf-core/configs'
    config_profile_contact = 'Rogan Grant / Haley Carter / Janna Nugent (@RoganGrant, @hscarter, @NUjon)'
    config_profile_url = 'https://www.it.northwestern.edu/research/user-services/quest/'
    max_memory = 190.GB
    max_cpus = 40
    max_time = 240.h
    igenomes_base = "/projects/genomicsshare/AWS_iGenomes/references"
}
 
singularity {
    enabled = true
    autoMounts = true
    cacheDir = "/projects/b1042/singularity_cache"
}
 
process {
    beforeScript = 'module purge; module load singularity/latest; module load graphviz/2.40.1; module load java/jdk11.0.10'
    executor = 'slurm'
    queue = {task.memory >= 190.GB ? 'genomics-himem' : task.time >= 48.h ? 'genomicslong' : 'genomics'}
    clusterOptions = '-A b1042'
}
 
executor {
    queueStatInterval = '5min'
    retry.delay = '1min'
    retry.maxAttempt = 5
    retry.maxDelay = '10min'
    pollInterval = '1min'
    queueSize = 50
    submitRateLimit = '10sec'
}
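
The queue expression in the process block picks a partition from each task's memory and time requests: at least 190 GB goes to genomics-himem, otherwise at least 48 hours goes to genomicslong, and everything else goes to genomics. The same decision is sketched below as a shell function for illustration only; select_queue is a hypothetical name, and the inputs are simplified to whole gigabytes and whole hours.

```shell
# Mirror of the queue rule in nu_genomics.config, for illustration only.
# $1 = requested memory in whole GB, $2 = requested wall time in whole hours.
select_queue() {
    if [ "$1" -ge 190 ]; then
        echo "genomics-himem"
    elif [ "$2" -ge 48 ]; then
        echo "genomicslong"
    else
        echo "genomics"
    fi
}

select_queue 190 12   # prints genomics-himem
select_queue 64 72    # prints genomicslong
select_queue 64 8     # prints genomics
```

Note that the memory test comes first, so a task requesting 190 GB for more than 48 hours is still sent to genomics-himem.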