CalcUA - UAntwerp Tier-2 High Performance Computing Infrastructure (VSC)

NB: You will need an account to use the CalcUA VSC HPC cluster to run the pipeline.

Quickstart

To get started with running nf-core pipelines on CalcUA, you can use one of the example templates below. For more detailed info, see the extended explanations further below.

Slurm-scheduled pipeline

Example job_script.slurm to run the pipeline using the Slurm job scheduler to queue the individual tasks making up the pipeline. Note that the head nextflow process used to launch the pipeline does not need to request many resources.

#!/bin/bash -l
#SBATCH --partition=broadwell          # choose partition to run the nextflow head process on
#SBATCH --job-name=nextflow            # create a short name for your job
#SBATCH --nodes=1                      # node count
#SBATCH --cpus-per-task=1              # only 1 cpu core is needed to run the nextflow head process
#SBATCH --mem-per-cpu=4G               # memory per cpu (4G is default for most partitions)
#SBATCH --time=00:02:00                # total run time limit (HH:MM:SS)
#SBATCH --account=<project-account>    # set project account
 
# Load the available Nextflow module.
module load Nextflow
 
# Or, if using a locally installed version of Nextflow, make Java available.
# module load Java
 
# Set Apptainer/Singularity environment variables to define caching and tmp
# directories. These are used during the conversion of Docker images to
# Apptainer/Singularity ones.
# These lines can be omitted if the variables are already set in your `~/.bashrc` file.
export APPTAINER_CACHEDIR="${VSC_SCRATCH}/apptainer/cache"
export APPTAINER_TMPDIR="${VSC_SCRATCH}/apptainer/tmp"
 
# Launch Nextflow head process.
# Provide a partition profile name to choose a particular partition queue, which
# will determine the available resources for each individual task in the pipeline.
# Note that the profile name ends with a `*_slurm` suffix, which indicates
# that this pipeline will submit each task to the Slurm job scheduler.
nextflow run nf-core/rnaseq \
  -profile test,vsc_calcua,broadwell_slurm \
  -with-report report.html \
  --outdir test_output
 
# Alternatively, use the generic slurm profile to let Nextflow submit tasks
# to different partitions, depending on their requirements.
nextflow run nf-core/rnaseq \
  -profile test,vsc_calcua,slurm \
  -with-report report.html \
  --outdir test_output

Running pipeline in a single Slurm job

Example job_script.slurm to run the pipeline on a single node in local execution mode, only making use of the resources allocated by sbatch.

#!/bin/bash -l
#SBATCH --partition=broadwell          # choose partition to run the nextflow head process on
#SBATCH --job-name=nextflow            # create a short name for your job
#SBATCH --nodes=1                      # node count
#SBATCH --cpus-per-task=28             # request a full node for local execution (broadwell nodes have 28 cpus)
#SBATCH --mem=112G                     # total memory (e.g., 112G max for broadwell) - can be omitted to use default (= max / # cores)
#SBATCH --time=00:02:00                # total run time limit (HH:MM:SS)
#SBATCH --account=<project-account>    # set project account
 
# Load the available Nextflow module.
module load Nextflow
 
# Or, if using a locally installed version of Nextflow, make Java available.
# module load Java
 
# Set Apptainer/Singularity environment variables to define caching and tmp
# directories. These are used during the conversion of Docker images to
# Apptainer/Singularity ones.
# These lines can be omitted if the variables are already set in your `~/.bashrc` file.
export APPTAINER_CACHEDIR="${VSC_SCRATCH}/apptainer/cache"
export APPTAINER_TMPDIR="${VSC_SCRATCH}/apptainer/tmp"
 
# Launch Nextflow head process.
# Provide a partition profile name to choose a particular partition queue, which
# will determine the available resources for each individual task in the pipeline.
# Note that the profile name ends with a `*_local` suffix, which indicates
# that this pipeline will run in local execution mode on the submitted node.
nextflow run nf-core/rnaseq \
  -profile test,vsc_calcua,broadwell_local \
  -with-report report.html \
  --outdir test_output

Step-by-step instructions

  1. Set the APPTAINER_CACHEDIR and APPTAINER_TMPDIR environment variables by adding the following lines to your .bashrc file (or simply add them to your Slurm job script):

    ```
    export APPTAINER_CACHEDIR="${VSC_SCRATCH}/apptainer/cache"
    export APPTAINER_TMPDIR="${VSC_SCRATCH}/apptainer/tmp"
    ```

    When using the ~/.bashrc method, you can ensure that the environment variables are available in your jobs by starting your job scripts with the line #!/bin/bash -l, although this does not seem to be required (see below for more info).

  2. Load Nextflow in your job script via the command: module load Nextflow/23.04.2. Alternatively, when using your own version of Nextflow, use module load Java.

  3. Choose whether you want to run in local execution mode on a single node or make use of the Slurm job scheduler to queue individual pipeline tasks.

    • For Slurm scheduling, choose a partition profile ending in *_slurm. E.g., nextflow run pipeline -profile vsc_calcua,broadwell_slurm.
    • For local execution mode on a single node, choose a partition profile ending in *_local. E.g., nextflow run pipeline -profile vsc_calcua,broadwell_local.

    Note that the -profile option takes multiple comma-separated values: always include vsc_calcua, plus a profile specifying the partition and execution mode (and, optionally, others such as test or debug).

  4. Specify the partition that you want to run the pipeline on using the sbatch command’s --partition=<name> option and how many resources should be allocated. See the overview of partitions and their resources below, or refer to the CalcUA documentation for more info.

    • For Slurm scheduling, the partition on which the head process runs has no effect on the resources allocated to the actual pipeline tasks. The head process only requires minimal resources (e.g., 1 CPU and 4 GB RAM).
    • For local execution mode on a single node, the partition selected via sbatch must match the one selected with nextflow’s -profile option, otherwise the pipeline will not launch. It is probably convenient to simply request a full node (e.g., --cpus-per-task=28 and --mem=112G for broadwell). Omitting --mem-per-cpu or --mem will allocate the default memory value, which is the total available memory divided by the number of cores, e.g., 28 * 4 GB = 112 GB for broadwell (128 GB - 16 GB buffer).
  5. Submit the job script containing your full nextflow run command via sbatch, or run it from an interactive srun session launched inside screen or tmux (to prevent the process from stopping when you disconnect your SSH session).
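Step 1 above can be done once up front. A minimal sketch (the grep guard is an extra safeguard, not part of the official instructions, to keep the lines from being appended twice):

```shell
# One-time setup: add the Apptainer cache/tmp variables to ~/.bashrc,
# unless they are already present (the grep guard keeps this idempotent).
if ! grep -q 'APPTAINER_CACHEDIR' ~/.bashrc 2>/dev/null; then
    cat >> ~/.bashrc <<'EOF'
export APPTAINER_CACHEDIR="${VSC_SCRATCH}/apptainer/cache"
export APPTAINER_TMPDIR="${VSC_SCRATCH}/apptainer/tmp"
EOF
fi
```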


Location of output and work directory

By default, Nextflow stores all of the intermediate files required to run the pipeline in the work directory. It is generally recommended to delete this directory after the pipeline has finished successfully because it can get quite large, and all of the main output files will be saved in the results/ directory anyway. That’s why this config contains a cleanup command that removes the work directory automatically once the pipeline has completed successfully.

If the run does not complete successfully then the work directory should be removed manually to save storage space. The default work directory is set to $VSC_SCRATCH/work per this configuration. You can also use the nextflow clean command to clean up all files related to a specific run (including not just the work directory, but also log files and the .nextflow cache directory).

NB: The Nextflow work directory for any pipeline is located in $VSC_SCRATCH by default and is cleaned automatically after a successful pipeline run, unless the debug profile is provided.

Debug mode

Debug mode can be enabled to always retain the work directory instead of cleaning it. To use it, pass debug as an additional value to the -profile option:

nextflow run <pipeline> -profile vsc_calcua,broadwell_local,debug

Note that this is a core config provided by nf-core pipelines, not something built into the VSC CalcUA config.

Availability of Nextflow

Nextflow has been made available on CalcUA as a module. You can find out which versions are available by using module av nextflow.

If you need to use a specific version of Nextflow that is not available, you can of course manually install it to your home directory and add the executable to your PATH:

curl -s https://get.nextflow.io | bash
mkdir -p ~/.local/bin/ && mv nextflow ~/.local/bin/

Before it can be used, you will still need to load the Java module in your job scripts: module load Java.

Overview of partition profiles and resources

NB: Aside from the profiles defined in the table below, one additional profile is available, named slurm. It automatically lets Nextflow choose the most appropriate Slurm partition to submit each pipeline task to based on the task’s requirements (CPU, memory and run time). Example usage: nextflow run <pipeline> -profile vsc_calcua,slurm.

The CalcUA config defines two types of profiles for each of the following partitions:

| Partition | Cluster | Profile name | Type | Max memory | Max CPUs | Max wall time | Example usage |
| --- | --- | --- | --- | --- | --- | --- | --- |
| zen2 | Vaughan | zen2_slurm | Slurm scheduler | 240 GB (per task) | 64 (per task) | 3 days | `nextflow run <pipeline> -profile vsc_calcua,zen2_slurm` |
| zen2 | Vaughan | zen2_local | Local node execution | 240 GB (or as requested) | 64 (or as requested) | 3 days | `nextflow run <pipeline> -profile vsc_calcua,zen2_local` |
| zen3 | Vaughan | zen3_slurm | Slurm scheduler | 240 GB (per task) | 64 (per task) | 3 days | `nextflow run <pipeline> -profile vsc_calcua,zen3_slurm` |
| zen3 | Vaughan | zen3_local | Local node execution | 240 GB (or as requested) | 64 (or as requested) | 3 days | `nextflow run <pipeline> -profile vsc_calcua,zen3_local` |
| zen3_512 | Vaughan | zen3_512_slurm | Slurm scheduler | 496 GB (per task) | 64 (per task) | 3 days | `nextflow run <pipeline> -profile vsc_calcua,zen3_512_slurm` |
| zen3_512 | Vaughan | zen3_512_local | Local node execution | 496 GB (or as requested) | 64 (or as requested) | 3 days | `nextflow run <pipeline> -profile vsc_calcua,zen3_512_local` |
| broadwell | Leibniz | broadwell_slurm | Slurm scheduler | 112 GB (per task) | 28 (per task) | 3 days | `nextflow run <pipeline> -profile vsc_calcua,broadwell_slurm` |
| broadwell | Leibniz | broadwell_local | Local node execution | 112 GB (or as requested) | 28 (or as requested) | 3 days | `nextflow run <pipeline> -profile vsc_calcua,broadwell_local` |
| broadwell_256 | Leibniz | broadwell_256_slurm | Slurm scheduler | 240 GB (per task) | 28 (per task) | 3 days | `nextflow run <pipeline> -profile vsc_calcua,broadwell_256_slurm` |
| broadwell_256 | Leibniz | broadwell_256_local | Local node execution | 240 GB (or as requested) | 28 (or as requested) | 3 days | `nextflow run <pipeline> -profile vsc_calcua,broadwell_256_local` |
| skylake | Breniac (formerly Hopper) | skylake_slurm | Slurm scheduler | 176 GB (per task) | 28 (per task) | 7 days | `nextflow run <pipeline> -profile vsc_calcua,skylake_slurm` |
| skylake | Breniac (formerly Hopper) | skylake_local | Local node execution | 176 GB (or as requested) | 28 (or as requested) | 7 days | `nextflow run <pipeline> -profile vsc_calcua,skylake_local` |
| all | / | slurm | Slurm scheduler | / | / | / | `nextflow run <pipeline> -profile vsc_calcua,slurm` |

For more information on the difference between the *_slurm-type and *_local-type profiles, see below. Briefly,

  • Slurm profiles submit each pipeline task to the Slurm job scheduler using a particular partition.
    • The generic slurm profile also submits jobs to the Slurm job scheduler, but it can stage them across different partitions simultaneously depending on the tasks’ requirements.
  • Local profiles run pipeline tasks on the local node, using only the resources that were requested by sbatch (or srun in interactive mode).

The max memory for the Slurm partitions is set to the available amount of memory for each partition minus 16 GB (which is the amount reserved for the OS and file system buffers, see slide 63 of this CalcUA introduction course). For the local profiles the resources are set dynamically based on those requested by sbatch.

More information on the hardware differences between the partitions can be found on the CalcUA website and in the VSC documentation. You can also use the sinfo -o "%12P %.10A %.11l %D %c %m" command to see the available partitions yourself.

NB: Do not launch nextflow jobs directly from a login node. Not only can this occupy considerable resources on the login node (the nextflow head process can use large amounts of RAM, see https://nextflow.io/blog/2024/optimizing-nextflow-for-hpc-and-cloud-at-scale.html), but the command might also get cancelled, since login nodes are subject to a wall time limit too.

Schedule Nextflow pipeline using Slurm

The *_slurm (and slurm) profiles allow Nextflow to use the Slurm job scheduler to queue each pipeline task as a separate job. The main job that you manually submit using sbatch will run the head Nextflow process (nextflow run ...), which acts as a governor and monitoring job, and spawn new Slurm jobs for the different tasks in the pipeline. Each task will request the appropriate amount of resources defined by the pipeline (up to a threshold set in the given partition’s profile) and will be run as an individual Slurm job. This means that each task will be placed in the scheduling queue individually and all the standard priority rules will apply to each of them.

The nextflow run ... command that launches the head process can be invoked either via sbatch or from an interactive srun session launched inside screen or tmux (to prevent the process from stopping when you disconnect your SSH session), but it does NOT need to request the total amount of resources that would be required for the full pipeline!

NB: When using the slurm-type profiles, the initial job that launches the master nextflow process does not need many resources to run. Therefore, use the #SBATCH options to limit its requested resources to a small, sensible amount (e.g., 2 CPUs and 4 GB RAM), regardless of how computationally intensive the actual pipeline is.

NB: The wall time of the Nextflow head process will ultimately determine how long the pipeline can run for.

Local Nextflow run on a single (interactive) node

In contrast to the *_slurm profiles, the *_local profiles run in Nextflow’s local execution mode, which means that they do not make use of the Slurm job scheduler. Instead, the head Nextflow process (nextflow run ...) runs on the allocated compute node and spawns all of the sub-processes for the individual tasks in the pipeline on that same node (i.e., similar to running a pipeline on your own machine). The available resources are determined by the #SBATCH options passed to Slurm as usual and are shared among all tasks.

The nextflow run ... command that launches the head process can be invoked either via sbatch or from an interactive srun session launched inside screen or tmux (to prevent the process from stopping when you disconnect your SSH session), and it DOES need to request the total amount of resources that are required by the full pipeline!

NB: When using one of the single node profiles, make sure that you launch the job on the same partition as the one specified by the -profile vsc_calcua,<partition> option of your nextflow run command, either by launching it from the matching login node or by using the sbatch option --partition=<partition>. E.g., a job script containing the following nextflow command: nextflow run <pipeline> -profile vsc_calcua,broadwell_local should be launched from a Leibniz login node or via the following sbatch command: sbatch --account <project_account> --partition broadwell script.slurm

NB: The single node profiles do not automatically set the pipeline’s CPU/RAM resource limits to those of a full node, but instead dynamically set them based on those allocated by Slurm, i.e. those requested via the sbatch. However, in many cases, it likely is a good idea to simply request a full node.
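How the *_local profiles see the allocation can be illustrated with Slurm's standard job environment variables (a hedged sketch; the config's own helpers are named get_allocated_cpus/get_allocated_mem, and outside of a Slurm job these variables are unset, so placeholder text is printed):

```shell
# Inside a Slurm job, the allocation is exposed through environment variables;
# the *_local profiles derive their CPU/memory limits from values like these.
cpus="${SLURM_CPUS_PER_TASK:-unknown (not inside a Slurm job)}"
mem_mb="${SLURM_MEM_PER_NODE:-unknown (not inside a Slurm job)}"
echo "CPUs allocated:       $cpus"
echo "Memory per node (MB): $mem_mb"
```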

Apptainer / Singularity environment variables for cache and tmp directories

NB: The default directory where Nextflow will cache container images is $VSC_SCRATCH/apptainer/nextflow_cache.

NB: The recommended directories for apptainer/singularity’s cache and tmp directories are $VSC_SCRATCH/apptainer/cache (cache directory for images layers) and $VSC_SCRATCH/apptainer/tmp (temporary directory used during build or docker conversion) respectively, to avoid filling up your home storage and/or job node’s SSDs (since the default locations when unset are $HOME/.apptainer/cache and /tmp respectively).

Apptainer is an open-source fork of Singularity, which is an alternative container runtime to Docker. It is more suitable for use on HPC systems because it can be run without root privileges and does not use a dedicated daemon process. More info on the usage of Apptainer/Singularity on the VSC HPC can be found here.

When executing Nextflow pipelines using Apptainer/Singularity, the container image files will by default be cached inside the pipeline work directory. The CalcUA config profile instead sets the singularity.cacheDir setting to a central location on your scratch space ($VSC_SCRATCH/apptainer/nextflow_cache), in order to reuse them between different pipelines. This is equivalent to setting the NXF_APPTAINER_CACHEDIR/NXF_SINGULARITY_CACHEDIR environment variables manually (but note that the cacheDir defined in the config file takes precedence and cannot be overwritten by setting the environment variable).

Apptainer/Singularity makes use of two additional environment variables, APPTAINER_CACHEDIR/SINGULARITY_CACHEDIR and APPTAINER_TMPDIR/SINGULARITY_TMPDIR. As recommended by the VSC documentation on containers, these should be set to a location on the scratch system, to avoid exceeding the quota on your home directory file system.

NB: The cachedir and tmpdir are only used when new images are built or converted from existing docker images. For most nf-core pipelines this does not happen, since they will instead try to directly pull pre-built singularity images from Galaxy Depot.

  • The cache directory APPTAINER_CACHEDIR/SINGULARITY_CACHEDIR is used to store files and layers used during image creation (or conversion of Docker/OCI images). Its default location is $HOME/.apptainer/cache, but it is recommended to change this to $VSC_SCRATCH/apptainer/cache on the CalcUA HPC instead.

  • The temporary directory APPTAINER_TMPDIR/SINGULARITY_TMPDIR is used to store temporary files when building an image (or converting a Docker/OCI source). The directory must have enough free space to hold the entire uncompressed image during all steps of the build process. Its default location is /tmp, but it is recommended to change this to $VSC_SCRATCH/apptainer/tmp on the CalcUA HPC instead, because the default /tmp refers to a directory on the compute node running the master nextflow process, which is a small SSD on CalcUA.

    NB: The tmp directory needs to be created manually beforehand; otherwise, pipelines that need to pull and convert docker images, as well as manual image builds, will fail.
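Both directories can be created in one go (a minimal sketch; the $HOME/scratch fallback is only for illustration off-cluster, since on CalcUA $VSC_SCRATCH is always set):

```shell
# Create the recommended Apptainer cache and tmp directories up front.
# Fall back to $HOME/scratch when $VSC_SCRATCH is unset (e.g., off-cluster).
SCRATCH="${VSC_SCRATCH:-$HOME/scratch}"
mkdir -p "$SCRATCH/apptainer/cache" "$SCRATCH/apptainer/tmp"
```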

Currently, Apptainer respects environment variables with either an APPTAINER or SINGULARITY prefix, but because support for the latter might be dropped in the future, the former variant is recommended.

These two variables can be set in several different ways:

  • Specified in your ~/.bashrc file (e.g., echo "export APPTAINER_CACHEDIR=${VSC_SCRATCH}/apptainer/cache APPTAINER_TMPDIR=${VSC_SCRATCH}/apptainer/tmp" >> ~/.bashrc) - recommended.
  • Passed to sbatch as a parameter or on a #SBATCH line in the job script (e.g., --export=APPTAINER_CACHEDIR=${VSC_SCRATCH}/apptainer/cache,APPTAINER_TMPDIR=${VSC_SCRATCH}/apptainer/tmp).
  • Directly in your job script (e.g., export APPTAINER_CACHEDIR=${VSC_SCRATCH}/apptainer/cache APPTAINER_TMPDIR=${VSC_SCRATCH}/apptainer/tmp).

However, note that for the .bashrc option to work, the environment needs to be passed on to the Slurm jobs. Currently, this seems to happen by default (i.e., variables defined in ~/.bashrc are propagated), but there are methods to enforce this more strictly. E.g., job scripts that start with #!/bin/bash -l will ensure that jobs launch using your login environment. Similarly, the sbatch options [--get-user-env](https://slurm.schedmd.com/sbatch.html#OPT_get-user-env) or --export= can be used. Also see the CalcUA-specific and the general VSC documentation for more info.

Lastly, note that this config file currently uses the Singularity engine rather than the Apptainer one (see singularity directive: enabled = true). The reason is that, for the time being, using the apptainer engine in nf-core pipelines will result in docker images being pulled and converted to apptainer ones, rather than making use of pre-built singularity images (see nf-core documentation). Conversely, when making use of the singularity engine, pre-built images are downloaded and Apptainer will still be used in the background for running these, since the installation of apptainer will by default create an alias for singularity (and this is also the case on CalcUA).

Troubleshooting

For general errors regarding the pulling of images, try clearing out the existing caches located in $VSC_SCRATCH/apptainer.

Failed to pull singularity image

FATAL: While making image from oci registry: error fetching image to cache: while building SIF from
layers: conveyor failed to get: while getting config: no descriptor found for reference
"139610e0c1955f333b61f10e6681e6c70c94357105e2ec6f486659dc61152a21"

Errors similar to the one above can be avoided by first downloading all required container images manually before running the pipeline. It seems like they could be caused by parallel downloads overwhelming the image repository (see issue).

To download a pipeline’s required images, use nf-core download <pipeline> --container-system singularity. See the nf-core docs for more info.

Config file

See config file on GitHub

vsc_calcua.config
// Define the scratch directory, which will be used for storing the nextflow
// work directory and for caching apptainer/singularity files.
// Default to /tmp directory if $VSC_SCRATCH scratch env is not available,
// see: https://github.com/nf-core/configs?tab=readme-ov-file#adding-a-new-config
def scratch_dir = System.getenv("VSC_SCRATCH") ?: "/tmp"
 
// Specify the work directory.
workDir = "$scratch_dir/work"
 
// Perform work directory cleanup when the run has successfully completed.
cleanup = true
 
def host = System.getenv("VSC_INSTITUTE")
 
// Check if APPTAINER_TMPDIR/SINGULARITY_TMPDIR environment variables are set.
// If they are available, try to create the tmp directory at the specified location.
// Skip if host is not CalcUA to avoid hindering github actions.
if ( host == "antwerpen" ) {
    def apptainer_tmpdir = System.getenv("APPTAINER_TMPDIR") ?: System.getenv("SINGULARITY_TMPDIR") ?: null
    if (! apptainer_tmpdir ) {
        def tmp_dir = System.getenv("TMPDIR") ?: "/tmp"
        System.err.println("\nWARNING: APPTAINER_TMPDIR/SINGULARITY_TMPDIR environment variable was not found.\nPlease add the line 'export APPTAINER_TMPDIR=\"\${VSC_SCRATCH}/apptainer/tmp\"' to your ~/.bashrc file (or set it with sbatch or in your job script).\nDefaulting to local $tmp_dir on the execution node of the Nextflow head process.\n")
        // TODO: check if images stored there can be accessed by slurm jobs on other nodes
    } else {
        apptainer_tmpdir = new File(apptainer_tmpdir)
        if (! apptainer_tmpdir.exists() ) {
            // Note: File.mkdirs() does not throw on failure; it returns false.
            def dir_created = apptainer_tmpdir.mkdirs()
            if (! dir_created) {
                System.err.println("\nWARNING: Could not create directory at the location specified by APPTAINER_TMPDIR/SINGULARITY_TMPDIR: $apptainer_tmpdir\nPlease check if this is a valid path to which you have write permission. Exiting...\n")
                System.exit(1)
            }
        }
    }
}
 
// Function to check if the selected partition profile matches the partition on which the master
// nextflow job was launched (either implicitly or via `sbatch --partition=<partition-name>`).
// If the profile type is `*_local` and the partitions do not match, stop the execution and
// warn the user.
def partition_checker(String profile) {
    // Skip check if host machine is not CalcUA, in order to not hinder github actions.
    if ( host != "antwerpen" ) {
        // System.err.println("\nWARNING: Skipping comparison of current partition and requested profile because the current machine is not VSC CalcUA.")
        return
    }
 
    def current_partition = System.getenv("SLURM_JOB_PARTITION")
 
    if (! current_partition) {
        System.err.println("\nWARNING: Current partition could not be found in the expected \$SLURM_JOB_PARTITION environment variable. Please make sure that you submit your pipeline via a Slurm job instead of running the nextflow command directly on a login node.\nExiting...\n")
        System.exit(1)
    }

    if (current_partition != profile) {
        System.err.println("\nWARNING: Slurm job was launched on the '$current_partition' partition, but the selected nextflow profile points to the $profile partition instead ('${profile}_local'). When using one of the local node execution profiles, please launch the job on the corresponding partition in Slurm.\nE.g., Slurm job submission command:\n    sbatch --account <project_account> --partition=broadwell script.slurm\nand job script containing a nextflow command with matching profile section:\n    nextflow run <pipeline> -profile vsc_calcua,broadwell_local\nExiting...\n")
        System.exit(1)
    }
}
 
// Reduce the job submit rate to about 30 per minute, this way the server
// won't be bombarded with jobs.
// Limit queueSize to keep job rate under control and avoid timeouts.
// Set read timeout to the maximum wall time.
// See: https://www.nextflow.io/docs/latest/config.html#scope-executor
executor {
    submitRateLimit = '30/1min'
    queueSize = 10
    exitReadTimeout = 7.day
}
 
// Add backoff strategy to catch cluster timeouts and proper symlinks of files in scratch
// to the work directory.
// See: https://www.nextflow.io/docs/latest/config.html#scope-process
process {
    stageInMode = "symlink"
    stageOutMode = "rsync"
    errorStrategy = { sleep(Math.pow(2, task.attempt ?: 1) * 200 as long); return 'retry' }
    maxRetries = 3
}
 
// Specify that apptainer/singularity should be used and where the cache dir will be for the images.
// The singularity directive is used in favour of the apptainer one, because currently the apptainer
// variant will pull in (and convert) docker images, instead of using pre-built singularity ones.
// To use the pre-built singularity containers instead, the singularity options should be selected
// with apptainer installed on the system, which defines singularity as an alias (as is the case
// on CalcUA).
// See https://nf-co.re/docs/usage/installation#pipeline-software
// and https://nf-co.re/tools#how-the-singularity-image-downloads-work
// See https://www.nextflow.io/docs/latest/config.html#scope-singularity
singularity {
    enabled = true
    autoMounts = true
    // See https://www.nextflow.io/docs/latest/singularity.html#singularity-docker-hub
    cacheDir = "$scratch_dir/apptainer/nextflow_cache"  // Equivalent to setting NXF_APPTAINER_CACHEDIR/NXF_SINGULARITY_CACHEDIR environment variable
}
 
// Define profiles for the following partitions:
// - zen2, zen3, zen3_512 (Vaughan)
// - broadwell, broadwell_256 (Leibniz)
// - skylake (Breniac, formerly Hopper)
// For each partition, there is a "*_slurm" profile and a "*_local" profile.
// The former uses the slurm executor to submit each nextflow task as a separate job,
// whereas the latter runs all tasks on the individual node on which the nextflow
// master process was launched.
// See: https://www.nextflow.io/docs/latest/config.html#config-profiles
profiles {
    // Automatic slurm partition selection based on task requirements
    slurm {
        params {
            config_profile_description = 'Slurm profile with automatic partition selection for use on the CalcUA VSC HPC.'
            config_profile_contact = 'pmoris@itg.be (GitHub: @pmoris)'
            config_profile_url = 'https://docs.vscentrum.be/antwerp/tier2_hardware.html'
            max_memory = 496.GB // = max memory of high memory nodes
            max_cpus = 64   // = cpu count of largest nodes
            max_time = 7.day    // wall time of longest running nodes
        }
        process {
            executor = 'slurm'
            queue = {
                // long running
                if ( task.time > 3.day ) {
                    'skylake'
                // high memory
                } else if ( task.memory > 240.GB ) {
                    'zen3_512'
                // medium memory and high cpu
                } else if ( task.memory > 112.GB && task.cpus > 28 ) {
                    'zen2,zen3'
                // medium memory and lower cpu
                } else if ( task.memory > 112.GB && task.cpus <= 28 ) {
                    'broadwell_256,zen2,zen3'
                // lower memory and high cpu
                } else if ( task.cpus > 28 ) {
                    'zen2,zen3'
                // lower memory and lower cpu
                } else {
                    'broadwell,skylake,zen2,zen3'
                }
            }
        }
    }
    // Vaughan partitions
    zen2_slurm {
        params {
            config_profile_description = 'Zen2 Slurm profile for use on the Vaughan cluster of the CalcUA VSC HPC.'
            config_profile_contact = 'pmoris@itg.be (GitHub: @pmoris)'
            config_profile_url = 'https://docs.vscentrum.be/antwerp/tier2_hardware/vaughan_hardware.html'
            max_memory = 240.GB // 256 GB (total) - 16 GB (buffer)
            max_cpus = 64
            max_time = 3.day
        }
        process {
            executor = 'slurm'
            queue = 'zen2'
        }
    }
    zen2_local {
        params {
            config_profile_description = 'Zen2 local profile for use on a single node of the Vaughan cluster of the CalcUA VSC HPC.'
            config_profile_contact = 'pmoris@itg.be (GitHub: @pmoris)'
            config_profile_url = 'https://docs.vscentrum.be/antwerp/tier2_hardware/vaughan_hardware.html'
            max_memory = get_allocated_mem(240) // 256 GB (total) - 16 GB (buffer)
            max_cpus = get_allocated_cpus(64)
            max_time = 3.day
        }
        process {
            executor = 'local'
        }
        partition_checker("zen2")
    }
    zen3_slurm {
        params {
            config_profile_description = 'Zen3 Slurm profile for use on the Vaughan cluster of the CalcUA VSC HPC.'
            config_profile_contact = 'pmoris@itg.be (GitHub: @pmoris)'
            config_profile_url = 'https://docs.vscentrum.be/antwerp/tier2_hardware/vaughan_hardware.html'
            max_memory = 240.GB // 256 GB (total) - 16 GB (buffer)
            max_cpus = 64
            max_time = 3.day
        }
        process {
            executor = 'slurm'
            queue = 'zen3'
        }
    }
    zen3_local {
        params {
            config_profile_description = 'Zen3 local profile for use on a single node of the Vaughan cluster of the CalcUA VSC HPC.'
            config_profile_contact = 'pmoris@itg.be (GitHub: @pmoris)'
            config_profile_url = 'https://docs.vscentrum.be/antwerp/tier2_hardware/vaughan_hardware.html'
            max_memory = get_allocated_mem(240) // 256 GB (total) - 16 GB (buffer)
            max_cpus = get_allocated_cpus(64)
            max_time = 3.day
        }
        process {
            executor = 'local'
        }
        partition_checker("zen3")
    }
    zen3_512_slurm {
        params {
            config_profile_description = 'Zen3_512 Slurm profile for use on the Vaughan cluster of the CalcUA VSC HPC.'
            config_profile_contact = 'pmoris@itg.be (GitHub: @pmoris)'
            config_profile_url = 'https://docs.vscentrum.be/antwerp/tier2_hardware/vaughan_hardware.html'
            max_memory = 496.GB // 512 GB (total) - 16 GB (buffer)
            max_cpus = 64
            max_time = 3.day
        }
        process {
            executor = 'slurm'
            queue = 'zen3_512'
        }
    }
    zen3_512_local {
        params {
            config_profile_description = 'Zen3_512 local profile for use on a single node of the Vaughan cluster of the CalcUA VSC HPC.'
            config_profile_contact = 'pmoris@itg.be (GitHub: @pmoris)'
            config_profile_url = 'https://docs.vscentrum.be/antwerp/tier2_hardware/vaughan_hardware.html'
            max_memory = get_allocated_mem(496) // 512 GB (total) - 16 GB (buffer)
            max_cpus = get_allocated_cpus(64)
            max_time = 3.day
        }
        process {
            executor = 'local'
        }
        partition_checker("zen3_512")
    }
    // Leibniz partitions
    broadwell_slurm {
        params {
            config_profile_description = 'Broadwell Slurm profile for use on the Leibniz cluster of the CalcUA VSC HPC.'
            config_profile_contact = 'pmoris@itg.be (GitHub: @pmoris)'
            config_profile_url = 'https://docs.vscentrum.be/antwerp/tier2_hardware/leibniz_hardware.html'
            max_memory = 112.GB // 128 GB (total) - 16 GB (buffer)
            max_cpus = 28
            max_time = 3.day
        }
        process {
            executor = 'slurm'
            queue = 'broadwell'
        }
    }
    broadwell_local {
        params {
            config_profile_description = 'Broadwell local profile for use on a single node of the Leibniz cluster of the CalcUA VSC HPC.'
            config_profile_contact = 'pmoris@itg.be (GitHub: @pmoris)'
            config_profile_url = 'https://docs.vscentrum.be/antwerp/tier2_hardware/leibniz_hardware.html'
            max_memory = get_allocated_mem(112) // 128 GB (total) - 16 GB (buffer)
            max_cpus = get_allocated_cpus(28)
            max_time = 3.day
        }
        process {
            executor = 'local'
        }
        partition_checker("broadwell")
    }
    broadwell_256_slurm {
        params {
            config_profile_description = 'Broadwell_256 Slurm profile for use on the Leibniz cluster of the CalcUA VSC HPC.'
            config_profile_contact = 'pmoris@itg.be (GitHub: @pmoris)'
            config_profile_url = 'https://docs.vscentrum.be/antwerp/tier2_hardware/leibniz_hardware.html'
            max_memory = 240.GB // 256 GB (total) - 16 GB (buffer)
            max_cpus = 28
            max_time = 3.day
        }
        process {
            executor = 'slurm'
            queue = 'broadwell_256'
        }
    }
    broadwell_256_local {
        params {
            config_profile_description = 'Broadwell_256 local profile for use on a single node of the Leibniz cluster of the CalcUA VSC HPC.'
            config_profile_contact = 'pmoris@itg.be (GitHub: @pmoris)'
            config_profile_url = 'https://docs.vscentrum.be/antwerp/tier2_hardware/leibniz_hardware.html'
            max_memory = get_allocated_mem(240) // 256 GB (total) - 16 GB (buffer)
            max_cpus = get_allocated_cpus(28)
            max_time = 3.day
        }
        process {
            executor = 'local'
        }
        partition_checker("broadwell_256")
    }
    // Breniac (previously Hopper) partitions
    skylake_slurm {
        params {
            config_profile_description = 'Skylake Slurm profile for use on the Breniac (former Hopper) cluster of the CalcUA VSC HPC.'
            config_profile_contact = 'pmoris@itg.be (GitHub: @pmoris)'
            config_profile_url = 'https://www.uantwerpen.be/en/research-facilities/calcua/infrastructure/'
            max_memory = 176.GB // 192 GB (total) - 16 GB (buffer)
            max_cpus = 28
            max_time = 7.day
        }
        process {
            executor = 'slurm'
            queue = 'skylake'
        }
    }
    skylake_local {
        params {
            config_profile_description = 'Skylake local profile for use on a single node of the Breniac (former Hopper) cluster of the CalcUA VSC HPC.'
            config_profile_contact = 'pmoris@itg.be (GitHub: @pmoris)'
            config_profile_url = 'https://www.uantwerpen.be/en/research-facilities/calcua/infrastructure/'
            max_memory = get_allocated_mem(176) // 192 GB (total) - 16 GB (buffer)
            max_cpus = get_allocated_cpus(28)
            max_time = 7.day
        }
        process {
            executor = 'local'
        }
        partition_checker("skylake")
    }
}
 
// Define helper functions to fetch the CPUs and memory allocated on the current
// execution node. These are only used by the *_local partition profiles and allow
// the cpu and memory thresholds to be set dynamically, based on the available
// hardware as reported by Slurm. Each function takes a default return value, which
// should be set to the recommended threshold for the particular partition's node type.
def get_allocated_cpus(int node_max_cpu) {
    // Fall back to the partition default when no Slurm allocation is active.
    def max_cpus = System.getenv("SLURM_CPUS_PER_TASK") ?: System.getenv("SLURM_JOB_CPUS_PER_NODE") ?: node_max_cpu
    return max_cpus.toInteger()
}
def get_allocated_mem(int node_max_mem) {
    def mem_per_cpu = System.getenv("SLURM_MEM_PER_CPU")
    def cpus_per_task = System.getenv("SLURM_CPUS_PER_TASK") ?: System.getenv("SLURM_JOB_CPUS_PER_NODE")

    if ( mem_per_cpu && cpus_per_task ) {
        // SLURM_MEM_PER_CPU is expressed in MB; use integer division so the
        // result is a whole number of GB and the returned memory string stays valid.
        node_max_mem = mem_per_cpu.toInteger().intdiv(1000) * cpus_per_task.toInteger()
    }

    return "${node_max_mem}.GB"
}
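
// Illustrative example (hypothetical values, shown as comments to keep this
// config file valid): a job submitted with
//   sbatch --partition=zen2 --cpus-per-task=8 --mem-per-cpu=4000 job_script.slurm
// exports SLURM_CPUS_PER_TASK=8 and SLURM_MEM_PER_CPU=4000 (in MB), so under the
// zen2_local profile get_allocated_cpus(64) resolves to 8 and
// get_allocated_mem(240) resolves to "32.GB" (4000 MB / 1000 * 8 CPUs).
// Outside of a Slurm allocation, neither environment variable is set and both
// functions fall back to their supplied partition defaults (64 CPUs / 240 GB).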