If you decide to upload a subworkflow to nf-core/modules, it will become available to all nf-core pipelines and to everyone within the Nextflow community! See subworkflows/nf-core/ for examples.
Terminology
The features offered by Nextflow DSL2 can be used in various ways depending on the granularity with which you would like to write pipelines. Please see the listing below for the hierarchy and associated terminology we have decided to use when referring to DSL2 components.
Module
A process that can be used within different pipelines and is as atomic as possible, i.e. cannot be split into another module. An example of this would be a module file containing the process definition for a single tool such as FastQC. Atomic nf-core module files are available in the modules/ directory of nf-core/modules along with the required documentation and tests.
Subworkflow
A chain of multiple modules that offers a higher level of functionality within the context of a pipeline. For example, a subworkflow to run multiple QC tools with FastQ files as input. Subworkflows should be shipped with the pipeline implementation and, if required, shared amongst different pipelines directly from there. Shareable nf-core subworkflow files are available in the subworkflows/ directory of nf-core/modules along with the required documentation and tests.
Workflow
What DSL1 users would consider an end-to-end pipeline, for example from one or more inputs to a series of outputs. This can either be implemented using a large monolithic script as with DSL1, or by using a combination of DSL2 modules and subworkflows. nf-core pipelines can have multiple workflows, such as processing different data types for the same ultimate purpose (such as in nf-core/viralrecon).
Before you start
Please check that the subworkflow you wish to add isn’t already in the nf-core/modules repository:
- Use the nf-core subworkflows list command
- Check open pull requests
- Search open issues
If the subworkflow doesn’t exist on nf-core/modules:
- Please create a new issue before adding it
- Set an appropriate subject for the issue e.g. new subworkflow: bam_sort_stats_samtools
- Add yourself to the Assignees so we can track who is working on the subworkflow
Adding a new subworkflow
We have implemented a number of commands in the nf-core/tools package to make it incredibly easy for you to create and contribute your own subworkflow to nf-core/modules.
- Install the latest version of nf-core/tools (>=2.7)
- Install Nextflow (>=21.10.3)
- Install any of Docker, Singularity or Conda
- Set up git on your computer by adding a new git remote of the main nf-core git repo called upstream
git remote add upstream https://github.com/nf-core/modules.git
Make a new branch for your subworkflow and check it out
git checkout -b bam_sort_stats_samtools
- Create a subworkflow using the nf-core DSL2 subworkflow template in the root of the clone of the nf-core/modules repository:
$ nf-core subworkflows create bam_sort_stats_samtools --author @joebloggs

                                      ,--./,-.
      ___     __   __   __   ___     /,-._.--~\
|\ | |__  __ /  ` /  \ |__) |__         }  {
| \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                      `._,._,'

nf-core/tools version 2.8 - https://nf-co.re

INFO     Repository type: modules
INFO     Press enter to use default values (shown in brackets) or type your own responses. ctrl+click underlined text to open links.
INFO     Created / edited following files:
           ./subworkflows/nf-core/bam_sort_stats_samtools/main.nf
           ./subworkflows/nf-core/bam_sort_stats_samtools/meta.yml
           ./tests/subworkflows/nf-core/bam_sort_stats_samtools/main.nf
           ./tests/subworkflows/nf-core/bam_sort_stats_samtools/test.yml
           ./tests/subworkflows/nf-core/bam_sort_stats_samtools/nextflow.config
           ./tests/config/pytest_modules.yml
All of the files required to add the subworkflow to nf-core/modules will be created/edited in the appropriate places. There are at most 5 files to modify:
- ./subworkflows/nf-core/bam_sort_stats_samtools/main.nf
  This is the main script containing the workflow definition for the subworkflow. You will see an extensive number of TODO statements to help guide you to fill in the appropriate sections and to ensure that you adhere to the guidelines we have set for subworkflow submissions.
- ./subworkflows/nf-core/bam_sort_stats_samtools/meta.yml
  This file will be used to store general information about the subworkflow and author details. You will need to add a brief description of the files defined in the input and output section of the main script since these will be unique to each subworkflow.
- ./tests/subworkflows/nf-core/bam_sort_stats_samtools/main.nf
  Every subworkflow MUST have a test workflow. This file will define one or more Nextflow workflow definitions that will be used to unit test the output files created by the subworkflow. By default, one workflow definition will be added but please feel free to add as many as possible so we can ensure that the subworkflow works on different data types / parameters e.g. separate workflow for single-end and paired-end data.
  When writing multiple tests, a common practice is to alias process names to differentiate them between tests. When using an alias, add a suffix to the process name so the CI tests can still find the output in the folder named after the tool, e.g.
include { BAM_SORT_STATS_SAMTOOLS as BAM_SORT_STATS_SAMTOOLS_SINGLE_END } from '../../../../subworkflows/nf-core/bam_sort_stats_samtools/main' // Good: suffix added, output is still in the folder named after the tool
include { BAM_SORT_STATS_SAMTOOLS as SINGLE_END_BAM_SORT_STATS_SAMTOOLS } from '../../../../subworkflows/nf-core/bam_sort_stats_samtools/main' // Bad: prefix added, generates problems with CI tests because the output folder name changes
Minimal test data required for your subworkflow may already exist within the nf-core/modules repository, in which case you may just have to change a couple of paths in this file - see the Test data section for more info and guidelines for adding new standardised data if required.
- ./tests/subworkflows/nf-core/bam_sort_stats_samtools/nextflow.config
  Some subworkflows MAY require additional parameters added to the test command to successfully run. These can be specified with an ext.args variable within the process scope of the nextflow.config file that exists alongside the test files themselves (and is automatically loaded when the test workflow main.nf is executed).
- ./tests/subworkflows/nf-core/bam_sort_stats_samtools/test.yml
  This file will contain all of the details required to unit test the main script in the point above using pytest-workflow. If possible, any outputs produced by the test workflow(s) MUST be included and listed in this file along with an appropriate check e.g. md5sum. The different test options are listed in the pytest-workflow docs.
  As highlighted in the next point, we have added a command to make it much easier to test the workflow(s) defined for the subworkflow and to automatically create the test.yml with the md5sum hashes for all of the outputs generated by the subworkflow.
  md5sum checks are the preferable choice of test to determine file changes, however, this may not be possible for all outputs generated by some tools e.g. if they include time stamps or command-related headers. Please do your best to avoid just checking for the file being present e.g. it may still be possible to check that the file contains the appropriate text snippets.
- Create a yaml file containing information required for subworkflow unit testing
$ nf-core subworkflows create-test-yml bam_sort_stats_samtools

                                      ,--./,-.
      ___     __   __   __   ___     /,-._.--~\
|\ | |__  __ /  ` /  \ |__) |__         }  {
| \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                      `._,._,'

nf-core/tools version 2.8 - https://nf-co.re

INFO     Press enter to use default values (shown in brackets) or type your own responses
Test YAML output path (- for stdout) (tests/subworkflows/nf-core/bam_sort_stats_samtools/test.yml):
INFO     Looking for test workflow entry points: 'tests/subworkflows/nf-core/bam_sort_stats_samtools/main.nf'
INFO     Building test meta for entry point 'test_bam_sort_stats_samtools_single_end'
Test name (bam_sort_stats_samtools test_bam_sort_stats_samtools_single_end):
Test command (nextflow run ./tests/subworkflows/nf-core/bam_sort_stats_samtools -entry test_bam_sort_stats_samtools_single_end -c ./tests/config/nextflow.config):
Test tags (comma separated):
Test output folder with results (leave blank to run test):
? Choose software profile Docker
INFO     Setting env var '$PROFILE' to 'docker'
INFO     Running 'bam_sort_stats_samtools' test with command:
         nextflow run ./tests/subworkflows/nf-core/bam_sort_stats_samtools -entry test_bam_sort_stats_samtools_single_end -c ./tests/config/nextflow.config --outdir /var/folders/lt/b3cs9y610fg_13q14dckwcvm0000gn/T/tmping28ow_ -work-dir /var/folders/lt/b3cs9y610fg_13q14dckwcvm0000gn/T/tmportf0uab
Note: See the Running tests manually section if you would like to run the tests yourself.
- Run prettier on all edited and generated files
prettier -w .
- Check that the new subworkflow you’ve added follows the new subworkflow guidelines
- Lint the subworkflow locally to check that it adheres to nf-core guidelines before submission (this command is not implemented in nf-core/tools yet)
```console
$ nf-core subworkflows lint
,--./,-.
___ __ __ __ ___ /,-._.--~\
|\ | |__ __ / ` / \ |__) |__ } {
| \| | \__, \__/ | \ |___ \`-._,-`-,
`._,._,'
nf-core/tools version 2.7.dev0 - https://nf-co.re
INFO Repository type: modules
```
- Once ready, the code can be pushed and a pull request (PR) created
On a regular basis you can pull upstream changes into this branch and it is recommended to do so before pushing and creating a pull request - see below. Rather than merging changes directly from upstream the rebase strategy is recommended so that your changes are applied on top of the latest master branch from the nf-core repo. This can be performed as follows:
git pull --rebase upstream master
Once you are ready you can push the code and create a PR
git push -u origin bam_sort_stats_samtools
Once the PR has been accepted you should delete the branch and checkout master again.
git checkout master
git branch -d bam_sort_stats_samtools
In case there are commits on the local branch that didn’t make it into the PR (usually commits made after the PR), git will warn about this and not delete the branch. If you are sure you want to delete, use the following command
git branch -D bam_sort_stats_samtools
Test data
In order to test that each subworkflow added to nf-core/modules is actually working, and to be able to track any changes to results files between subworkflow updates, we have set up a number of GitHub Actions CI tests to run each subworkflow on a minimal test dataset using Docker, Singularity and Conda.
- All test data for the nf-core/modules repository MUST be added to the modules branch of nf-core/test-datasets and organised by filename extension.
- In order to keep the size of the test data repository as minimal as possible, pre-existing files from nf-core/test-datasets MUST be reused if at all possible.
- Test files MUST be kept as tiny as possible.
- If the appropriate test data doesn’t exist in the modules branch of nf-core/test-datasets, please contact us on the nf-core Slack #subworkflows channel (you can join with this invite) to discuss possible options.
- It may not be possible to add test data for some subworkflows e.g. if the input data is too large or requires a local database. In these scenarios, it is recommended to use the Nextflow stub feature to test the subworkflow. Please refer to the gtdbtk/classify module and its corresponding test script to understand how to use this feature for your subworkflow development.
Running tests manually
As outlined in the nf-core subworkflows create section, we have made it quite trivial to create an initial yaml file (via the nf-core subworkflows create-test-yml command) containing a listing of all of the subworkflow output files and their associated md5sums. However, md5sum checks may not be appropriate for all output files if, for example, they contain timestamps. This is why it is a good idea to re-run the tests locally with pytest-workflow before you create your pull request adding the subworkflow. If your files do indeed have timestamps or other issues that prevent you from using the md5sum check, then you can edit the test.yml file to instead check that the file contains some specific content or, as a last resort, that it exists. The different test options are listed in the pytest-workflow docs.
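To see why an md5sum check can fail even though the results are unchanged, the following sketch simulates two runs of a hypothetical tool whose output differs only in a timestamp header. The file names and contents are invented for illustration, and GNU md5sum is assumed to be installed.

```bash
# Simulate two runs of a tool that writes a timestamp header followed by
# identical results. All file names and contents here are made up.
workdir="$(mktemp -d)"
printf '# generated: 2023-01-01T00:00:00\nreads_mapped\t100\n' > "$workdir/run1.txt"
printf '# generated: 2023-01-02T12:30:00\nreads_mapped\t100\n' > "$workdir/run2.txt"

# The checksums differ because of the header line alone.
sum1="$(md5sum < "$workdir/run1.txt" | cut -d' ' -f1)"
sum2="$(md5sum < "$workdir/run2.txt" | cut -d' ' -f1)"
if [ "$sum1" != "$sum2" ]; then
    echo "md5sum differs between runs"
fi

# A content check on the stable part of the file passes for both runs.
for f in "$workdir/run1.txt" "$workdir/run2.txt"; do
    if grep -q 'reads_mapped' "$f"; then
        echo "content check passed: $(basename "$f")"
    fi
done
```

In the test.yml this would correspond to replacing the md5sum entry for such a file with a contains entry, as described in the pytest-workflow docs.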
Please follow the steps below to run the tests locally:
- Install Nextflow (>=21.10.3)
- Install any of Docker, Singularity or Conda
- Install pytest-workflow
- Start running your own tests using the appropriate tag defined in the test.yml:
  - Run the test with the helper tool nf-core subworkflows test from the modules directory.
$ cd /path/to/git/clone/of/nf-core/modules/
$ nf-core subworkflows test bam_sort_stats_samtools

                                      ,--./,-.
      ___     __   __   __   ___     /,-._.--~\
|\ | |__  __ /  ` /  \ |__) |__         }  {
| \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                      `._,._,'

nf-core/tools version 2.8 - https://nf-co.re

INFO     Press enter to use default values (shown in brackets) or type your own responses
? Choose software profile Docker
INFO     Setting environment variable '$PROFILE' to 'docker'
INFO     Running pytest for subworkflow 'bam_sort_stats_samtools'
- See docs on running pytest-workflow for more info.
For docker/singularity, setting the environment variable TMPDIR=~ is an example of a location the containers can mount (you can change this as you prefer). If tests fail with Nextflow errors that end in work doesn't exist in container, check that your container can mount your TMPDIR.
Uploading to nf-core/modules
Fork the nf-core/modules repository to your own GitHub account. Within the local clone of your fork add the subworkflow files to the subworkflows/ directory. Please try and keep PRs as atomic as possible to aid the reviewing process - ideally, one subworkflow addition/update per PR.
Commit and push these changes to your fork on GitHub, and then create a pull request on the nf-core/modules GitHub repo with the appropriate information.
When you are happy with your pull request, please select the Ready for Review label on the GitHub PR tab, and providing that everything adheres to nf-core guidelines we will endeavour to approve your pull request as soon as possible. We also recommend requesting reviews from the nf-core/maintainers-team so a core team of volunteers can try to review your PR as fast as possible.
Once you are familiar with the subworkflow submission process, please consider joining the reviewing team by asking on the #subworkflows Slack channel.
Talks
These may include references to an older syntax; however, the general idea remains the same.
New subworkflow guidelines and PR review checklist
The key words “MUST”, “MUST NOT”, “SHOULD”, etc. are to be interpreted as described in RFC 2119.
General
- Subworkflows should combine tools that make up a logical unit in an analysis step. A subworkflow must contain at least two modules.
- Each subworkflow emits a channel containing all versions.yml collecting the tool(s) versions. They MUST be collected within the workflow and added to the output as versions:
take:
    input

main:
    // Start from an empty channel and mix in the versions of every module called
    ch_versions = Channel.empty()
    FASTQC(input)
    // out.versions is a channel, not a method, so it is referenced without parentheses
    ch_versions = ch_versions.mix(FASTQC.out.versions)

emit:
    versions = ch_versions
Naming conventions
- The directory structure for the subworkflow name must be all lowercase e.g. subworkflows/nf-core/bam_sort_stats_samtools/. The naming convention should be of the format <file_type>_<operation_1>_<operation_n>_<tool_1>_<tool_n> e.g. bam_sort_stats_samtools where bam = <file_type>, sort = <operation> and samtools = <tool>. Not all operations are required in the name if they are routine (e.g. indexing after creation of a BAM). Operations can be collapsed to a general name if the steps are directly related to each other. For example, if in a subworkflow a binning tool has three required steps (e.g. <tool> split, <tool> calculate, <tool> merge) to perform an operation (contig binning), these can be collapsed into one (e.g. fasta_binning_concoct, rather than fasta_split_calculate_merge_concoct). If in doubt regarding what to name your subworkflow, please contact us on the nf-core Slack #subworkflows channel (you can join with this invite) to discuss possible options.
- All parameter names MUST follow the snake_case convention.
- All function names MUST follow the camelCase convention.
- Channel names MUST follow snake_case convention and be all lower case.
- Input channel names SHOULD signify the input object type. For example, a single value input channel will be prefixed with val_, whereas input channels with multiple elements (e.g. meta map + file) should be prefixed with ch_.
- Output channel names SHOULD only be named based on the major output file of that channel (i.e. an output channel of [[meta], bam] should be emitted as bam, not ch_bam). This is for more intuitive use of these output objects downstream with the .out attribute.
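The shape of the directory naming rule above can be checked mechanically. The sketch below is only an illustration: the function name is_valid_subworkflow_name is hypothetical and not part of nf-core/tools, and it assumes that a valid name consists of at least three lowercase alphanumeric tokens joined by single underscores (one file type, at least one operation, at least one tool). It cannot tell whether the tokens are real file types or tools.

```bash
# Hypothetical helper (not part of nf-core/tools): checks the shape of a
# subworkflow name, not whether its tokens are real file types or tools.
is_valid_subworkflow_name() {
    local name="$1"
    # Assumption: <file_type>_<operation...>_<tool...> means at least three
    # lowercase alphanumeric tokens separated by single underscores.
    local pattern='^[[:lower:][:digit:]]+(_[[:lower:][:digit:]]+){2,}$'
    [[ "$name" =~ $pattern ]]
}

for candidate in bam_sort_stats_samtools fasta_binning_concoct BAM_Sort_Samtools fastqc; do
    if is_valid_subworkflow_name "$candidate"; then
        echo "ok:  $candidate"
    else
        echo "bad: $candidate"
    fi
done
```

A check like this only catches formatting slips such as uppercase letters or a missing tool token; whether operations may be collapsed into a general name remains a judgement call for the reviewers.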
Input/output options
- Input channel declarations MUST be defined for all possible input files that will be required by the subworkflow (i.e. both required and optional files) within the take block.
- Named file extensions MUST be emitted for ALL output channels e.g. path "*.txt", emit: txt.
- Optional inputs are not currently supported by Nextflow. However, passing an empty list ([]) instead of a file as a subworkflow parameter can be used to work around this issue.
Subworkflow parameters
- Named params defined in the parent workflow MUST NOT be assumed to be passed to the subworkflow to allow developers to call their parameters whatever they want. In general, it may be more suitable to use additional input value channels to cater for such scenarios.
Documentation
- Each input and output channel SHOULD have a comment describing the structure of the channel e.g.
take:
ch_reads // channel: [mandatory] meta, reads
val_sort // boolean: [mandatory] false
<...>
emit:
bam = SAMTOOLS_VIEW.out.bam // channel: [ val(meta), path(bam) ]
versions = ch_versions // channel: [ path(versions.yml) ]
- Each input and output channel structure SHOULD also be described in the meta.yml in the description entry.
description: |
  Structure: [ val(meta), path(tsv)]
  (Sub)contig coverage table
Publishing results
This system uses Nextflow’s native publishDir defined directly in a pipeline workflow’s modules.config (see here for a simple example).
Test data config file
If a new test dataset is added to tests/config/test_data.config, check that the config name of the added file(s) follows the scheme of the entire file name with dots replaced with underscores.
For example: the nf-core/test-datasets file genomics/sarscov2/genome/genome.fasta is labelled as genome_fasta, and genomics/sarscov2/genome/genome.fasta.fai as genome_fasta_fai.
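The naming scheme above can be expressed as a one-line transformation. The helper name config_key is hypothetical and only illustrates the rule; it is not part of nf-core/tools.

```bash
# Hypothetical helper: derive the config name for a test data file by taking
# its file name and replacing every dot with an underscore.
config_key() {
    local file_name
    file_name="$(basename "$1")"
    echo "${file_name//./_}"
}

config_key genomics/sarscov2/genome/genome.fasta       # prints genome_fasta
config_key genomics/sarscov2/genome/genome.fasta.fai   # prints genome_fasta_fai
```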
Using a stub test when required test data is too big
If the subworkflow absolutely cannot run using tiny test data, there is a possibility to add stub-run to the test.yml. In this case it is required to test the subworkflow using larger scale data and document how this is done. In addition, an extra script-block labeled stub: must be added, and this block must create dummy versions of all expected output files as well as the versions.yml. An example for modules is found in the ascat module. In the test.yml the -stub-run argument is written as well as the md5sums for each of the files that are added in the stub-block. This causes the stub-code block to be activated when the unit test is run (example):
nextflow run tests/subworkflows/nf-core/<name_of_subworkflow> -entry test_<name_of_subworkflow> -c tests/config/nextflow.config -stub-run
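What a stub block has to produce can be sketched in plain shell, since the body of a stub block is typically a shell script. The sketch below is an illustration under stated assumptions: the prefix, the output file names, the process name and the version string are placeholders rather than values from any real module, and a temporary directory stands in for the task work directory (a real stub block would write relative paths into the work directory).

```bash
# Placeholder for the task work directory that Nextflow would provide.
outdir="$(mktemp -d)"
prefix="test"

# Create an empty dummy file for every expected output (names are made up).
touch "$outdir/${prefix}.sorted.bam"
touch "$outdir/${prefix}.sorted.bam.bai"

# Write a dummy versions.yml; process name and version are placeholders.
cat > "$outdir/versions.yml" <<END_VERSIONS
"EXAMPLE_PROCESS":
    exampletool: 0.0.0
END_VERSIONS

ls -1 "$outdir"
```

Files created with touch are empty, so they all share the md5sum of an empty file (d41d8cd98f00b204e9800998ecf8427e) in the test.yml.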
What is the meta map?
In nf-core DSL2 pipelines, to add sample-specific information and metadata that is carried throughout the pipeline, we use a meta variable. This avoids the need to create separate channels for each new characteristic.
The meta variable can be passed down to processes as a tuple of the channel containing the actual samples, e.g. FastQ files, and the meta variable. The meta map is a Groovy map, which is like a Python dictionary.
Help
For further information or help, don’t hesitate to get in touch on the Slack #subworkflows channel (you can join with this invite).