If you decide to upload a module to nf-core/modules, it will become available to all nf-core pipelines, and to everyone within the Nextflow community! See modules/ for examples.
See the DSL2 modules tutorial for a step-by-step guide on how to add a module!
DSL1 has reached its end-of-life as of 2023. As of Nextflow versions 22.04.x and 22.10.x it is not possible to run DSL1 scripts.
Terminology
A domain-specific language (DSL) is a programming language that is developed for a specific application. Nextflow is based on a DSL, where DSL2 is the latest version. DSL2 allows data analysis pipelines to be scaled and modularised. The features offered by Nextflow DSL2 can be used in various ways depending on the granularity with which you would like to write pipelines. Please see the listing below for the hierarchy and associated terminology we have decided to use when referring to DSL2 components.
Module
A process that can be used within different pipelines and is as atomic as possible, i.e. cannot be split into another module. An example of this would be a module file containing the process definition for a single tool such as FastQC. Atomic nf-core module files are available in the modules/ directory of nf-core/modules along with the required documentation and tests.
Subworkflow
A chain of multiple modules that offers a higher level of functionality within the context of a pipeline, for example a subworkflow to run multiple QC tools with FastQ files as input. Subworkflows should be shipped with the pipeline implementation and, if required, shared amongst different pipelines directly from there. Shareable nf-core subworkflow files are available in the subworkflows/ directory of nf-core/modules along with the required documentation and tests.
Workflow
An end-to-end pipeline where one or more inputs produce a series of outputs. This can either be implemented using a large monolithic script or by using a combination of DSL2 modules and subworkflows. nf-core pipelines can have multiple workflows, such as processing different data types for the same ultimate purpose (as in nf-core/viralrecon).
Writing a new module reference
See the dsl2 modules tutorial for a step by step guide for how to add a module!
Before you start
Please check that the module you wish to add isn't already on nf-core/modules:

- Use the nf-core modules list command
- Check open pull requests
- Search open issues
If the module doesn't exist on nf-core/modules:

- Please create a new issue before adding it
- Set an appropriate subject for the issue, e.g. new module: fastqc
- Add yourself to the Assignees so we can track who is working on the module
New module workflow
We have implemented a number of commands in the nf-core/tools
package to make it incredibly easy for you to create and contribute your own modules to nf-core/modules.
- Install any of Docker, Singularity or Conda.
  If you use the conda package manager you can set up a new environment and install all dependencies for the new module workflow in one step, then proceed with Step 5.
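One way to do this is sketched below; the environment name and package set are illustrative, so adjust them to the current tooling requirements:

```bash
# Create one environment with everything needed for module development
conda create -c conda-forge -c bioconda -n nf-core-modules nextflow nf-core nf-test pre-commit
conda activate nf-core-modules
```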
- Install Nextflow (>=21.04.0)
- Install the latest version of nf-core/tools (>=2.7)
- Install nf-test
- Set up pre-commit (comes packaged with nf-core/tools; watch the pre-commit bytesize talk if you want to know more about it) to ensure that your code is linted and formatted correctly before you commit it to the repository
- Set up git on your computer by adding a new git remote of the main nf-core git repo called upstream, then make a new branch for your module and check it out
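A sketch of the git setup, assuming you are inside a local clone of your fork (the branch name is illustrative):

```bash
# Add the main nf-core/modules repo as a remote called "upstream"
git remote add upstream https://github.com/nf-core/modules.git

# Create and check out a new branch for your module
git checkout -b fastqc
```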
- Create a module using the nf-core DSL2 module template.
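Run the create command from inside your clone of the modules repository (the tool name is illustrative):

```bash
nf-core modules create fastqc
```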
All of the files required to add the module to nf-core/modules will be created/edited in the appropriate places. There are at most 5 files to modify:
- ./modules/nf-core/fastqc/main.nf
  This is the main script containing the process definition for the module. You will see an extensive number of TODO statements to help guide you to fill in the appropriate sections and to ensure that you adhere to the guidelines we have set for module submissions.
- ./modules/nf-core/fastqc/meta.yml
  This file will be used to store general information about the module and author details - the majority of which will already be auto-filled. However, you will need to add a brief description of the files defined in the input and output sections of the main script, since these will be unique to each module. We check its formatting and validity based on a JSON schema during linting (and in the pre-commit hook).
- ./modules/nf-core/fastqc/tests/main.nf.test
  Every module MUST have a test workflow. This file will define one or more Nextflow workflow definitions that will be used to unit test the output files created by the module. By default, one workflow definition will be added, but please feel free to add as many as possible so we can ensure that the module works on different data types / parameters, e.g. separate workflows for single-end and paired-end data.
  Minimal test data required for your module may already exist within the nf-core/modules repository, in which case you may just have to change a couple of paths in this file - see the Test data section for more info and guidelines for adding new standardised data if required.
  Refer to the section writing nf-test tests for more information on how to write nf-tests.
- Create a snapshot of the tests.
  Note: See the nf-test docs if you would like to run the tests manually.
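One way to do this is via nf-core/tools, which runs nf-test under the hood and writes a main.nf.test.snap file on the first run (the tool name is illustrative):

```bash
nf-core modules test fastqc
```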
- Check that the new module you've added follows the new module guidelines
- Run prettier on all edited and generated files
- Lint the module locally to check that it adheres to nf-core guidelines before submission
- Once ready, the code can be pushed and a pull request (PR) created
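The formatting and lint steps above can be run locally like so (the module name is illustrative):

```bash
# Format all edited/generated files in place
prettier --write .

# Check the module against the nf-core guidelines
nf-core modules lint fastqc
```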
On a regular basis you can pull upstream changes into this branch, and it is recommended to do so before pushing and creating a pull request. Rather than merging changes directly from upstream, the rebase strategy is recommended so that your changes are applied on top of the latest master branch from the nf-core repo. This can be performed as follows:
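A sketch of the rebase workflow (the branch name is illustrative; note that a rebased branch needs a force push):

```bash
git fetch upstream
git rebase upstream/master
git push --force-with-lease origin fastqc
```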
Once you are ready you can push the code and create a PR
Once the PR has been accepted you should delete the branch and checkout master again.
In case there are commits on the local branch that didn't make it into the PR (usually commits made after the PR), git will warn about this and not delete the branch. If you are sure you want to delete it, use the following command:
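For example (the branch name is illustrative):

```bash
git checkout master
git branch -d fastqc   # refuses and warns if unmerged commits would be lost
git branch -D fastqc   # force-deletes the branch
```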
Test data
In order to test that each module added to nf-core/modules is actually working, and to be able to track any changes to results files between module updates, we have set up a number of GitHub Actions CI tests to run each module on a minimal test dataset using Docker, Singularity and Conda.
- All test data for nf-core/modules MUST be added to the modules branch of nf-core/test-datasets and organised by filename extension.
- Please adhere to the test-data specifications when adding new test data.
- In order to keep the size of the test data repository as minimal as possible, pre-existing files from nf-core/test-datasets MUST be reused if at all possible.
- Test files MUST be kept as tiny as possible.
- If the appropriate test data doesn't exist in the modules branch of nf-core/test-datasets, please contact us on the nf-core Slack #modules channel (you can join with this invite) to discuss possible options.
- It may not be possible to add test data for some modules, e.g. if the input data is too large or requires a local database. In these scenarios, it is recommended to use the Nextflow stub feature to test the module. Please refer to the gtdbtk/classify module and its corresponding test script to understand how to use this feature for your module development.
If a new test dataset is added to tests/config/test_data.config, check that the config name of the added file(s) follows the scheme of the entire file name, with dots replaced by underscores.
For example: the nf-core/test-datasets file genomics/sarscov2/genome/genome.fasta is labelled as genome_fasta, and genomics/sarscov2/genome/genome.fasta.fai as genome_fasta_fai.
Publishing results
Results are published using Nextflow's native publishDir directive, defined in the modules.config of a workflow (see here for an example). Previously, results were published using a custom publishDir definition, with a Groovy Map defined by params.modules.
Using a stub test when required test data is too big
If the module absolutely cannot run using tiny test data, it is possible to add a stub-run to the test.yml. In this case it is required to test the module using larger-scale data and document how this is done. In addition, an extra script block labelled stub: must be added, and this block must create dummy versions of all expected output files as well as the versions.yml. An example can be found in the ascat module. In the test.yml the -stub-run argument is specified, as well as the md5sums for each of the files that are added in the stub block. This causes the stub code block to be activated when the unit test is run (example):
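A minimal sketch of a stub: block (process and file names are illustrative); the stub must create dummy versions of every declared output:

```nextflow
process MYTOOL {
    // ... usual input/output/script blocks ...

    stub:
    """
    touch sample.out.txt
    touch versions.yml
    """
}
```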
Writing nf-test tests
We recently decided to use nf-test instead of pytest for testing modules, because nf-test is more flexible and allows us to test modules in a more realistic way. You can find more information in the nf-test official docs and in this bytesize talk.
A simple example of a nf-test directory in nf-core/modules can be found here.
Philosophy of nf-tests
- Each module contains a tests/ folder beside the main.nf of the module itself, containing the test files
- Test files come with a snapshot of module output channels
nf-test guidelines for a simple un-chained module
- Some modules MAY require additional parameters added to the test command to successfully run. These can be specified with an ext.args variable within the process scope of the nextflow.config file that exists alongside the test files themselves (and is automatically loaded when the test workflow main.nf is executed).
If your module requires a nextflow.config file to run, create the file in the module's tests/ directory and add the additional parameters there. Then add the path to the main.nf.test file.
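For example (the argument shown is illustrative):

```groovy
// tests/nextflow.config
process {
    ext.args = '--quiet'
}
```

and in the main.nf.test file, inside the test definition:

```groovy
config "./nextflow.config"
```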
- When your test data is too big, or the tests take too long or require too many resources, you can opt to run your tests in stub mode by adding the following option. It can be added at the top of main.nf.test to have all tests run in stub mode, or it can be added to a single test.
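The option looks like this (see the nf-test docs for details):

```groovy
options "-stub"
```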
- You can find examples of different nf-test assertions in this tutorial.
nf-test guidelines for a chained module
- For modules that involve running more than one process to generate required test data (aka chained modules), nf-test provides a setup method.
- For example, the module abricate/summary requires the process abricate/run to be run prior and takes its output as input. The setup method is to be declared before the primary when block in the test file as shown below:
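A sketch of such a setup block (the script path and input contents are illustrative):

```groovy
setup {
    run("ABRICATE_RUN") {
        script "../../run/main.nf"
        process {
            """
            input[0] = [ [ id:'test' ], file('assembly.fna', checkIfExists: true) ]
            """
        }
    }
}
```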
The setup method can run more than one process, each enclosed in its own run block.

- Then, the output of the setup process(es) can be provided as input in the process section of the when block
- Next, in the then block we can write the assertions that are used to verify the test. A test can have multiple assertions, but we recommend enclosing all assertions in an assertAll() block as shown below:
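For example, a typical assertAll() block in an nf-test then block:

```groovy
then {
    assertAll(
        { assert process.success },
        { assert snapshot(process.out).match() }
    )
}
```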
- The main.nf.test file for chained modules will finally look as shown below:
Migrating from pytest to nf-test
Steps for creating nf-test for a simple un-chained module
- Git checkout a new branch for your module tests.
To create the necessary files for nf-test and ensure a smooth transition, we will use the template provided by nf-core/tools.
Here are the steps to follow:
- Use nf-core/tools to create a new module with the same name as the old one, with the option --migrate-pytest. This command will rename the current module directory to <module>_old to avoid conflicts with the new module, create a new module, and copy the main.nf, meta.yml and environment.yml files over to preserve the original module code.
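For example (the module name is illustrative):

```bash
nf-core modules create fastqc --migrate-pytest
```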
- (optional) If your module has a nextflow.config file to run (e.g. for ext.args specification), the command will also copy it to the module's tests/ directory, and the path will be added to the main.nf.test file.
- When using the --migrate-pytest option you will be asked if you want to delete the old module directory and see the content of the old pytests in the terminal, or to keep the old module directory. For the following steps, use the information from the pytest tests to create the new nf-test tests.
- Provide a test name, preferably indicating the test data and file format used, e.g. test("homo_sapiens - [bam, bai, bed] - fasta - fai"). Multiple tests are allowed in a single test file.
- If migrating an existing module, get the inputs from the current pytest file tests/modules/nf-core/module/main.nf and provide them as positional inputs (input[0]) in the nf-test file
- Next, in the then block we can write the assertions that are used to verify the test. A test can have multiple assertions, but we recommend enclosing all assertions in an assertAll() block as shown below:
- Run the test to create a snapshot of your module test. This will create a main.nf.test.snap file.
If you chose not to remove the old module directory with nf-core/tools:

- Remove the corresponding tags from tests/config/pytest_modules.yml so that pytests for the module will be skipped on GitHub CI
- Remove the corresponding pytest files in tests/modules/nf-core
- Remove the old module
- Check that everything is according to the nf-core guidelines by running nf-core modules lint
- Create a PR and add the nf-test label to it
Steps for creating nf-test for chained modules
- Follow the steps listed above for simple modules for test generation, tags and test name
- Refer to the section nf-test guidelines for a chained module
- Run the test to create a snapshot of your module test. This will create a .nf.test.snap file
- Add the corresponding module tag from tests/config/pytest_modules.yml to the tags.yml in modules/nf-core/<module>/tests/
Remove the corresponding tags from tests/config/pytest_modules.yml so that pytests for the module will be skipped on GitHub CI.

- Create a PR and add the nf-test label to it
The implementation of nf-test in nf-core is still evolving. Things might still change and the information here might be outdated. Please report any issues you encounter on the nf-core/website repository and in the nf-test channel on the nf-core Slack.
Uploading to nf-core/modules
Fork the nf-core/modules repository to your own GitHub account. Within the local clone of your fork, add the module file to the modules/ directory. Please try to keep PRs as atomic as possible to aid the reviewing process - ideally, one module addition/update per PR.
Commit and push these changes to your local clone on GitHub, and then create a pull request on the nf-core/modules
GitHub repo with the appropriate information.
When you are happy with your pull request, please select the Ready for Review label on the GitHub PR tab, and provided that everything adheres to the nf-core guidelines we will endeavour to approve your pull request as soon as possible. We also recommend requesting reviews from the nf-core/modules-team so a core team of volunteers can try to review your PR as fast as possible.
Once you are familiar with the module submission process, please consider joining the reviewing team by asking on the #modules Slack channel.
Talks
These may include references to an older syntax; however, the general idea remains the same.
PR Review Checklist
A PR review is the process of examining a new module's submission or the changes proposed to a module. The reviewer provides constructive feedback on those changes before they are merged into the nf-core repository. The goal of a PR review is to ensure that the code meets the coding standards of the project, and is consistent and of high quality.
While the team of maintainers is responsible for overseeing the PR review process for modules, these guidelines can assist community members in reviewing PRs and ensure that the review process is consistent and effective. The following is a collection of community suggestions to take into account during the review process.
General reviews of submissions to modules:
In general, the main purpose of the review is to ensure:

- All modules adhere to the nf-core module specifications
- All checks pass, including linting, conda, singularity, and docker
Otherwise, you can cover most of the specifications by checking for the following:
- The module is suitable for offline running, without automatic database downloads assumed.
- If running docker containers, check that Nextflow changes the --entrypoint to /bin/bash and that environment variables used by certain programs (e.g. Busco, Merqury) are sourced again so they can be used in container settings.
- Check that it adheres to nf-core coding standards (e.g. use of the meta map).
- Check that the code is readable and the formatting is correct (e.g. indenting, extra spaces).
In modules/nf-core/modulename/main.nf:
- Check that all optional parameters are in the $args section.
- Check that the software version extraction command is optimized, if required.
- Check if the bioconda version of the tool is the latest version.
- Ensure that temporary unzipped files are removed, since keeping them negates the benefits of compression and worsens the problem.
- Ensure that large outputs are compressed with the correct tool (follow guidelines for gzip vs bzip2 vs other options).
In ../tests/modules/nf-core/modulename/main.nf and ../tests/modules/nf-core/modulename/meta.yml:
- Check that there are tests for all outputs, including optional ones.
- Check that the meta.yml file has correct documentation links and file patterns.
- Run the tool help and check that important inputs (usually optional) have not been missed.
- Check that all outputs are captured by running nf-test or pytest (e.g. on Gitpod).
What is the meta map?
In nf-core DSL2 pipelines, to add sample-specific information and metadata that is carried throughout the pipeline, we use a meta variable. This avoids the need to create separate channels for each new characteristic.
The meta variable can be passed down to processes as a tuple of the channel containing the actual samples, e.g. FastQ files, and the meta variable.
The meta map is a Groovy map, which is like a Python dictionary, as shown below:
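For example, a minimal meta map for a paired-end sample (the values are illustrative):

```groovy
[ id: 'sample1', single_end: false ]
```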
Thus, the information can be accessed within processes and module.conf files with the key, i.e. meta.id.
This pattern doesn’t work out of the box with fromFilePairs
The difference between the two:
As you can see, both are Groovy lists. However, the fromFilePairs output just has a val that is a string, whereas in the meta_map case the first value in the list is a Groovy map, which is like a Python dictionary.
The only required value is meta.id for most of the modules; however, they usually also contain fields like meta.single_end and meta.strandedness.
Common patterns
The meta map is generated with the create_fastq_channel function in the input_check subworkflow of most nf-core pipelines, where the meta information is easily extracted from a samplesheet that contains the input file paths.
Generating a meta map from file pairs
Sometimes you want to use nf-core modules in small scripts. You don't want to make a samplesheet, or maintain a bunch of validation. For instance, here's an example script to run fastqc:
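A sketch of such a script (the module path and glob pattern are illustrative), building the meta map directly from fromFilePairs output:

```nextflow
#!/usr/bin/env nextflow

include { FASTQC } from './modules/nf-core/fastqc/main'

workflow {
    ch_fastq = Channel
        .fromFilePairs('reads/*_{1,2}.fastq.gz')
        // turn [ id, reads ] into [ meta, reads ]
        .map { id, reads -> [ [ id: id, single_end: false ], reads ] }

    FASTQC(ch_fastq)
}
```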
Sorting samples by groups
Combining channels on a meta subset
Sometimes it is necessary to combine multiple channels based on a subset of the meta maps.
Unfortunately this is not yet supported, as the by argument isn't a closure in .combine() and .join(), and it probably won't be (Nextflow issue #3175).
To bypass this restriction, one solution is to create a new map with only the necessary keys and perform the junction on it. Here is an example:
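A sketch of this pattern (channel names and contents are illustrative): a temporary key map containing only id is prepended, used for the junction, then dropped:

```nextflow
ch_bam
    .map { meta, bam -> [ [ id: meta.id ], meta, bam ] }
    .combine(
        ch_vcf.map { meta, vcf -> [ [ id: meta.id ], vcf ] },
        by: 0
    )
    // drop the temporary key, keep the full meta
    .map { key, meta, bam, vcf -> [ meta, bam, vcf ] }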
Modify the meta map
There are multiple ways to modify the meta map. Here are some examples:
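Two common approaches, sketched with illustrative keys (note that both create a new map rather than mutating the original in place):

```nextflow
// Add a key
ch_input.map { meta, files -> [ meta + [ batch: 'run1' ], files ] }

// Keep only selected keys
ch_input.map { meta, files -> [ meta.subMap(['id', 'single_end']), files ] }
```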
Conclusion
As you can see, the meta map is quite a flexible way of storing metadata in channels. Feel free to add whatever other key-value pairs your pipeline may need to it. We're looking to add custom objects, which will lock down the usage a bit more.
What are the ext properties/keys?
Ext properties or keys are special process directives (see: ext directive) that insert strings into the module scripts. For example, an nf-core module uses the string assigned to ext.args (or ext.args2, ext.args3, …) to insert tool-specific options into a module script:
Example:
The configuration inserts the string -T -K as options into the module script, so the script is rendered with those options included:
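As a sketch (the module name, tool and options are illustrative), the configuration might look like:

```groovy
// conf/modules.config
process {
    withName: 'BWA_MEM' {
        ext.args = '-T -K'
    }
}
```

and the module script picks it up via task.ext.args:

```groovy
script:
def args = task.ext.args ?: ''
"""
bwa mem $args ref.fasta reads.fastq > aligned.sam
"""
```

so the rendered command becomes bwa mem -T -K ref.fasta reads.fastq > aligned.sam.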
The following table lists the available keys commonly used in nf-core modules.
| Key        | Description                                            |
| ---------- | ------------------------------------------------------ |
| ext.args   | Additional arguments appended to command in module.    |
| ext.args2  | Second set of arguments appended to command in module. |
| ext.args3  | Third set of arguments appended to command in module.  |
| ext.prefix | File name prefix for output files.                     |
The order of the numeric ID of args must match the order of the tools as used in the module.
To see some more advanced examples of these keys in use see:
- Set ext.args based on parameter settings
- Set ext.prefix based on task inputs
- Set ext.args based on both parameters and task inputs
Advanced pattern
Multimapping
It is possible with multiMap to split a channel in two and call the resulting sub-channels separately afterwards.
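A sketch of this pattern (channel and label names are illustrative):

```nextflow
ch_input
    .multiMap { meta, fastq ->
        meta:  meta
        reads: fastq
    }
    .set { ch_split }

// ch_split.meta and ch_split.reads can now be used independently
```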
Adding additional information to the meta map
It is possible to combine an input channel with a set of parameters as follows:
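A sketch of this pattern (the parameter values and key name are illustrative): each input element is duplicated once per parameter, and the parameter is recorded in the meta map:

```nextflow
ch_input
    .combine(Channel.fromList(['--mode fast', '--mode sensitive']))
    .map { meta, files, arg -> [ meta + [ args: arg ], files ] }
```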
You can also combine this technique with others for further processing.
What is the Harshil Alignment?
The Harshil Alignment™️ format is the whitespace-happy code style that was introduced by a certain core member to get on everyone's nerves, but everyone subsequently developed Stockholm Syndrome, so that no-one else in nf-core can now look at Nextflow code without it.
The Harshil Alignment™️ format involves ensuring that common punctuation across multiple lines in a group is placed in the same location on each line.
There are many places where the format can be applied - it's not just code, it also applies to comment formatting - however common examples are as follows:
Curly Bracket Example
❌ Bad
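For instance, include statements with unaligned closing braces (the module names are illustrative):

```groovy
include { FASTQC } from '../modules/nf-core/fastqc/main'
include { MULTIQC } from '../modules/nf-core/multiqc/main'
```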
âś… Good
Equals Example
❌ Bad
âś… Good
Comma Example
❌ Bad
âś… Good
Colon Example (Comments)
âś… Good
Deprecating a module
Sometimes modules or subworkflows become outdated and need to be deprecated (available, but no longer recommended).
These modules or subworkflows should not be deleted, as they could be used in private repositories or on other
platforms. The recommended procedure is, once the alternative is available on nf-core modules, to add a message to the
top of the module code saying the module is deprecated, and an assert in the code body to print a deprecation
message like so:
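A sketch of such an assert (the replacement module name is illustrative); a Groovy assert with a false condition always fails and prints the attached message:

```groovy
// This module is DEPRECATED
assert false : "MODULE DEPRECATED: please use foo/bar instead"
```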
The purpose of the assert is to introduce a mechanism which stops the pipeline and alerts the developer when an automatic update of the module/subworkflow is performed.
Help
For further information or help, don't hesitate to get in touch on the Slack #modules channel (you can join with this invite).