Material originally written for the Nextflow Camp 2019, Barcelona 2019-09-19: "Getting started with nf-core" (see programme).
Duration: 1 hr 45 min
Last updated: September 2019
- Installing the nf-core helper tools
- Listing available nf-core pipelines
- Running nf-core pipelines
- Creating nf-core pipelines
- Testing nf-core pipelines
- Releasing nf-core pipelines
- 1 - installation
- 2 - listing pipelines
- 3 - using pipelines
- 4 - creating pipelines
- 5 - testing pipelines
- 6 - releasing pipelines
The nf-core community provides a range of tools to help new users get to grips with Nextflow - both by providing complete pipelines that can be used out of the box, and by helping developers follow best practices. Companion tools can create a bare-bones pipeline from a template scattered with TODO pointers, and continuous integration (CI) with linting tools checks code quality. Guidelines and documentation help to get Nextflow newbies on their feet in no time. Best of all, the nf-core community is always on hand to help.
In this tutorial we discuss the best-practice guidelines developed by the nf-core community and why they're important, and give insight into the best tips and tricks for budding Nextflow pipeline developers. ✨
What is nf-core
nf-core is a community-led project to develop a set of best-practice pipelines built using Nextflow. Pipelines are governed by a set of guidelines, enforced by community code reviews and automatic linting (code testing). A suite of helper tools aims to help people run and develop pipelines.
This tutorial attempts to give an overview of how nf-core works: how to run nf-core pipelines, how to make new pipelines using the nf-core template and how nf-core pipelines are reviewed and ultimately released.
The beauty of nf-core is that there is lots of help on offer! The main place for this is Slack - an instant messaging service. The nf-core Slack organisation has channels dedicated to each pipeline, as well as specific topics (eg.
One additional tool which this author swears by is TLDR - it gives a concise command line reference through example commands for most Linux tools, including git and more. There are many clients, but raylee/tldr is arguably the simplest - just a single bash script.
Much of this tutorial will make use of the nf-core command line tool. This has been developed to provide a range of additional functionality for the project, such as pipeline creation, testing and more. The tool can be installed using pip:
pip install nf-core
If using conda, first set up for bioconda as described in the bioconda docs and then install nf-core:
conda install nf-core
The nf-core/tools source code is available at https://github.com/nf-core/tools - if you prefer, you can clone this repository and install the code locally:
git clone https://github.com/nf-core/tools.git nf-core-tools
cd nf-core-tools
python setup.py install
Once installed, you can check that everything is working by printing the help:
- Install nf-core/tools
- Use the help flag to list the available commands
As you saw from the --help output, the tool has a range of subcommands. The simplest is nf-core list, which lists all available nf-core pipelines. The output shows the latest version number and when it was released. If the pipeline has been pulled locally using Nextflow, it tells you when that was and whether you have the latest version.
If you supply additional keywords after the command, the listed pipelines will be filtered. Note that this searches more than just the displayed output, including keywords and description text. The --sort flag allows you to sort the list (default is by most recently released) and --json gives JSON output for programmatic use.
- Use the help flag to print the list command usage
- List all pipelines
- Sort pipelines alphabetically, then by popularity (stars)
- Fetch one of the pipelines using
nf-core list to see if the pipeline you pulled is up to date
- Filter pipelines for those that work with RNA
- Save these pipeline details to a JSON file
In order to run nf-core pipelines, you will need to have Nextflow installed (https://www.nextflow.io). The only other requirement is a software packaging tool: Conda, Docker or Singularity. In theory it is possible to run the pipelines with software installed by other methods (e.g. environment modules, or manual installation), but this is not recommended. Most people find either Docker or Singularity the best options.
Unless you are actively developing pipeline code, we recommend using the Nextflow built-in functionality to fetch nf-core pipelines. Nextflow will automatically fetch the pipeline code when you run nextflow run nf-core/PIPELINE. For the best reproducibility, it is good to explicitly reference the pipeline version number that you wish to use with the -r flag. For example:
nextflow run nf-core/rnaseq -r 1.3
If not specified, Nextflow will fetch the master branch - for nf-core pipelines this will be the latest release. If you would like to run the latest development code, use
Note that once pulled, Nextflow will use the local cached version for subsequent runs. Use the -latest flag when running the pipeline to always fetch the latest version. Alternatively, you can force Nextflow to pull a pipeline again using the nextflow pull command:
nextflow pull nf-core/rnaseq
You can find general documentation and instructions for Nextflow and nf-core on the nf-core website: https://nf-co.re/. Pipeline-specific documentation is bundled with each pipeline in the /docs folder. This can be read either locally, on GitHub, or on the nf-core website. Each pipeline has its own webpage at
In addition to this documentation, each pipeline comes with a basic command line reference. This can be seen by running the pipeline with the --help flag, for example:
nextflow run nf-core/rnaseq --help
Nextflow can load pipeline configurations from multiple locations. To make it easy to apply a group of options on the command line, Nextflow uses the concept of config profiles. nf-core pipelines load configuration in the following order:
- Pipeline: Default 'base' config
- Always loaded. Contains pipeline-specific parameters and "sensible defaults" for things like computational requirements
- Does not specify any method for software packaging. If nothing else is specified, Nextflow will expect all software to be available on the command line.
- Pipeline: Core config profiles
- All nf-core pipelines come with some generic config profiles. The most commonly used ones are for software packaging:
- Other core profiles are
- nf-core/configs: Server profiles
- At run time, nf-core pipelines fetch configuration profiles from the configs remote repository. The profiles here are specific to clusters at different institutions.
- Because this is loaded at run time, anyone can add a profile here for their system and it will be immediately available for all nf-core pipelines.
- Local config files given to Nextflow with the
- Command line configuration
Multiple comma-separated config profiles can be specified in one go, so the following commands are perfectly valid:
nextflow run nf-core/rnaseq -profile test,docker
nextflow run nf-core/hlatyping -profile singularity,debug
Note that the order in which config profiles are specified matters. Their priority increases from left to right.
The test config profile is a bit of a special case. Whereas all other config profiles tell Nextflow how to run on different computational systems, the test profile configures each nf-core pipeline to run without any other command line flags. It specifies URLs for test data and all required parameters. Because of this, you can test any nf-core pipeline with the following command:
nextflow run nf-core/PIPELINE -profile test
Note that you will typically still need to combine this with a configuration profile for your system - e.g. -profile test,docker. Running with the test profile is a great way to confirm that you have Nextflow configured properly for your system before attempting to run with real data.
Most nf-core pipelines have a number of flags that need to be passed on the command line: some mandatory, some optional. To make it easier to launch pipelines, these parameters are described in a JSON file bundled with the pipeline. The nf-core launch command uses this to build an interactive command-line wizard which walks through the different options with descriptions of each, showing the default value and prompting for values.
NB: This is an experimental feature - the JSON file and rich parameter descriptions are not yet available for all pipelines.
Once all prompts have been answered, non-default values are saved to a params.json file which can be supplied to Nextflow to run the pipeline. Optionally, the nextflow command can be launched there and then.
To use the launch feature, just specify the pipeline name:
nf-core launch <PIPELINE>
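The params.json file written by nf-core launch can be passed back to Nextflow with its -params-file option. The minimal sketch below writes a params.json by hand into a temporary directory and assembles the matching command without running it. The parameter names and values (reads, genome) are hypothetical placeholders, so check the pipeline's --help output for the real ones.

```bash
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)   # scratch directory, so no existing params.json is overwritten

# Hypothetical parameters; real names come from the pipeline's --help output.
cat > "$workdir/params.json" <<'EOF'
{
  "reads": "data/*_R{1,2}.fastq.gz",
  "genome": "GRCh37"
}
EOF

# Assemble the command as an array so that each argument stays intact.
cmd=(nextflow run nf-core/rnaseq -r 1.3 -profile docker -params-file "$workdir/params.json")

# Print the command instead of executing it (a dry run).
printf '%s\n' "${cmd[*]}"
```

To actually launch the pipeline, run "${cmd[@]}" instead of printing it (this needs Nextflow and Docker).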
Many of the techniques and resources described above require an active internet connection at run time - pipeline files, configuration profiles and software containers are all dynamically fetched when the pipeline is launched. This can be a problem for people using secure computing resources that do not have connections to the internet.
To help with this, the nf-core download command automates the fetching of required files for running nf-core pipelines offline. The command can download a specific release of a pipeline with --release and fetch the Singularity container if --singularity is passed (this needs Singularity to be installed). All files are saved to a single directory, ready to be transferred to the cluster where the pipeline will be executed.
NB: At the time of writing (Sept 2019), functionality to download config files is not yet complete. This should be included soon.
- Install required dependencies (nextflow, docker)
- Print the command-line usage instructions for the nf-core/rnaseq pipeline
- In a new directory, run the nf-core/rnaseq pipeline with the provided test data
- Try launching the RNA pipeline using the
- Download the nf-core/rnaseq pipeline for offline use using the
The heart of nf-core is the standardisation of pipeline code structure. To achieve this, all pipelines adhere to a generalised pipeline template. The best way to build an nf-core pipeline is to start by using this template via the nf-core create command. This launches an interactive prompt on the command line which asks for things such as pipeline name, a short description and the author's name. These values are then propagated throughout the template files automatically.
Not everything can be completed with a template, so every new pipeline will need edits and additions to the resulting files, in a similar set of locations. To make these easier to find, the nf-core template files have numerous comment lines beginning with TODO nf-core:, followed by a description of what should be changed or added. These comment lines can be deleted once the required change has been made.
Most code editors have tools to automatically discover such TODO lines, and the nf-core lint command will flag these. This makes it simple to systematically work through the new pipeline, editing all files where required.
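If your editor has no such feature, a recursive grep finds the same markers. A minimal sketch: it first builds a throwaway directory containing one TODO comment (the directory name and file contents are hypothetical) so that it runs anywhere; point pipeline_dir at your own pipeline instead.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Throwaway demo pipeline (hypothetical name and contents).
pipeline_dir=$(mktemp -d)/my-pipeline
mkdir -p "$pipeline_dir/conf"
cat > "$pipeline_dir/main.nf" <<'EOF'
// TODO nf-core: Add your own processes below
params.reads = "data/*{1,2}.fastq.gz"
EOF
printf 'process {}\n' > "$pipeline_dir/conf/base.config"

# -r: recurse into sub-directories, -n: show line numbers, -F: fixed-string match.
todos=$(grep -rnF 'TODO nf-core:' "$pipeline_dir" || true)
printf '%s\n' "$todos"

# Count the remaining markers (grep -c prints 0 when nothing is left).
todo_count=$(printf '%s\n' "$todos" | grep -cF 'TODO nf-core:' || true)
echo "Remaining TODOs: $todo_count"
```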
The only hard requirement for all nf-core pipelines is that software must be available in Docker images. However, it is recommended that pipelines use the following methodology where possible:
- Software requirements are defined for Conda in
- Docker images are automatically built on Docker Hub, using Conda
- Singularity images are generated from Docker Hub at run time for end users
This approach has the following merits:
- A single file contains a list of all required software, making it easy to maintain
- Identical (or as close as is possible) software is available for users using Conda, Docker or Singularity
- Having a single container image for the pipeline uses disk space efficiently for Singularity images, and is simple to manage and transfer.
The reason that the above approach is not a hard requirement is that some issues can prevent it from working, such as:
- It may not be possible to package software on conda due to software licensing limitations
- Different packages may have dependency conflicts which are impossible to resolve
Alternative approaches are then decided upon on a case-by-case basis. We encourage you to discuss this on Slack early on as we have been able to resolve some such issues in the past.
The nf-core template will create a simple environment.yml file for you with an environment name, conda channels and one or two dependencies. You can then add additional required software to this file. Note that all software packages must have a specific version number pinned, using a single equals sign between the package name and its version.
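Unpinned packages can be caught before linting with a quick scan of the dependency list. This is a rough sketch, not a YAML parser: it assumes the flat layout used by the template, and the environment name and package versions in the example file are illustrative only. Each dependency that is not written as name=version with a single equals sign is reported.

```bash
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)   # scratch directory, so no real environment.yml is overwritten

# Illustrative environment file; names and versions are examples only.
cat > "$workdir/environment.yml" <<'EOF'
name: nf-core-mypipeline-1.0dev
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - fastqc=0.11.8
  - multiqc=1.7
  - samtools
  - salmon==0.14.1
EOF

# Report each entry under "dependencies:" that is not pinned as name=version.
check_pins() {
  awk '
    /^dependencies:/ { in_deps = 1; next }
    /^[^ ]/          { in_deps = 0 }
    in_deps && /^ *- / {
      pkg = $0
      sub(/^ *- */, "", pkg)
      if (pkg !~ /^[A-Za-z0-9_.-]+=[^=]+$/) print "Not pinned correctly: " pkg
    }
  ' "$1"
}

check_pins "$workdir/environment.yml"
```

Here samtools (no version) and salmon==0.14.1 (double equals sign) are both reported, while the two correctly pinned packages pass.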
Where software packages are not already available on Bioconda or Conda-forge, we encourage developers to add them. This benefits the wider community, not just users of the nf-core pipeline.
You can use Docker for testing by building the image locally. The pipeline expects a container with a specific name, so you must tag the Docker image with this. You can build and tag an image in a single step with the following command:
docker build -t nfcore/PIPELINE:dev .
Note that it is nfcore without a hyphen (Docker Hub doesn't allow any punctuation in organisation names). The . refers to the current working directory - if run in the root pipeline folder, this tells Docker to use the Dockerfile recipe found there.
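To avoid typos in the tag, the image name can be derived from the GitHub repository name. A small sketch: docker_tag is a hypothetical helper (not part of nf-core/tools) that strips the hyphen from nf-core and appends :dev. The docker build command is printed rather than run, so the example works without Docker installed.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: map "nf-core/<pipeline>" to the "nfcore/<pipeline>:dev" image tag.
docker_tag() {
  local repo=$1
  local pipeline=${repo#nf-core/}   # drop the GitHub organisation prefix
  echo "nfcore/${pipeline}:dev"
}

tag=$(docker_tag nf-core/rnaseq)
# Printed rather than run; execute it from the pipeline root so "." finds the Dockerfile.
echo "docker build -t ${tag} ."
```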
All nf-core pipelines use GitHub as their code repository, and git as their version control system. For newcomers to this world, it is helpful to know some of the basic terminology used:
- A repository contains everything for a given project
- Commits are code checkpoints.
- A branch is a linear string of commits - multiple parallel branches can be created in a repository
- Commits from one branch can be merged into another
- Repositories can be forked from one GitHub user to another
- Branches from different forks can be merged via a Pull Request (PR) on github.com
Typically, people will start developing a new pipeline under their own personal account on GitHub. When it is ready for its first release and has been discussed on Slack, this repository is forked to the nf-core organisation. All developers then maintain their own forks of this repository, contributing new code back to the nf-core fork via pull requests.
All nf-core pipelines must have the following three branches:
- master - commits from stable releases only. Should always have code from the most recent release.
- dev - current development code. Merged into
- TEMPLATE - used for template automation by the @nf-core-bot GitHub account. Should only contain commits with unmodified template code.
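For a pipeline developed locally, the three branches can be created with plain git. A minimal sketch, assuming the first commit holds only unmodified template code; a placeholder README stands in for it here, and the directory name, author name and email are hypothetical. Your repository may already have some of these branches.

```bash
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)/my-pipeline   # hypothetical pipeline directory
mkdir -p "$repo"

git -C "$repo" init -q
# Name the first branch master regardless of the local init.defaultBranch setting.
git -C "$repo" symbolic-ref HEAD refs/heads/master

# Stand-in for the unmodified files produced by the template.
echo "placeholder for template files" > "$repo/README.md"
git -C "$repo" add README.md
git -C "$repo" -c user.name="Example" -c user.email="example@example.com" \
  -c commit.gpgsign=false commit -q -m "Initial template build"

# TEMPLATE stays on this pristine template commit; day-to-day work goes on dev.
git -C "$repo" branch TEMPLATE
git -C "$repo" branch dev
git -C "$repo" checkout -q dev

git -C "$repo" branch --list
```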
Pull requests to the nf-core fork have a number of automated steps that must pass before the PR can be merged. A few points to remember are:
- The pipeline CHANGELOG.md must be updated
- PRs must not be against the master branch (typically you want
- PRs should be reviewed by someone else before being merged
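The branch rule above, together with the release flow described later in this tutorial (where dev is merged into master), can be summarised as a small check. This is a sketch only: check_pr_target is a hypothetical function, not the actual nf-core CI code, and it ignores details such as which fork a PR comes from.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical check: succeed if a PR from head_branch into base_branch is allowed.
# PRs into master are only accepted when they come from dev (i.e. a release PR).
check_pr_target() {
  local head_branch=$1 base_branch=$2
  if [ "$base_branch" != "master" ] || [ "$head_branch" = "dev" ]; then
    return 0
  fi
  echo "PRs into master must come from dev; retarget this PR." >&2
  return 1
}

check_pr_target my-feature dev && echo "my-feature -> dev: OK"
check_pr_target dev master && echo "dev -> master: OK (release)"
check_pr_target my-feature master || echo "my-feature -> master: rejected"
```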
When you fork your pipeline repository to the nf-core organisation, one of the core team will set up Travis CI (automated testing) and Docker Hub (automated Docker image creation) for you. However, it can be helpful to set these up on your personal fork as well. That way, you can be confident that everything will work when you fork or open a PR on the nf-core organisation.
- Make a new pipeline using the template
- Update the readme file to fill in the
- Add a new process to the pipeline in
- Add the new software dependencies from this process into
Manually checking that a pipeline adheres to all nf-core guidelines and requirements is a difficult job. Wherever possible, we automate such code checks with a code linter. This runs through a series of tests and reports failures, warnings and passed tests.
The linting code is closely tied to the nf-core template and both change over time. When we change something in the template, we often add a test to the linter to make sure that pipelines do not use the old method.
Each lint test has a number and is documented on the nf-core website. When warnings and failures are reported on the command line, a short description is printed along with a link to the documentation for that specific test on the website.
Code linting is run automatically every time you push commits to GitHub, open a pull request or make a release. You can also run these tests yourself locally with the following command:
nf-core lint /path/to/pipeline
When merging PRs from
lint command will be run with the
--release flag which includes a few additional tests.
When adding a new pipeline, you must also set up the test config profile. To do this, we use the nf-core/test-datasets repository. Each pipeline has its own branch on this repository, meaning that the data can be cloned without having to fetch all test data for all pipelines:
git clone --single-branch --branch PIPELINE https://github.com/nf-core/test-datasets.git
To set up the test profile, make a new branch on the nf-core/test-datasets repo through the web page (see instructions). Fork the repository to your user and open a PR to your new branch with a really (really!) tiny dataset. Once merged, set up the conf/test.config file in your pipeline to refer to the URLs for your test data.
These test datasets are used by the automated continuous integration tests. The systems that run these tests are extremely limited in the resources that they have available. Typically, the pipeline should be able to complete in around 10 minutes and use no more than 6-7 GB memory. To achieve this, input files and reference genomes need to be very tiny. If possible, a good approach can be to use PhiX or Yeast as a reference genome. Alternatively, a single small chromosome (or part of a chromosome) can be used. If you are struggling to get the tests to run, ask for help on Slack.
In conf/test.config, remember to define all required parameters so that the pipeline will run with only -profile test. Note that remote URLs cannot be traversed like a regular file system, so glob file expansions such as *.fa will not work.
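Because each remote file has to be named individually, it can help to generate the URLs from the branch name and an explicit file list. A sketch, assuming the files are served through GitHub's raw.githubusercontent.com addresses for a test-datasets branch named after the pipeline. The branch name, file paths and parameter names (reads, fasta) are hypothetical, and the config is written to a temporary directory rather than into your pipeline.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical branch name and file paths on nf-core/test-datasets.
pipeline=mypipeline
base_url="https://raw.githubusercontent.com/nf-core/test-datasets/${pipeline}"
read_files=(testdata/sample1_R1.fastq.gz testdata/sample1_R2.fastq.gz)
fasta_file=reference/genome.fa

# Remote URLs cannot be globbed, so every file is listed explicitly.
reads_list=""
for f in "${read_files[@]}"; do
  reads_list+="${reads_list:+, }'${base_url}/${f}'"
done

workdir=$(mktemp -d)   # scratch directory, so no real config is overwritten
mkdir -p "$workdir/conf"
cat > "$workdir/conf/test.config" <<EOF
params {
  // Hypothetical parameter names; use the ones your pipeline defines.
  reads = [${reads_list}]
  fasta = '${base_url}/${fasta_file}'
}
EOF

cat "$workdir/conf/test.config"
```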
The automated tests with Travis CI are configured in the .travis.yml file that is generated by the template. The script block defines three tests: linting the code with nf-core lint, linting the syntax of all Markdown documentation and running the pipeline with the test data.
The env section sets the NXF_VER environment variable twice. This tells Travis to run the tests twice in parallel - once with the latest version of Nextflow (NXF_VER='') and once with the minimum version supported by the pipeline. Do not edit this version number manually - it appears in multiple locations through the pipeline code, so it's better to use nf-core bump-version --nextflow instead.
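The same two-version check can be reproduced locally before pushing, because the Nextflow launcher reads the NXF_VER environment variable to decide which version to run. A sketch that only prints the commands by default (set DRY_RUN=0 to execute them from the pipeline root, which needs Nextflow and Docker). The minimum version 19.04.0 is a placeholder, so use the value declared in your own pipeline.

```bash
#!/usr/bin/env bash
set -euo pipefail

min_nxf_ver="19.04.0"   # placeholder: the minimum version your pipeline supports
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands only, 0 = run them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# An empty NXF_VER leaves the version choice to the launcher, mirroring the
# NXF_VER='' entry in .travis.yml.
for nxf_ver in "" "$min_nxf_ver"; do
  run env NXF_VER="$nxf_ver" nextflow run . -profile test,docker
done
```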
The provided tests may be sufficient for your pipeline. However, if it is possible to run the pipeline with significantly different options (for example, different alignment tools), then it is good to test all of these. You can do this by adding additional commands in the
- Run nf-core lint on your pipeline and make note of any test warnings / failures
- Read up on one or two of the linting rules on the nf-core website and see if you can fix some.
- Take a look at conf/test.config and switch the test data for another dataset on nf-core/test-datasets.
Your pipeline is written and ready to go! Before you can release it with nf-core there are a few steps that need to be done. First, tell everyone about it on Slack in the #new-pipelines channel. Hopefully you've already done this before you spent lots of time on your pipeline, to check that there aren't other similar efforts happening elsewhere. Next, you need to be a member of the nf-core GitHub organisation. You can find instructions for how to do this at https://nf-co.re/join.
Once you're ready to go, you can fork your repository to nf-core. A lot of stuff happens automatically when you do this: the website will update itself to include your new pipeline, complete with rendered documentation pages and usage statistics. Your pipeline will also appear in the nf-core list command output and in various other locations.
Unfortunately, at the time of writing, Travis CI, Docker Hub and Zenodo (automated DOI assignment for releases) services are not created automatically. These can only be set up by nf-core administrators, so please ask someone to do this for you on Slack.
Once everything is set up and all tests are passing on the dev branch, let us know on Slack and we will do a large community review. This is a one-off process that is done before the first release for all pipelines. In order to give a nice interface to review all pipeline code, we create a "pseudo pull request" comparing dev against the first commit in the pipeline (hopefully the template creation). This PR will never be merged, but it gives access to the GitHub review pages, where people can comment on specific lines in the code.
These first community reviews can take quite a long time and typically result in a lot of comments and suggestions (nf-core/deepvariant famously had 156 comments before it was approved). Try not to be intimidated - this is the main step where the community attempts to standardise and suggest improvements for your code. Your pipeline will come out the other side stronger than ever!
Once the pseudo-PR is approved, you're ready to make the release. To do this, first bump the pipeline version to a stable tag using nf-core bump-version, then open a pull request from the dev branch to master. Once tests are passing and two nf-core members have approved this PR, it can be merged to master. Then a GitHub release is made, using the contents of the changelog as a description.
Pipeline version numbers (release tags) should be numerical only, using semantic versioning. For example, take a release version whose major, minor and patch numbers are 1, 4 and 3. Changing 1 would correspond to a major release where results would no longer be backwards compatible. Changing 4 would be a minor release, for example adding some new features. Changing 3 would be a patch release for minor things such as fixing bugs.
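The versioning rule can be made concrete with a helper that compares two release tags. A sketch only: bump_type is a hypothetical function that assumes plain numeric MAJOR.MINOR.PATCH tags (no v prefix or suffixes), and 1.4.3 is just an example version.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: classify the change from one numeric release tag to the next.
bump_type() {
  local old_major old_minor old_patch new_major new_minor new_patch
  IFS=. read -r old_major old_minor old_patch <<< "$1"
  IFS=. read -r new_major new_minor new_patch <<< "$2"

  if (( new_major != old_major )); then
    echo "major"   # results may no longer be backwards compatible
  elif (( new_minor != old_minor )); then
    echo "minor"   # e.g. new features
  elif (( new_patch != old_patch )); then
    echo "patch"   # e.g. bug fixes
  else
    echo "none"
  fi
}

for next in 2.0.0 1.5.0 1.4.4; do
  echo "1.4.3 -> $next: $(bump_type 1.4.3 "$next")"
done
```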
Over time, new versions of nf-core/tools will be released with changes to the template. In order to keep all nf-core pipelines in sync, we have developed an automated synchronisation procedure. When a new version of nf-core/tools is released, a GitHub bot account, @nf-core-bot, runs nf-core create with the new template, using the same input values that you used for your pipeline. This is committed to the TEMPLATE branch and a pull request is created to incorporate these changes into
Note that these PRs can sometimes create git merge conflicts which will need to be resolved manually. There are plugins for most code editors to help with this process. Once resolved and checked this PR can be merged and a new pipeline release created.
- Use nf-core bump-version to update the required version of Nextflow in your pipeline
- Bump your pipeline's version to 1.0, ready for its first release!
- Make sure that you're signed up to the nf-core Slack (get an invite on nf-co.re) and drop us a line about your latest and greatest pipeline plans!
- Ask to be a member of the nf-core GitHub organisation by commenting on this GitHub issue
- If you're a Twitter user, make sure to follow the @nf_core account
I hope that this nf-core tutorial has been helpful! Remember that there is more in-depth documentation on many of these topics available on the nf-core website. If in doubt, please ask for help on Slack.
If you have any suggestions for how to improve this tutorial, or spot any mistakes, please create an issue or pull request on the nf-core/nf-co.re repository.
Phil Ewels, September 2019