Material originally written for the Nextflow Camp 2019, Barcelona 2019-09-19: “Getting started with nf-core” (see programme).

Duration: 1hr

Updated for the nf-core Hackathon 2020, London 2020-03 (see event).

Updated for the Elixir workshop in November 2021 (see event).

Updated during the March 2022 hackathon.

Click here to download the slides associated with this tutorial.

Overview

Abstract

The nf-core community provides a range of tools to help new users get to grips with Nextflow - both by providing complete pipelines that can be used out of the box, and also by helping developers with best practices. Companion tools can create a bare-bones pipeline from a template scattered with TODO pointers, and continuous integration (CI) with linting tools checks code quality. Guidelines and documentation help to get Nextflow newbies on their feet in no time. Best of all, the nf-core community is always on hand to help.

In this tutorial, we discuss the best-practice guidelines developed by the nf-core community, explain why they’re important and share the best tips and tricks for budding Nextflow pipeline users. ✨

Introduction

What is nf-core

nf-core is a community-led project to develop a set of best-practice pipelines built using Nextflow. Pipelines are governed by a set of guidelines, enforced by community code reviews and automatic linting (code testing). A suite of helper tools aims to help people run and develop pipelines.

What this tutorial will cover

This tutorial attempts to give an overview of how nf-core works:

  • Which nf-core tools are most commonly used.
  • Listing pipelines available in the nf-core project.
  • How to run nf-core pipelines.
  • How to troubleshoot nf-core pipelines.

Where to get help

The beauty of nf-core is that there is lots of help on offer! The main place for this is Slack - an instant messaging service.

The nf-core Slack can be found at https://nfcore.slack.com (NB: no hyphen in nfcore!). To join you will need an invite, which you can get at https://nf-co.re/join/slack.

The nf-core Slack organisation has channels dedicated to each pipeline, as well as to specific topics (e.g. #help, #pipelines, #tools, #configs and many more).

One additional tool which we like a lot is TLDR - it gives a concise command-line reference through example commands for most Linux tools, including nextflow, docker, singularity, conda, git and more. There are many clients, but raylee/tldr is arguably the simplest - just a single bash script.

Installing the nf-core helper tools

Much of this tutorial will make use of the nf-core command line tool. This has been developed to provide a range of additional functionality for the project such as pipeline creation, testing and more.

The nf-core tool is written in Python and is available from the Python Package Index and Bioconda. You can install the latest released version from PyPI as follows:

pip install nf-core

Or use this command to install the development (dev) version:

pip install --upgrade --force-reinstall git+https://github.com/nf-core/tools.git@dev

If using conda, first set up Bioconda as described in the bioconda docs (especially setting the channel order), create and activate an environment and then install nf-core:

conda install nf-core

To update the package, run the following command:

conda update nf-core

The nf-core/tools source code is available at https://github.com/nf-core/tools - if you prefer, you can clone this repository and install the code locally:

git clone https://github.com/nf-core/tools.git nf-core-tools
cd nf-core-tools
python setup.py install

Once installed, you can check that everything is working by printing the help:

nf-core --help

You will also need to install Prettier for formatting your code. To do so, you can either use the following command with conda:

conda install prettier

Or use the Prettier extension for Visual Studio Code, which is also included in the nf-core pack of useful extensions.

Alternatively, you can add a comment containing @nf-core-bot fix linting to your pull request, and Prettier will be used to apply the required fixes to your code.

Exercise 1 (installation)

  • Install nf-core/tools
  • Use the help flag to list the available commands

Listing available nf-core pipelines

As you saw from the --help output, the tool has a range of sub-commands. The simplest is nf-core list, which lists all available nf-core pipelines. The output shows the latest version number and when it was released. If the pipeline has been pulled locally using Nextflow, it also tells you when that was and whether you have the latest version.

If you supply additional keywords after the command, the listed pipelines will be filtered. Note that this searches more than just the displayed output, including keywords and description text. The --sort flag allows you to sort the list (default is by most recently released) and --json returns the complete list, without any filtering, in JSON output for programmatic use.
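
The options described above can be combined roughly as follows. This is only a sketch: the keyword rna and the file name pipelines.json are illustrative choices, and the sort keys name and stars are assumed to be accepted by your installed version (check nf-core list --help if in doubt).

```shell
# Skip gracefully if nf-core/tools has not been installed yet.
if command -v nf-core >/dev/null 2>&1; then
    # Filter the list by keyword (searches names, keywords and descriptions).
    nf-core list rna

    # Sort alphabetically by name, then by number of GitHub stars.
    nf-core list --sort name
    nf-core list --sort stars

    # Save the complete, unfiltered list as JSON for programmatic use.
    nf-core list --json > pipelines.json
else
    echo "nf-core is not installed; see the installation section above."
fi
```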

The nf-core pipelines currently available and under development are also listed on the pipelines page of the nf-core website.

Exercise 2 (listing pipelines)

  • Use the help flag to print the list command usage
  • List all available nf-core pipelines
  • Sort pipelines alphabetically, then by popularity (stars)
  • Fetch one of the pipelines using nextflow pull
  • Use nf-core list to see if the pipeline you pulled is up to date
  • Filter pipelines for those that work with RNA
  • Save these pipeline details to a JSON file

Running nf-core pipelines

Software requirements for nf-core pipelines

In order to run nf-core pipelines, you will need to have Nextflow installed (https://www.nextflow.io). The only other requirement is a software packaging tool: Conda, Docker or Singularity. In theory it is possible to run the pipelines with software installed by other methods (e.g. environment modules, or manual installation), but this is not recommended. Most people find either Docker or Singularity containers the best options, as conda environments cannot guarantee 100% reproducibility.
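
A quick way to see which of these tools are already available on your system is a small shell function like the one below. It is only a convenience sketch (the function name check_tools is made up for this example), and it only checks that each command can be found, not that it is configured correctly.

```shell
# Report which of the tools mentioned above can be found on the PATH.
check_tools() {
    for tool in nextflow docker singularity conda; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found ($(command -v "$tool"))"
        else
            echo "$tool: not found"
        fi
    done
}

check_tools
```

You need Nextflow plus at least one of the three packaging tools to be reported as found.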

Fetching pipeline code

Unless you are actively developing pipeline code, we recommend using the Nextflow built-in functionality to fetch nf-core pipelines. Nextflow will automatically fetch the pipeline code when you run nextflow run nf-core/PIPELINE. For the best reproducibility, it is good to explicitly reference the pipeline version number that you wish to use with the -revision/-r flag. For example:

nextflow run nf-core/rnaseq -revision 3.4

If not specified, Nextflow will fetch the default branch. For pipelines with a stable release, the default branch is master - this branch contains code from the latest release. For pipelines in early development that don’t have any releases, the default branch is dev.

If you would like to run the latest development code, use -r dev.

Note that once pulled, Nextflow will use the local cached version for subsequent runs. Use the -latest flag when running the pipeline to always fetch the latest version. Alternatively, you can force Nextflow to pull a pipeline again using the nextflow pull command:

nextflow pull nf-core/rnaseq -revision 3.4

Usage instructions and documentation

You can find general documentation and instructions for Nextflow and nf-core on the nf-core website: https://nf-co.re/. Pipeline-specific documentation is bundled with each pipeline in the /docs folder. This can be read either locally, on GitHub, or on the nf-core website. Each pipeline has its own webpage at https://nf-co.re/<pipeline_name> (e.g. nf-co.re/rnaseq), including Usage documentation, Output documentation and Parameter documentation.

In addition to this documentation, each pipeline comes with basic command line reference. This can be seen by running the pipeline with the --help flag, for example:

nextflow run nf-core/rnaseq --help

Example results of a pipeline run on full-sized test data can be browsed on the pipeline page, under the aws results tab.

Config profiles

Nextflow can load pipeline configurations from multiple locations. To make it easy to apply a group of options on the command line, Nextflow uses the concept of config profiles. nf-core pipelines load configuration in the following order:

  1. Pipeline: Default ‘base’ config
    • Always loaded. Contains pipeline-specific parameters and “sensible defaults” for things like computational requirements
    • Does not specify any method for software packaging. If nothing else is specified, Nextflow will expect all software to be available on the command line.
  2. Pipeline: Core config profiles
    • All nf-core pipelines come with some generic config profiles. The most commonly used ones are for software packaging: docker, singularity and conda. To ensure reproducibility across different compute infrastructures, it is recommended to use containers instead of conda environments.
    • Other core profiles are debug and test
  3. nf-core/configs: Server profiles
    • At run time, nf-core pipelines fetch configuration profiles from the configs remote repository. The profiles here are specific to clusters at different institutions.
    • Because this is loaded at run time, anyone can add a profile here for their system and it will be immediately available for all nf-core pipelines.
  4. Personal configuration under ~/.nextflow/config.
  5. Local config files given to Nextflow with the -c flag.
  6. Command line configuration.

Multiple comma-separated config profiles can be specified in one go, so the following commands are perfectly valid:

nextflow run nf-core/rnaseq -profile test,docker
nextflow run nf-core/rnaseq -profile singularity,debug

Note that the order in which config profiles are specified matters. Their priority increases from left to right.

Our tip: Be clever with multiple Nextflow configuration locations. For example, use -profile for your cluster configuration, ~/.nextflow/config for your personal config such as params.email, and a config file in the working directory (e.g. custom.config, passed to the run with -c custom.config) for reproducible run-specific configuration.

To learn more about Nextflow configuration, see the pipeline configuration tutorial.
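
To make the tip concrete, here is a minimal sketch of a run-specific config file. The file name custom.config comes from the text above; the settings inside it (outdir, max_cpus and max_memory) are illustrative assumptions based on parameters common in nf-core pipelines at the time of writing, so check the parameter documentation of your pipeline before relying on them. The final command is only printed, not executed.

```shell
# Write a run-specific config file in the current working directory.
cat > custom.config <<'EOF'
// Hypothetical run-specific settings; adjust to your pipeline and machine.
params {
    outdir     = 'results'
    max_cpus   = 8
    max_memory = '32.GB'
}
EOF

# Print the command that layers this file on top of the chosen profiles.
# Remove the leading "echo" to actually launch the run.
echo nextflow run nf-core/rnaseq -r 3.4 -profile test,docker -c custom.config
```

Keeping this file alongside the results records exactly which run-specific settings were used.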

Running pipelines with test data

The test config profile is a bit of a special case. Whereas all other config profiles tell Nextflow how to run on different computational systems, the test profile configures each nf-core pipeline to run without any other command line flags. It specifies URLs for test data and all required parameters. Because of this, you can test any nf-core pipeline with the following command:

nextflow run nf-core/<pipeline_name> -profile test --outdir <OUTDIR>

Note that you will typically still need to combine this with a configuration profile for your system - e.g. -profile test,docker. Running with the test profile is a great way to confirm that you have Nextflow configured properly for your system before attempting to run with real data.

The nf-core launch command

Most nf-core pipelines have a number of flags that need to be passed on the command line: some mandatory, some optional. To make it easier to launch pipelines, these parameters are described in a JSON file bundled with the pipeline. The nf-core launch command uses this to build an interactive command-line wizard which walks through the different options with descriptions of each, showing the default value and prompting for values.

Once all prompts have been answered, non-default values are saved to a params.json file which can be supplied to Nextflow to run the pipeline. Optionally, the Nextflow command can be launched there and then.

To use the launch feature, just specify the pipeline name:

nf-core launch <pipeline_name>
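
The saved parameters file is plain JSON mapping parameter names to values, and Nextflow can read it with its -params-file option. The sketch below writes such a file by hand purely for illustration: the parameter names input and outdir and their values are assumptions, and the wizard itself only records the non-default values you entered. The final command is printed rather than executed.

```shell
# Hand-written stand-in for the file that nf-core launch would save.
cat > params.json <<'EOF'
{
    "input": "samplesheet.csv",
    "outdir": "results"
}
EOF

# Print the command that supplies the saved parameters to Nextflow.
# Remove the leading "echo" to actually launch the run.
echo nextflow run nf-core/rnaseq -r 3.4 -profile docker -params-file params.json
```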

Using nf-core pipelines offline

Many of the techniques and resources described above require an active internet connection at run time - pipeline files, configuration profiles and software containers are all dynamically fetched when the pipeline is launched. This can be a problem for people using secure computing resources that do not have connections to the internet.

To help with this, the nf-core download command automates the fetching of required files for running nf-core pipelines offline. The command can download a specific release of a pipeline with -r/--release and fetch the Singularity container if --singularity is passed (this needs Singularity to be installed). All files are saved to a single directory, ready to be transferred to the cluster where the pipeline will be executed.

To learn more about running pipelines offline, see the pipeline configuration tutorial.

Exercise 3 (using pipelines)

  • Install required dependencies (Nextflow, Docker)
  • Print the command-line usage instructions for the nf-core/rnaseq pipeline
  • In a new directory, run the nf-core/rnaseq pipeline with the provided test data
  • Try launching the RNA pipeline using the nf-core launch command
  • Download the nf-core/rnaseq pipeline for offline use using the nf-core download command

Troubleshooting nf-core pipelines

Not everything always runs smoothly, and you might encounter errors when running nf-core pipelines. Here are some step-by-step tips that can help you troubleshoot them.

  1. Start small: each nf-core pipeline comes with a small test dataset that is run by continuous integration and for each pipeline release.
    • Start by running the pipeline tests as described above. If these tests fail, there is a good chance that you are missing some of the components needed to run Nextflow pipelines.
    • Nextflow: check that you have the latest version installed.
    • Check that you have docker/singularity/conda installed and that you are using the right docker/singularity/conda/custom profile.
    • Check the troubleshooting docs.
  2. Categorize the type of error. Check the Nextflow log to figure out if the error occurs:
    • Before the first process
    • In the first process
    • During the pipeline run
    • Problems with the process output
  3. Read the Nextflow log. Check the work directory for the .command.err or .command.log files for more information.
  4. Search the nf-core Slack and Google. Ask for help in the corresponding nf-core Slack channel.
  5. Report a pipeline bug on the nf-core GitHub if none of the above steps helps.
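
For step 3, a small helper like the one below can save some typing. It is a sketch rather than part of nf-core/tools: the function name show_task_logs is made up for this example, and it prints .command.sh (the script Nextflow generated for the task) alongside the two files mentioned above.

```shell
# Print the files Nextflow leaves in a failed task's work directory.
# Usage: show_task_logs <work directory reported in the error message>
show_task_logs() {
    task_dir="$1"
    for f in .command.sh .command.err .command.log; do
        # Only show files that exist and are not empty.
        if [ -s "$task_dir/$f" ]; then
            echo "==== $f ===="
            cat "$task_dir/$f"
        fi
    done
}
```

The work directory of the failing task is printed in the Nextflow error message. The overall run log is written to .nextflow.log in the directory where you launched the pipeline, so tail -n 50 .nextflow.log is often a good first look.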

Here is a bytesize talk with a step-by-step explanation of how to troubleshoot failing pipelines.

Conclusion

We hope that this nf-core tutorial has been helpful! Remember that there is more in-depth documentation on many of these topics available on the nf-core website. If in doubt, please ask for help on Slack.

If you have any suggestions for how to improve this tutorial, or spot any mistakes, please create an issue or pull request on the nf-core/website repository.

Phil Ewels, Maxime Garcia, Gisela Gabernet, Friederike Hanssen for nf-core, March 2022