Welcome to the nf-core section of the Nextflow & nf-core training course!
You can follow the commands and examples covered within this training in the document below.
Using GitPod
As with the rest of the Nextflow training up until this point, we will be using GitPod for this training.
We will use a different GitPod environment in order to get the very latest releases of the nf-core tools.
To launch GitPod, follow the link:
To work with a clean directory, you can do the following:
In the terminal, make a new directory to work in:
In the menu top left, select File > Open Folder (^⇧O or ^O)
Enter /home/gitpod/training
GitPod will probably reload the browser tab
The file explorer on the left should now have an expandable box with the title TRAINING
💡 ♻️ As you create new files you should see the explorer populate with them. If you don’t see a file you expect, try hovering over the TRAINING section title and clicking the ⟳ icon that appears to refresh the file list.
Local installation
If you prefer, you can install the nf-core/tools package locally.
It is a Python package, available from the Python Package Index (PyPI, installed with pip) and from Bioconda (installed with conda or mamba).
A Docker container is also available, but it is not recommended for this tutorial.
To install from PyPI:
To get the very latest development version of the code:
If using conda, first set up Bioconda as described in the bioconda docs (especially setting the channel order) and then install nf-core:
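The three installation routes above could be captured as shell functions, so that nothing is installed until one of them is called. This is only a sketch: the function names are hypothetical, and the commands assume that the package is published as `nf-core` on PyPI and Bioconda and that the development branch of nf-core/tools is called `dev`.

```shell
# Hypothetical helper names; defining them installs nothing.

# Stable release from PyPI.
install_nfcore_pypi() {
    pip install nf-core
}

# Latest development code (assumes the development branch is called `dev`).
install_nfcore_dev() {
    pip install --upgrade --force-reinstall \
        "git+https://github.com/nf-core/tools.git@dev"
}

# Bioconda package (assumes the Bioconda channels are already configured).
install_nfcore_conda() {
    conda install nf-core
}
```

Call exactly one of them afterwards, for example `install_nfcore_pypi`.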
First run of nf-core
Whether running using GitPod or locally, you can confirm that nf-core is correctly installed by viewing the command line help:
You should get something that looks like the following output:
The first set of subcommands is typically useful for running pipelines; the second set is for developing pipelines.
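If you want to script the installation check, something like the following may help. It is a minimal sketch that assumes the tool is exposed as an `nf-core` executable on your PATH; the variable name `nfcore_status` is made up for this example.

```shell
# Report whether the nf-core command is available, without aborting the shell.
if command -v nf-core >/dev/null 2>&1; then
    nfcore_status="installed"
    nf-core --version   # print the installed version
    nf-core --help      # print the command line help
else
    nfcore_status="missing"
    echo "nf-core was not found on PATH; install it or open the GitPod environment" >&2
fi
echo "nf-core status: ${nfcore_status}"
```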
You can try out some commands, for example listing available nf-core pipelines:
In this tutorial we will focus on creating a pipeline, but please do look at the functionality that nf-core/tools provides to you as a user - especially nf-core launch and nf-core download.
Creating a pipeline
To get started with your new pipeline, run the create command:
Although you can provide options on the command line, it’s easiest to use the interactive prompts.
Follow the instructions and you should see a new pipeline appear in your file explorer.
Let’s move into the new pipeline directory in the terminal:
Pipeline git repo
The nf-core create command has made a fully fledged pipeline for you.
Before getting too carried away looking at all of the files, note that it has also initiated a git repository:
It’s actually created three branches for you:
Each has the same initial commit, with the vanilla template:
This is important, because this shared git history, with the unmodified nf-core template in the TEMPLATE branch, is how the nf-core automated template synchronisation works (see the docs for more details).
The main things to remember are:
When creating a new repository on https://github.com or equivalent, don’t initialise it - leave it bare and push everything from your local clone
Develop your code on either the master or dev branches and leave TEMPLATE alone.
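The advice about pushing to an uninitialised remote can be rehearsed safely in a scratch directory. The sketch below does not touch your pipeline: an empty commit stands in for the template commit, and a local bare repository stands in for the empty repository on GitHub. All paths and the commit message are invented for the illustration.

```shell
# Scratch area so that nothing touches a real pipeline.
work="$(mktemp -d)"

(
    set -eu
    # Stand-in for the repository that `nf-core create` initialises:
    # one commit shared by the master, dev and TEMPLATE branches.
    git init --quiet "$work/pipeline"
    cd "$work/pipeline"
    git -c user.name="Demo" -c user.email="demo@example.com" \
        commit --quiet --allow-empty -m "initial template build"
    git branch -M master
    git branch dev
    git branch TEMPLATE

    # Stand-in for a new, uninitialised repository on GitHub.
    git init --quiet --bare "$work/remote.git"

    # Push every local branch, not just the one that is checked out.
    git remote add origin "$work/remote.git"
    git push --quiet --all origin
)

# Each branch on the remote should point at the same commit hash.
git -C "$work/pipeline" ls-remote --heads origin
```

With a real pipeline, `origin` would be the URL of your empty remote repository, and you would run only the `git remote add` and `git push --all` steps from inside the pipeline directory.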
Testing the new pipeline
The new pipeline should run with Nextflow, right out of the box.
Let’s try:
Sure enough, our new pipeline has run FastQC and MultiQC on a selection of test data.
Customising the template
In many of the files generated by the nf-core template, you’ll find code comments that look like this:
These are markers to help you get started with customising the template code as you write your pipeline.
Editor tools such as Todo tree help you easily navigate these and work your way through them.
Linting your pipeline
Customising the template is part of writing your new pipeline.
However, not all files should be edited - indeed, nf-core strives to promote standardisation amongst pipelines.
To try to keep pipelines up to date and using the same code where possible, we have an automated code linting tool for nf-core pipelines.
Running nf-core lint will run a comprehensive test suite against your pipeline:
Linting tests can have one of four statuses: pass, ignore, warn or fail.
For example, at first you will see a large number of warnings about TODO comments, letting you know that you haven’t finished setting up your new pipeline:
Warnings are ok at this stage, but should be cleared up before a pipeline release.
Failures are more serious, however, and will typically prevent pull requests from being merged.
For example, if you edit CODE_OF_CONDUCT.md, which should match the template, you’ll get a pipeline lint test failure:
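A guarded way to run the linter from the pipeline root might look like the sketch below. It assumes an nf-core/tools release in which the command is spelled `nf-core lint`, as in this tutorial; the variable `lint_outcome` is a name chosen for the example.

```shell
# Assumption: the lint command is spelled `nf-core lint` in your version.
# Run from the root of the pipeline directory.
if command -v nf-core >/dev/null 2>&1; then
    if nf-core lint; then
        lint_outcome="exit status 0"
    else
        lint_outcome="non-zero exit status"
    fi
else
    lint_outcome="skipped: nf-core not found on PATH"
fi
echo "Lint outcome: ${lint_outcome}"
```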
Continuous integration testing
Whilst it’s helpful to be able to run the nf-core lint tests locally, their real strength is the combination with CI (continuous integration) testing.
By default, nf-core pipelines are configured to run CI tests on GitHub every time you push to the repo or open a pull request.
The same nf-core lint command runs on your code on the automated machines.
If there are any failures, they will be reported with a ❌ and you will not be able to merge the pull request until you push more commits that fix those failures.
These automated tests allow us to maintain code quality in a scalable and standardised way across the community.
Configuring linting
In some special cases (especially using the nf-core template outside of the nf-core community) you may find that you want to ignore certain linting failures.
To do so, edit .nf-core.yml in the root of the repository.
For example, to ignore the tests triggering warnings in the example above, you can add:
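One way to make that change from the terminal is sketched below. It assumes that the TODO warnings come from a lint test called `pipeline_todos` and that `.nf-core.yml` does not yet contain a `lint:` section (appending a second one would create a duplicate key). If either assumption does not hold, edit the file by hand instead.

```shell
# Assumptions: the lint test is named `pipeline_todos`, and .nf-core.yml
# has no `lint:` section yet. Run from the pipeline root.
cat >> .nf-core.yml <<'EOF'
lint:
  pipeline_todos: false
EOF

# Show the resulting configuration.
cat .nf-core.yml
```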
Please see the linting documentation for specific details of how to configure different tests.
Nextflow Schema
All nf-core pipelines can be run with --help to see usage instructions.
We can try this with the demo pipeline that we just created:
Here we get a rich help output, with command line parameter variable types, and help text.
However, the Nextflow syntax in nextflow.config does not allow for this kind of rich metadata natively.
To provide this, we created a standard for describing pipeline parameters in a file in the pipeline root called nextflow_schema.json.
These files are written using JSON Schema, making them compatible with many other tools and validation libraries.
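To give a feel for the format, the sketch below writes a heavily simplified JSON Schema to a scratch file. It is an illustration only: a generated nextflow_schema.json is much larger and organises its parameters into groups, and the description text here is invented.

```shell
# Illustration only: a flat, minimal schema written to a scratch file.
schema_sketch="$(mktemp)"
cat > "$schema_sketch" <<'EOF'
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "type": "object",
  "properties": {
    "outdir": {
      "type": "string",
      "description": "Directory where results are saved."
    }
  },
  "required": ["outdir"]
}
EOF

# Show which parameters the sketch marks as required.
grep '"required"' "$schema_sketch"
```

The `type` and `description` fields carry the metadata used for help text, and the `required` list is what allows a missing parameter to be detected before the pipeline starts.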
By describing our workflow parameters in this file we can do a lot of new things:
Generate command line help text and web based documentation pages
Validate pipeline inputs
Create rich pipeline launch interfaces
Indeed, if you try to run the new pipeline without the required --outdir parameter, you will quickly get an error:
Working with Nextflow schema
If you peek inside the nextflow_schema.json file you will see that it is quite an intimidating thing.
The file is large and complex, and very easy to break if edited manually.
Thankfully, we provide a user-friendly tool for editing this file: nf-core schema build.
To see this in action, let’s add some new parameters to nextflow.config:
Then run nf-core schema build:
The CLI tool should then prompt you to add each new parameter:
Select y on the final prompt to launch a web browser to edit your schema graphically.
⚠️ When using GitPod, a slightly odd thing can happen at this point.
GitPod may launch the schema editor with a terminal-based web browser called lynx.
You’ll see the terminal output suddenly fill with a text-based representation of the nf-core website.
Press Q, then Y to confirm, to exit lynx.
You should still see nf-core/tools running on the command line.
Copy the URL that was printed to the terminal and open this in a new browser tab (or alt + click it).
Here in the schema editor you can edit:
Description and help text
Type (string / boolean / integer etc)
Grouping of parameters
Whether a parameter is required, or hidden from help by default
Enumerated values (choose from a list)
Min / max values for numeric types
Regular expressions for validation
Special formats for strings, such as file-path
Additional fields for files such as mime-type
nf-core modules
Using the nf-core-demo pipeline that we created above, let’s see how we can add our own modules to the pipeline.
Add a process to the main workflow
Let’s add a simple Nextflow process to our pipeline:
Paste the process in workflows/demo.nf:
Write some code to invoke the process above:
and call it in the main workflow definition after FASTQC:
Let’s re-run the pipeline and test it. The new process will just print to screen the file names for the FastQ files that are used for the test dataset with this workflow. Add -resume to the end of the command to ensure that previously run and successful tasks are cached.
Add a local module to the pipeline
Create a new file called echo_reads.nf in modules/local/. Cut and paste the process definition into this file and save it.
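If you prefer to create the file from the terminal, a sketch along these lines may work. It assumes that the process is named ECHO_READS and that it takes the same `(meta, reads)` tuple as FASTQC; adjust the body to match the process you actually wrote. The quoted `'EOF'` stops the shell from expanding `${reads}`, so the file receives it literally.

```shell
# Assumptions: process name ECHO_READS, input tuple as used by FASTQC.
# Run from the pipeline root.
mkdir -p modules/local
cat > modules/local/echo_reads.nf <<'EOF'
process ECHO_READS {
    // Print the task's standard output to the terminal
    debug true

    input:
    tuple val(meta), path(reads)

    script:
    """
    echo ${reads}
    """
}
EOF
```

The `debug` directive is available in recent Nextflow releases; older releases used `echo true` for the same purpose.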
Now import it in workflows/demo.nf by pasting the snippet below with all of the other include statements in that file:
Let’s re-run the pipeline and test it again to make sure it is working:
You can now include the same process as many times as you like in the pipeline, which is one of the primary strengths of the Nextflow DSL2 syntax:
Try changing the workflow to call both of the processes above, which should print the FastQ file names to screen twice instead of once.
List modules in pipeline
The nf-core pipeline template comes with a few nf-core/modules pre-installed. You can list these with the command below:
The version hashes and source repository information for the modules are tracked in the modules.json file in the root of the repo. nf-core/tools updates this file automatically when you create, remove, or update modules.
Update modules in pipeline
Let’s see if all of our modules are up-to-date:
Nothing to see here!
List remote modules on nf-core/modules
You can list all of the modules available on nf-core/modules with the command below, but the nf-core website also has search functionality for this.
Install a module from nf-core/modules
Let’s install the FASTP module into the pipeline and fetch its key information, including input and output channel definitions:
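A guarded sketch of these two steps is shown below. It assumes an nf-core/tools release that provides the `modules install` and `modules info` subcommands and that the module is registered under the name `fastp`; the variable `fastp_step` is a name chosen for the example.

```shell
# Assumption: run from the pipeline root with nf-core/tools on PATH.
if command -v nf-core >/dev/null 2>&1; then
    # Copy the module into the pipeline and record it in modules.json.
    nf-core modules install fastp
    # Show the module's description and its input and output channels.
    nf-core modules info fastp
    fastp_step="attempted"
else
    fastp_step="skipped: nf-core not found on PATH"
fi
echo "FASTP install step: ${fastp_step}"
```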
If we inspect the main script for the FASTP module, the first input channel looks exactly the same as for the FASTQC module, which we already know is working from the tests. We can copy the include statement printed whilst installing the module and paste it into workflows/demo.nf.
We now just need to call the FASTP process in the main workflow. Paste the snippet below just after the call to ECHO_READS_TWICE.
Let’s re-run the pipeline and test it again to make sure it is working:
Patch a module
Say we want to make a slight change to an existing nf-core module that is specific to a particular use case. We can create a patch of the module, which will then be tracked by the nf-core/tools commands.
Let’s add the snippet below at the top of the script section in the FASTP nf-core module:
The linting for this module will now fail because the local copy of the module doesn’t match the latest version in nf-core/modules:
Fear not! We can just patch the module!
The diff is stored in a file:
and the path to this patch file is added to modules.json:
Lint all modules
As well as the pipeline template you can lint individual or all modules with a single command:
Create a local module
Open ./modules/local/demo/module.nf and start customising this to your needs whilst working your way through the extensive TODO comments!
Contribute to nf-core/modules
You will see that when you create a module in a clone of the modules repository, more files are added than when you create a local module in the pipeline, as we did in the previous section. See the DSL2 modules docs on the nf-core website for further information.