Introduction
Each nf-core module includes a meta.yml file that describes its structure. However, the initial design of this file didn’t reflect the actual organization of the module’s input and output channels. Instead of being grouped by channel, all elements were listed at the top level. This made it difficult to understand the channel structure and led to problems such as missing descriptions when a module used multiple meta maps.
process BWA_MEM {
    ...
    input:
    tuple val(meta) , path(reads)
    tuple val(meta2), path(index)
    tuple val(meta3), path(fasta)
    val sort_bam
    output:
    tuple val(meta), path("*.bam"), emit: bam, optional: true
    tuple val(meta), path("*.cram"), emit: cram, optional: true
    tuple val(meta), path("*.csi"), emit: csi, optional: true
    tuple val(meta), path("*.crai"), emit: crai, optional: true
    path "versions.yml", emit: versions
    ...
}
name: bwa_mem
...
input:
  - meta:
      type: map
      description: Groovy Map containing sample information
  - reads:
      type: file
      description: List of input FastQ files.
  - index:
      type: file
      description: BWA genome index files
      pattern: "*.{amb,ann,bwt,pac,sa}"
  - fasta:
      type: file
      description: Reference genome in FASTA format
      pattern: "*.{fasta,fa}"
  - sort_bam:
      type: boolean
      description: use samtools/sort (true) or samtools/view (false)
output:
  - meta:
      type: map
      description: Groovy Map containing sample information
  - bam:
      type: file
      description: Output BAM file containing read alignments
      pattern: "*.{bam}"
  - cram:
      type: file
      description: Output CRAM file containing read alignments
      pattern: "*.{cram}"
  - csi:
      type: file
      description: Optional index file for BAM file
      pattern: "*.{csi}"
  - crai:
      type: file
      description: Optional index file for CRAM file
      pattern: "*.{crai}"
  - versions:
      type: file
      description: File containing software versions
      pattern: "versions.yml"
...
(All files shown in this post are simplified versions of the main.nf and meta.yml files, to show the structure of input and output channels.)
In October 2024, we updated all nf-core modules (GitHub pull request) to ensure they properly define their input and output channel structures.
process BWA_MEM {
    ...
    input:
    tuple val(meta) , path(reads)
    tuple val(meta2), path(index)
    tuple val(meta3), path(fasta)
    val sort_bam
    output:
    tuple val(meta), path("*.bam"), emit: bam, optional: true
    tuple val(meta), path("*.cram"), emit: cram, optional: true
    tuple val(meta), path("*.csi"), emit: csi, optional: true
    tuple val(meta), path("*.crai"), emit: crai, optional: true
    path "versions.yml", emit: versions
    ...
}
name: bwa_mem
...
input:
  - - meta:
        type: map
        description: Groovy Map containing sample information
    - reads:
        type: file
        description: |
          List of input FastQ files of size 1 and 2 for single-end and paired-end data,
          respectively.
  - - meta2:
        type: map
        description: Groovy Map containing reference information.
    - index:
        type: file
        description: BWA genome index files
        pattern: "*.{amb,ann,bwt,pac,sa}"
  - - meta3:
        type: map
        description: Groovy Map containing sample information
    - fasta:
        type: file
        description: Reference genome in FASTA format
        pattern: "*.{fasta,fa}"
  - - sort_bam:
        type: boolean
        description: use samtools sort (true) or samtools view (false)
        pattern: "true or false"
output:
  - bam:
      - meta:
          type: map
          description: Groovy Map containing sample information
      - "*.bam":
          type: file
          description: Output BAM file containing read alignments
          pattern: "*.{bam}"
  - cram:
      - meta:
          type: map
          description: Groovy Map containing sample information
      - "*.cram":
          type: file
          description: Output CRAM file containing read alignments
          pattern: "*.{cram}"
  - csi:
      - meta:
          type: map
          description: Groovy Map containing sample information
      - "*.csi":
          type: file
          description: Optional index file for BAM file
          pattern: "*.{csi}"
  - crai:
      - meta:
          type: map
          description: Groovy Map containing sample information
      - "*.crai":
          type: file
          description: Optional index file for CRAM file
          pattern: "*.{crai}"
  - versions:
      - versions.yml:
          type: file
          description: File containing software versions
          pattern: "versions.yml"
...
We also introduced linting checks to nf-core/tools to ensure the proper structure of meta.yml files.
Together with these linting checks, we introduced a new flag to the nf-core modules lint command: --fix. This flag tries to fix all possible lint failures related to the meta.yml file.
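The kind of check such linting performs can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the nf-core/tools implementation: the function names are hypothetical, and the channel lists are written by hand here, whereas a real linter would parse main.nf and load meta.yml from disk.

```python
# Channel structure as declared in main.nf: one inner list per input channel.
# (Hand-written for this sketch; a real linter would parse main.nf.)
MAIN_NF_INPUTS = [
    ["meta", "reads"],
    ["meta2", "index"],
    ["meta3", "fasta"],
    ["sort_bam"],
]

# The `input` section of the new meta.yml as it would look once loaded:
# a list of channels, each a list of single-key dictionaries.
META_YML_INPUTS = [
    [{"meta": {"type": "map"}}, {"reads": {"type": "file"}}],
    [{"meta2": {"type": "map"}}, {"index": {"type": "file"}}],
    [{"meta3": {"type": "map"}}, {"fasta": {"type": "file"}}],
    [{"sort_bam": {"type": "boolean"}}],
]


def channel_names(meta_inputs):
    """Reduce each meta.yml channel to the ordered list of its element names."""
    return [[name for element in channel for name in element] for channel in meta_inputs]


def compare_channels(expected, meta_inputs):
    """Return one message per difference between main.nf and meta.yml channels."""
    found = channel_names(meta_inputs)
    problems = []
    if len(expected) != len(found):
        problems.append(f"expected {len(expected)} channels, found {len(found)}")
    for i, (exp, got) in enumerate(zip(expected, found)):
        if exp != got:
            problems.append(f"channel {i}: main.nf has {exp}, meta.yml has {got}")
    return problems


if __name__ == "__main__":
    for message in compare_channels(MAIN_NF_INPUTS, META_YML_INPUTS):
        print(message)
```

With the grouped structure, the comparison returns no problems. Feeding it the old flat layout (every element as its own top-level entry) would report a channel-count mismatch, which is exactly the information the original design could not express.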
Introducing ontologies
Together with these changes, we also added the bio.tools identifier of the tool. bio.tools is a community-driven registry of bioinformatics software and data resources. It provides information about software tools, databases, analysis workflows, and services that are used in bioinformatics and the life sciences.
name: bwa_mem
...
tools:
  - bwa:
      description: |
        BWA is a software package for mapping DNA
        sequences against a large reference genome, such as the human genome.
      homepage: http://bio-bwa.sourceforge.net/
      documentation: https://bio-bwa.sourceforge.net/bwa.shtml
      arxiv: arXiv:1303.3997
      licence: ["GPL-3.0-or-later"]
      identifier: "biotools:bwa"
This bio.tools ID opens up new possibilities, such as knowing the inputs and outputs of the tool and the ontology terms for its input and output files.
We now use this information to suggest input and output channels when creating a module with nf-core modules create (available as of February 2025 on the dev version of nf-core/tools; it will be released as part of nf-core/tools 3.3.0). This makes the freshly created module template easier to modify.
Here we show an example of the template generated when creating a module for the BWA tool, using the same example as before.
process BWA_MEM {
    tag "$meta.id"
    label 'process_single'
    // TODO nf-core: See section in main README for further information regarding finding and adding container addresses to the section below.
    conda "${moduleDir}/environment.yml"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/bwa:0.7.18--h577a1d6_2':
        'biocontainers/bwa:0.7.18--h577a1d6_2' }"
    input:
    // TODO nf-core: Update the information obtained from bio.tools and make sure that it is correct
    tuple val(meta), path(sequence)
    tuple val(meta2), path(genome_index)
    output:
    // TODO nf-core: Update the information obtained from bio.tools and make sure that it is correct
    tuple val(meta), path("*.{}"), emit: genome_index
    tuple val(meta), path("*.{}"), emit: alignment
    tuple val(meta), path("*.{}"), emit: sequence_coordinates
    tuple val(meta), path("*.{}"), emit: sequence_alignment
    path "versions.yml" , emit: versions
    ...
}
We also populate an ontologies section for each file described in the meta.yml.
name: "bwa_mem"
...
input:
  - - meta:
        type: map
        description: |
          Groovy Map containing sample information
          e.g. `[ id:'sample1' ]`
    - sequence:
        # TODO nf-core: Update the information obtained from bio.tools and make sure that it is correct
        type: file
        description: sequence file
        pattern: "*.{fastq}"
        ontologies:
          - edam: "http://edamontology.org/data_2044" # Sequence
          - edam: "http://edamontology.org/format_1930" # FASTQ
  - - meta:
        type: map
        description: |
          Groovy Map containing sample information
          e.g. `[ id:'sample1' ]`
    - genome_index:
        # TODO nf-core: Update the information obtained from bio.tools and make sure that it is correct
        type: file
        description: genome_index file
        pattern: "*.{}"
        ontologies:
          - edam: "http://edamontology.org/data_3210" # Genome index
Ontologies are specified under the ontologies key, which contains a list of dictionaries. Each dictionary represents a single ontology URL, with the key indicating the ontology type (currently EDAM ontology) and the value being the URL itself. This structure allows for easy adoption of other ontologies in the future.
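To make the list-of-dictionaries layout concrete, here is a small Python sketch that reads the ontology URLs of one file entry once the YAML has been loaded. The helper name ontology_urls is an assumption made for this illustration, not part of nf-core/tools; the entry itself mirrors the sequence input of the generated meta.yml.

```python
# One file entry from meta.yml, as a plain dictionary after YAML loading.
sequence_entry = {
    "type": "file",
    "description": "sequence file",
    "pattern": "*.{fastq}",
    "ontologies": [
        {"edam": "http://edamontology.org/data_2044"},    # Sequence
        {"edam": "http://edamontology.org/format_1930"},  # FASTQ
    ],
}


def ontology_urls(entry, ontology_type=None):
    """Return the ontology URLs of an entry, optionally filtered by type.

    Each item under `ontologies` is a single-key dictionary: the key is the
    ontology type (currently "edam") and the value is the term URL.
    """
    urls = []
    for item in entry.get("ontologies", []):
        for kind, url in item.items():
            if ontology_type is None or kind == ontology_type:
                urls.append(url)
    return urls


if __name__ == "__main__":
    print(ontology_urls(sequence_entry, "edam"))
```

Because the ontology type is a key rather than a fixed field name, terms from another ontology could sit in the same list later without changing how existing entries are read.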
Using nf-core helper tools
As mentioned before, we updated the nf-core/tools package to make creating and updating modules easier.
nf-core modules create: Automatically fetches a bio.tools ID when possible and uses the information it provides to populate the input and output channels in main.nf and meta.yml.
nf-core modules lint: Checks that the channels defined in main.nf are properly described in meta.yml.
nf-core modules lint --fix: Tries to correct the inputs and outputs in meta.yml to match the main.nf file. It also adds missing ontology URLs, guessing them from the pattern value.
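Guessing an ontology term from a pattern value could work roughly as follows. This Python sketch is an assumption about the general idea rather than the actual nf-core/tools code: the function names and the lookup table are hypothetical, and the table holds only the FASTQ term that appears in this post.

```python
import re

# Hypothetical lookup table for illustration only. The single entry is the
# FASTQ term shown earlier in this post; a real table would cover many formats.
EXTENSION_TO_EDAM = {
    "fastq": "http://edamontology.org/format_1930",  # FASTQ
}


def extensions_from_pattern(pattern):
    """Extract file extensions from a glob such as "*.{fasta,fa}" or "*.bam"."""
    braces = re.search(r"\{([^}]*)\}", pattern)
    if braces:
        candidates = braces.group(1).split(",")
    elif "." in pattern:
        candidates = [pattern.rsplit(".", 1)[-1]]
    else:
        candidates = []
    return [c.strip() for c in candidates if c.strip()]


def guess_ontologies(pattern):
    """Build an `ontologies` list for every extension with a known term."""
    ontologies = []
    for extension in extensions_from_pattern(pattern):
        url = EXTENSION_TO_EDAM.get(extension)
        if url and {"edam": url} not in ontologies:
            ontologies.append({"edam": url})
    return ontologies


if __name__ == "__main__":
    print(guess_ontologies("*.{fastq}"))
```

An empty pattern such as "*.{}" (as in the freshly generated template) yields no extensions and therefore no guess, so those entries still need to be filled in by hand.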
Future potential
These additions open the door to new possibilities. For example, knowing the exact inputs and outputs of modules, subworkflows, and workflows makes automated chaining of these components easier.
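As a rough illustration of what such chaining could build on, the Python sketch below treats two channels as compatible when their file elements share at least one EDAM term. This is a deliberately crude heuristic and purely an assumption: the function names, the upstream channel, and the matching rule are all hypothetical, and nothing like this is claimed to exist in nf-core/tools.

```python
def edam_terms(channel):
    """Collect all EDAM URLs declared by the elements of one channel."""
    return {
        url
        for element in channel
        for spec in element.values()
        for item in spec.get("ontologies", [])
        for kind, url in item.items()
        if kind == "edam"
    }


def can_chain(output_channel, input_channel):
    """True if the two channels share at least one EDAM term (crude heuristic)."""
    return bool(edam_terms(output_channel) & edam_terms(input_channel))


# Hypothetical output channel of an upstream module that emits FASTQ files.
upstream_reads = [
    {"meta": {"type": "map"}},
    {"*.fastq": {"type": "file", "ontologies": [
        {"edam": "http://edamontology.org/format_1930"},  # FASTQ
    ]}},
]

# Input channels mirroring the generated BWA template shown earlier.
bwa_sequence = [
    {"meta": {"type": "map"}},
    {"sequence": {"type": "file", "ontologies": [
        {"edam": "http://edamontology.org/data_2044"},    # Sequence
        {"edam": "http://edamontology.org/format_1930"},  # FASTQ
    ]}},
]
bwa_genome_index = [
    {"meta": {"type": "map"}},
    {"genome_index": {"type": "file", "ontologies": [
        {"edam": "http://edamontology.org/data_3210"},  # Genome index
    ]}},
]

if __name__ == "__main__":
    print(can_chain(upstream_reads, bwa_sequence))
    print(can_chain(upstream_reads, bwa_genome_index))
```

A production-grade approach would likely need more than term overlap, for example distinguishing data terms from format terms, but the example shows why per-channel ontology annotations are the prerequisite.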