meta map
In nf-core DSL2 pipelines, to add sample-specific information and metadata that is carried throughout the pipeline, we use a meta variable. This avoids the need to create separate channels for each new characteristic.
The meta variable can be passed down to processes as a tuple of the channel containing the actual samples, e.g. FastQ files, and the meta variable.
The meta map is a Groovy map, which is like a Python dictionary.
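A minimal sketch of what such a map can look like. The keys and values here (id, single_end) are illustrative, not a fixed schema:

```nextflow
workflow {
    // A meta map is an ordinary Groovy map literal.
    def meta = [ id: 'test', single_end: false ]

    // Values can be read with dot notation or with a subscript.
    println meta.id
    println meta['single_end']
}
```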
Thus, the information can be accessed by key, e.g. meta.id, within processes and in module configuration files (conf/modules.config in nf-core pipelines).
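As a sketch of how a process might read these keys: the process name SHOW_META and the file paths below are hypothetical placeholders, and the input declaration follows the tuple convention used by nf-core modules.

```nextflow
process SHOW_META {
    // meta.id is available in directives as well as in the script block.
    tag "${meta.id}"

    input:
    tuple val(meta), path(reads)

    output:
    stdout

    script:
    def layout = meta.single_end ? 'single-end' : 'paired-end'
    """
    echo "${meta.id} is ${layout}: ${reads}"
    """
}

workflow {
    // Hypothetical input files; replace them with paths that exist.
    ch_reads = Channel.of(
        [ [ id: 'test', single_end: false ],
          [ file('/my/data/SRR493366_1.fastq'), file('/my/data/SRR493366_2.fastq') ] ]
    )
    SHOW_META(ch_reads)
    SHOW_META.out.view()
}
```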
Because processes receive this tuple of meta map and files as input, the pattern doesn’t work out of the box with fromFilePairs, which emits a plain string as the first element instead of a map.
The difference between the two:
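A side-by-side sketch of the two element shapes. The paths are written as plain strings purely for illustration (a real channel would carry path objects), and the sample names are made up:

```nextflow
workflow {
    // Shape of an element emitted by fromFilePairs: [ String, List ]
    def filepairs = [
        'SRR493366',
        [ '/my/data/SRR493366_1.fastq', '/my/data/SRR493366_2.fastq' ]
    ]

    // Shape of an element expected by nf-core modules: [ Map, List ]
    def meta_map = [
        [ id: 'test', single_end: false ],
        [ '/my/data/SRR493366_1.fastq', '/my/data/SRR493366_2.fastq' ]
    ]

    // Only the type of the first element differs.
    assert filepairs[0] instanceof String
    assert meta_map[0] instanceof Map
}
```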
Both are Groovy lists. However, the first value in the filepairs list is just a string, whereas the first value in the meta_map list is a Groovy map, which is like a Python dictionary. For most modules the only required key is meta.id, but meta maps usually also contain fields such as meta.single_end and meta.strandedness.
Common patterns
The meta map is generated with the create_fastq_channel function in the input_check subworkflow of most nf-core pipelines, where the meta information is easily extracted from a samplesheet that contains the input file paths.
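To illustrate the idea, here is a rough sketch of turning samplesheet rows into meta map tuples. It is not the nf-core implementation: the helper name row_to_meta_tuple, the default value of params.input and the column names sample, fastq_1 and fastq_2 are all assumptions made for this example.

```nextflow
params.input = 'samplesheet.csv'   // hypothetical default

// Hypothetical helper: build [ meta, reads ] from one samplesheet row.
def row_to_meta_tuple(Map row) {
    def meta = [:]
    meta.id = row.sample
    // An empty fastq_2 column is treated as single-end data.
    meta.single_end = !row.fastq_2

    def reads = meta.single_end
        ? [ file(row.fastq_1) ]
        : [ file(row.fastq_1), file(row.fastq_2) ]
    return [ meta, reads ]
}

workflow {
    Channel.fromPath(params.input)
        .splitCsv(header: true)              // each row becomes a map keyed by column name
        .map { row -> row_to_meta_tuple(row) }
        .view()
}
```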
Generating a meta map from file pairs
Sometimes you want to use nf-core modules in small scripts. You don’t want to make a samplesheet, or maintain a bunch of validation. For instance, here’s an example script to run fastqc
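A minimal sketch of such a script. The include path depends on how and where the module was installed, and the glob in params.input is only an assumed naming scheme, so adjust both to your setup:

```nextflow
nextflow.enable.dsl = 2

params.input = "*_{1,2}.fastq.gz"   // assumed file naming scheme

// Assumed install location of the nf-core FastQC module.
include { FASTQC } from './modules/nf-core/fastqc/main'

workflow {
    // size: -1 accepts any number of files per key, so single-end data also works.
    ch_fastq = Channel.fromFilePairs(params.input, size: -1)
        .map { id, fastq ->
            def fmeta = [:]
            fmeta.id = id                          // the grouping key becomes meta.id
            fmeta.single_end = fastq.size() == 1   // one file means single-end
            [ fmeta, fastq ]
        }

    FASTQC(ch_fastq)
}
```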
Sorting samples by groups
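One way to group samples, sketched under the assumption that each meta.id has the form sample_lane and that a read_group key should be dropped before grouping. The channel contents are placeholder strings rather than real BAM files:

```nextflow
workflow {
    ch_genome_bam = Channel.of(
        [ [ id: 'sampleA_L001', read_group: 'rg1' ], 'sampleA_L001.bam' ],
        [ [ id: 'sampleA_L002', read_group: 'rg2' ], 'sampleA_L002.bam' ],
        [ [ id: 'sampleB_L001', read_group: 'rg3' ], 'sampleB_L001.bam' ]
    )

    ch_genome_bam
        .map { meta, bam ->
            // Copy the map without the per-lane key so lanes of one sample compare equal.
            def fmeta = meta.findAll { k, v -> k != 'read_group' }
            // Strip the trailing "_lane" part; assumes the id contains at least one underscore.
            fmeta.id = fmeta.id.tokenize('_')[0..-2].join('_')
            [ fmeta, bam ]
        }
        .groupTuple(by: [0])                              // group on the reduced meta map
        .map { meta, bams -> [ meta, bams.flatten() ] }
        .set { ch_sort_bam }

    ch_sort_bam.view()
}
```

Grouping works here because two Groovy maps with the same keys and values are equal, so the reduced meta map can serve as the grouping key.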
Combining channels on a meta subset
Sometimes it is necessary to combine multiple channels based on a subset of the meta maps.
Unfortunately this is not directly supported, because the by argument of .combine() and .join() does not accept a closure, and it probably never will (Nextflow issue #3175).
One way to work around this restriction is to create a new map that contains only the necessary keys and to join on that map.
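A sketch of this workaround, assuming both channels share a ref key in their meta maps. The channel contents are placeholder strings:

```nextflow
workflow {
    ch_input = Channel.of(
        [ [ id: 'sample1', ref: 'Ref1' ], 'file1' ],
        [ [ id: 'sample2', ref: 'Ref2' ], 'file2' ]
    )
    ch_ref = Channel.of(
        [ [ id: 'Ref1', ref: 'Ref1' ], 'fileref1' ],
        [ [ id: 'Ref2', ref: 'Ref2' ], 'fileref2' ]
    )

    // Prepend a reduced map holding only the shared key to each element.
    ch_ref_keyed   = ch_ref.map   { meta, ref_file -> [ meta.subMap(['ref']), ref_file ] }
    ch_input_keyed = ch_input.map { meta, in_file  -> [ meta.subMap(['ref']), meta, in_file ] }

    ch_input_keyed
        .combine(ch_ref_keyed, by: 0)   // match on the reduced map in position 0
        .map { key, meta, in_file, ref_file -> [ meta, in_file, ref_file ] }
        .view()
}
```

The final map drops the temporary key again, so downstream processes still receive the full original meta map.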
Modify the meta map
There are multiple ways to modify the meta map. Here are some examples:
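The following sketch shows three possibilities using standard Groovy map operations. The key caller and its value are made up for illustration. Each variant builds a new map instead of changing the existing one in place, which is generally safer when the same meta map is shared between channels:

```nextflow
workflow {
    ch_input = Channel.of( [ [ id: 'test', single_end: false ], 'test.bam' ] )

    // 1. Add (or overwrite) a key: map + map returns a new map.
    ch_added = ch_input.map { meta, bam ->
        [ meta + [ caller: 'hypothetical_tool' ], bam ]
    }

    // 2. Keep only selected keys.
    ch_subset = ch_input.map { meta, bam ->
        [ meta.subMap(['id']), bam ]
    }

    // 3. Remove a key by filtering the entries.
    ch_removed = ch_input.map { meta, bam ->
        [ meta.findAll { k, v -> k != 'single_end' }, bam ]
    }

    ch_added.view()
    ch_subset.view()
    ch_removed.view()
}
```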
Conclusion
As you can see, the meta map is quite a flexible way of storing metadata in channels. Feel free to add whatever other key-value pairs your pipeline may need. We’re looking to add custom objects, which will lock down the usage a bit more.
Advanced pattern
Multimapping
With multiMap it is possible to split a channel into two or more named channels and to use each of them separately afterwards.
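A small sketch of the pattern, with placeholder strings instead of real files and databases. The output names reads and db are chosen for this example:

```nextflow
workflow {
    ch_reads = Channel.of(
        [ [ id: 'sample1' ], 'sample1.fastq.gz' ],
        [ [ id: 'sample2' ], 'sample2.fastq.gz' ]
    )
    ch_db = Channel.of( 'db_a', 'db_b' )

    // Pair every sample with every database, then split each element into two named outputs.
    ch_input = ch_reads
        .combine(ch_db)
        .multiMap { meta, fastq, database ->
            reads: [ meta, fastq ]
            db: database
        }

    // The two outputs stay in step and can be passed to a process as separate inputs,
    // e.g. MODULE(ch_input.reads, ch_input.db) for a hypothetical MODULE.
    ch_input.reads.view { "reads: $it" }
    ch_input.db.view { "db: $it" }
}
```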
Adding additional information to the meta map
It is possible to combine an input channel with a set of parameters, so that each input is emitted once per parameter value with that value stored in its meta map.
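A minimal sketch, assuming a hypothetical window_size parameter with three values. Each input element is emitted once per value, with the value added to a copy of its meta map:

```nextflow
workflow {
    ch_input = Channel.of( [ [ id: 'test' ], 'test.bam' ] )   // placeholder element
    def window_sizes = [ 300, 500, 1000 ]                     // hypothetical parameter values

    ch_input
        .flatMap { meta, bam ->
            // Return one [ meta, bam ] pair per value; flatMap emits each pair separately.
            window_sizes.collect { window_size ->
                [ meta + [ window_size: window_size ], bam ]
            }
        }
        .view()
}
```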
You can also combine this technique with others for more processing:
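For example, the expanded channel could then be routed with branch. This sketch reuses the hypothetical window_size values and picks an arbitrary threshold of 1000 to separate the elements:

```nextflow
workflow {
    ch_input = Channel.of( [ [ id: 'test' ], 'test.bam' ] )   // placeholder element

    ch_by_window = ch_input
        .flatMap { meta, bam ->
            [ 300, 500, 1000 ].collect { window_size ->
                [ meta + [ window_size: window_size ], bam ]
            }
        }
        .branch { meta, bam ->
            // Each element goes to the first output whose condition is true.
            small: meta.window_size < 1000
            large: true
        }

    ch_by_window.small.view { "small: $it" }
    ch_by_window.large.view { "large: $it" }
}
```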