Pre-processing of mass spectrometry-based metabolomics data with quantification and identification based on MS1 and MS2 data.
nf-core/metaboigniter is a bioinformatics pipeline that ingests raw mass spectrometry data in mzML format, typically in the form of peak lists and MS2 spectral data, for comprehensive metabolomics analysis. The key stages involve centroiding, feature detection, adduct detection, alignment, and linking, which progressively refine and align the data. The pipeline can also perform requantification to compensate for missing values and leverages MS2Query for compound identification based on MS2 data, outputting a comprehensive list of detected and potentially identified metabolites.
- Centroiding: Converts the continuous mass spectra into a series of discrete points.
- Feature Detection: Identifies unique signals or ‘features’ in the spectra.
- Adduct Detection: Identifies adduct ions, which are formed by the interaction of the sample with the ion source.
- Alignment: Ensures that the same features across different samples are matched together.
- Linking: Establishes connections between features across different ionization modes or adducts.
- Requantification: Fills in missing values in the data set for a more complete analysis.
- Identification: Uses MS2Query and SIRIUS to identify compounds based on their MS2 spectral data.
- Output Generation: Produces a comprehensive list of detected and potentially identified metabolites.
First, prepare a samplesheet with your input data that looks as follows:
Each row in this CSV file represents a unique sample, with the details provided in the columns.
- sample: This column should contain unique names for each sample. No two samples should share the same name in this column.
- level: This column should specify the level of mass spectrometry data contained in each sample file. This can be ‘MS1’ for files containing only MS1 data, ‘MS2’ for files containing only MS2 data, and ‘MS12’ for files containing both MS1 and MS2 data.
- type: This column can contain any descriptor of your choice, such as ‘normal’, ‘disease’, etc. This is usually used to provide some classification or group identification to your samples.
- msfile: This column should contain the path to the mzML file for each sample.
Now, you can run the pipeline using:
Please provide pipeline parameters via the CLI or Nextflow
-params-file option. Custom config files including those provided by the
-c Nextflow option can be used to provide any configuration except for parameters;
To see the results of an example test run with a full size dataset refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.
nf-core/metaboigniter was originally written by Payam Emami. The DSL2 version was developed with significant contributions from Axel Walter and Efi Kontou.
Contributions and Support
If you would like to contribute to this pipeline, please see the contributing guidelines.
If you use nf-core/metaboigniter for your analysis, please cite it using the following doi: 10.5281/zenodo.4743790
An extensive list of references for the tools used by the pipeline can be found in the
You can cite the
nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.