nf-core/proteomicslfq
Proteomics label-free quantification (LFQ) analysis pipeline
22.10.6
Introduction
This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline.
The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
Pipeline overview
The pipeline is built using Nextflow and processes data using the following steps:
- (optional) Conversion of spectra data to indexedMzML: using ThermoRawFileParser for Thermo RAW files, or OpenMS' FileConverter if only the index is missing
- (optional) Decoy database generation for the provided protein database (FASTA) with OpenMS
- Database search with MSGF+ and/or Comet through their OpenMS adapters
- Re-mapping potentially identified peptides to the input database for consistency and error-checking (using OpenMS’ PeptideIndexer)
- PSM rescoring using either PSMFeatureExtractor and Percolator, or a PeptideProphet-like distribution fitting approach in OpenMS
- If multiple search engines were chosen, the results are combined with OpenMS’ ConsensusID
- If multiple search engines were chosen, a combined FDR is calculated
- Single run PSM/Peptide-level FDR filtering
- If localization of modifications was requested, Luciphor2 is applied via the OpenMS adapter
- Protein inference and label-free quantification with OpenMS' ProteomicsLFQ, based on spectral counting or on MS1 feature detection, alignment and integration. This step applies an additional experiment-wide FDR filter on the protein level (and, if requested, on the peptide/PSM level).
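Several of the steps above filter identifications by FDR, which relies on searching against the target-decoy database. The following minimal sketch illustrates the common decoy/target-ratio FDR estimate and how it is turned into q-values. It is only an illustration and not the OpenMS implementation, which offers more options. The names `PSM`, `target_decoy_qvalues` and `filter_targets` are hypothetical. The sketch assumes that higher scores are better and does not treat tied scores specially.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """Hypothetical minimal PSM record: a score (higher is better) and a decoy flag."""
    spectrum: str
    score: float
    is_decoy: bool


def target_decoy_qvalues(psms: list[PSM]) -> list[tuple[PSM, float]]:
    """Assign q-values using the decoys/targets FDR estimate (ties not handled)."""
    # Sort best-first so that each position corresponds to a score threshold.
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs = []
    targets = decoys = 0
    for psm in ranked:
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR when accepting everything down to this PSM.
        fdrs.append(decoys / max(targets, 1))
    # q-value: the lowest FDR at which this PSM would still be accepted,
    # i.e. a running minimum taken from the worst-scoring end.
    qvalues = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min
    return list(zip(ranked, qvalues))


def filter_targets(psms: list[PSM], fdr_threshold: float = 0.01) -> list[PSM]:
    """Keep target PSMs whose q-value is at or below the threshold."""
    return [
        p for p, q in target_decoy_qvalues(psms)
        if not p.is_decoy and q <= fdr_threshold
    ]
```

For example, `filter_targets(psms, 0.01)` keeps target PSMs that pass a 1% FDR threshold. The running minimum makes q-values monotonic, so a lower-scoring PSM never receives a better q-value than a higher-scoring one.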
[Figure: rough visualization of the pipeline]
Output structure
By default, output is written to the $NXF_WORKSPACE/results folder. It consists of the following folders, each described in more detail below:
results/
- ids/
- logs/ (extended log files for all steps)
  - *.log
- msstats/
- pipeline_info/ (general Nextflow information)
- proteomics_lfq/
- ptxqc/ (quality control)
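To check which of these folders a finished run actually produced, a small script can walk the results directory. The following is a minimal sketch. The folder names come from the structure above. The function name `summarize_results` is hypothetical. Some folders may legitimately be missing: msstats/ needs more than one condition, and ptxqc/ is only written if activated.

```python
from pathlib import Path
from typing import Dict, Optional

# Subfolders listed in the output structure above.
EXPECTED = ["ids", "logs", "msstats", "pipeline_info", "proteomics_lfq", "ptxqc"]


def summarize_results(results_dir: str) -> Dict[str, Optional[int]]:
    """Map each expected subfolder to its number of files, or None if missing."""
    root = Path(results_dir)
    summary: Dict[str, Optional[int]] = {}
    for name in EXPECTED:
        folder = root / name
        if folder.is_dir():
            # rglob("*") also descends into nested subfolders.
            summary[name] = sum(1 for f in folder.rglob("*") if f.is_file())
        else:
            summary[name] = None
    return summary
```

Usage: `summarize_results("results")` returns a dict such as `{"ids": 12, ..., "ptxqc": None}` for a run without PTXQC.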
Output description
Nextflow pipeline info
Nextflow provides excellent functionality for generating various reports about the execution of the pipeline. These reports help you troubleshoot errors in a run and also provide other information such as launch commands, run times and resource usage.
Output files:
- pipeline_info/
  - Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
  - Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.csv.
  - Documentation for interpretation of results in HTML format: results_description.html.
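execution_trace.txt is a tab-separated table with one row per task, which makes it easy to scan programmatically when troubleshooting. The following minimal sketch counts task statuses and lists failed tasks. It assumes that the trace contains Nextflow's default `name` and `status` columns; a custom `trace.fields` setting could remove or rename them. The function name `summarize_trace` is hypothetical.

```python
import csv
import io
from collections import Counter


def summarize_trace(text: str) -> tuple[Counter, list[str]]:
    """Count task statuses and collect the names of failed tasks."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    statuses: Counter = Counter()
    failed: list[str] = []
    for row in reader:
        statuses[row["status"]] += 1
        if row["status"] == "FAILED":
            failed.append(row["name"])
    return statuses, failed
```

Usage: read `pipeline_info/execution_trace.txt` into a string and pass it in. The returned counter shows, for example, how many tasks completed, failed or were cached.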
Identifications
The ids folder contains intermediate output: the PSM/peptide-level filtered identifications for each raw/mzML file, in OpenMS' internal idXML format.
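idXML is an XML format. For a quick look outside OpenMS, the standard library parser is enough. The following minimal sketch counts peptide sequences. It assumes the usual idXML layout, in which `PeptideIdentification` elements contain `PeptideHit` children with a `sequence` attribute and no default XML namespace is declared. It also assumes that hits are stored best-first when only the top hit is kept. For anything beyond inspection, OpenMS' own readers (for example via pyOpenMS) are the safer choice. The function name `count_peptide_sequences` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_peptide_sequences(idxml_text: str, top_hit_only: bool = True) -> Counter:
    """Count peptide sequences in idXML content (assumes no default namespace)."""
    root = ET.fromstring(idxml_text)
    counts: Counter = Counter()
    for pep_id in root.iter("PeptideIdentification"):
        hits = pep_id.findall("PeptideHit")
        if top_hit_only:
            hits = hits[:1]  # assumes hits are stored best-first
        for hit in hits:
            counts[hit.get("sequence")] += 1
    return counts
```

For a file on disk, pass `Path(path).read_text()`. For very large files, `ET.iterparse` would avoid loading the whole tree.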
ProteomicsLFQ main output
The proteomics_lfq folder contains the output of the pipeline without any statistical post-processing.
It is available in three different formats:
ConsensusXML
A consensusXML file, the closest representation of the internal data structures generated by OpenMS. It is useful for debugging and for downstream processing with OpenMS tools.
MSstats-ready quantity table
A simple TSV file that can be read directly by the OpenMStoMSstats function of the MSstats R package. It should hold the same quantities as the consensusXML, rearranged into a "long" table format with additional information about the experimental design used by MSstats.
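In the "long" format, each row holds one quantity together with its protein, peptide, condition and run. Pivoting it shows the difference from a wide protein-by-run matrix. The following minimal sketch naively sums intensities per protein and run. It is not MSstats' summarization, which is more elaborate. It assumes MSstats' usual input column names `ProteinName`, `Run` and `Intensity`. The delimiter is a parameter, so it can be adjusted to the actual file. The function name `protein_by_run` is hypothetical.

```python
import csv
import io
from collections import defaultdict


def protein_by_run(text: str, delimiter: str = "\t") -> dict:
    """Naively sum intensities per (ProteinName, Run) from a long-format table."""
    table = defaultdict(lambda: defaultdict(float))
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        value = row["Intensity"].strip()
        # Missing values are skipped instead of being counted as zero.
        if value in ("", "NA"):
            continue
        table[row["ProteinName"]][row["Run"]] += float(value)
    # Convert nested defaultdicts to plain dicts: {protein: {run: intensity}}.
    return {prot: dict(runs) for prot, runs in table.items()}
```

Proteins with no quantity in a run simply have no entry for that run. This mirrors how missing values appear in the long format, where the row is absent or marked NA.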
mzTab
A complete mzTab file ready for submission to PRIDE.
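mzTab is a tab-separated text format in which every line starts with a three-letter prefix. Examples include MTD for metadata, PRH/PRT for the protein header and rows, PEH/PEP for peptides and PSH/PSM for PSMs. The following minimal sketch reads one section into dictionaries keyed by that section's header. The function name `read_mztab_section` is hypothetical.

```python
def read_mztab_section(text: str, header_prefix: str = "PRH",
                       row_prefix: str = "PRT") -> list[dict[str, str]]:
    """Return the rows of one mzTab section as dicts keyed by its header."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == header_prefix:
            header = fields[1:]  # drop the line prefix itself
        elif fields[0] == row_prefix and header is not None:
            rows.append(dict(zip(header, fields[1:])))
    return rows
```

Usage: `read_mztab_section(text)` returns the protein table. `read_mztab_section(text, "PSH", "PSM")` returns the PSM table.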
MSstats output
The msstats folder contains MSstats' post-processed quantities (e.g. after imputation and outlier removal) and statistical measures of significance for the contrasts tested in the given experimental design. It also includes basic plots of these results.
These results are only available if the experimental design contains more than one condition.
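A common next step is to filter the tested contrasts for significant proteins. The following minimal sketch assumes that the comparison results use MSstats' `groupComparison` column names `Protein`, `Label`, `log2FC` and `adj.pvalue`. It does not assume a particular file name or layout in the msstats folder. The delimiter is therefore a parameter. The cut-offs are arbitrary illustrative defaults. The function name `significant_proteins` is hypothetical.

```python
import csv
import io
import math


def significant_proteins(text: str, alpha: float = 0.05,
                         min_abs_log2fc: float = 1.0,
                         delimiter: str = ",") -> list[tuple[str, str, float, float]]:
    """Return (Protein, Label, log2FC, adj.pvalue) rows passing both cut-offs."""
    hits = []
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        try:
            fc = float(row["log2FC"])
            padj = float(row["adj.pvalue"])
        except ValueError:
            continue  # e.g. "NA" for proteins that could not be tested
        if math.isnan(fc) or math.isnan(padj):
            continue
        if padj < alpha and abs(fc) >= min_abs_log2fc:
            hits.append((row["Protein"], row["Label"], fc, padj))
    # Most significant first.
    return sorted(hits, key=lambda h: h[3])
```

Applying both a fold-change and an adjusted p-value cut-off mirrors what a volcano plot shows visually.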
MSstats mzTab
The mzTab from the proteomics_lfq folder, with its quantities replaced by the normalized and imputed quantities from MSstats. It may contain fewer quantities, since MSstats filters out proteins with too many missing values.
MSstats table
See the MSstats vignette.
MSstats plots
See the MSstats vignette for groupComparisonPlots (Heatmap, VolcanoPlot and ComparisonPlot, the latter per protein).
PTXQC output
If activated, the ptxqc folder contains the report of the PTXQC R package, based on the mzTab output of ProteomicsLFQ.
PTXQC report
See the PTXQC vignette. The report itself already describes the calculated and visualized QC metrics in considerable detail.
PTXQC yaml config
The default YAML config that defines the structure of the QC report. To restructure the report, edit this file and re-run PTXQC manually.