nf-core/proteomicslfq
Proteomics label-free quantification (LFQ) analysis pipeline

label-free-quantification, openms, proteomics

This is the development version of the pipeline.

This pipeline uses DSL1. It will not work with Nextflow versions after 22.10.6.

Repository: https://github.com/nf-core/proteomicslfq

Introduction

This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline.

The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.

Pipeline overview

The pipeline is built using Nextflow and processes data using the following steps:

(optional) Conversion of spectra data to indexedMzML: using ThermoRawFileParser for Thermo Raw files, or OpenMS’ FileConverter if an mzML file is only missing its index
(optional) Decoy database generation for the provided DB (fasta) with OpenMS
Database search with MS-GF+ and/or Comet through OpenMS adapters
Re-mapping potentially identified peptides to the input database for consistency and error-checking (using OpenMS’ PeptideIndexer)
PSM rescoring using PSMFeatureExtractor and Percolator or a PeptideProphet-like distribution fitting approach in OpenMS
If multiple search engines were chosen, the results are combined with OpenMS’ ConsensusID
If multiple search engines were chosen, a combined FDR is calculated
Single-run PSM/peptide-level FDR filtering
If localization of modifications was requested, Luciphor2 is applied via the OpenMS adapter
Protein inference and label-free quantification based on spectral counting or on MS1 feature detection, alignment and integration with OpenMS’ ProteomicsLFQ. This step also applies an additional experiment-wide FDR filter at the protein level (and, if requested, at the peptide/PSM level).
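
The FDR filtering steps above build on a decoy database. As a minimal sketch of the general target-decoy idea (not the OpenMS implementation, whose details may differ), the following computes q-values from scored PSMs and keeps the target PSMs below a threshold. The function names, the `(score, is_decoy)` tuple layout and the "higher score is better" convention are assumptions made for illustration.

```python
from typing import List, Tuple


def q_values(psms: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) sorted by descending score.

    Assumes a higher score means a better match (hypothetical convention).
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Estimated FDR at each score threshold: decoys / targets seen so far.
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this threshold or any more permissive one.
    qvals = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvals.append(running_min)
    qvals.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qvals)]


def filter_targets(psms: List[Tuple[float, bool]], threshold: float = 0.01):
    """Keep target PSMs whose q-value is at or below the threshold."""
    return [(s, q) for s, d, q in q_values(psms) if not d and q <= threshold]
```

Calling `filter_targets(psms, 0.01)` on a list of `(score, is_decoy)` pairs returns only the target hits that pass a 1% cut-off. Applying the same idea once per run and once across the whole experiment mirrors the distinction the step list makes between single-run and experiment-wide filtering.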

A rough visualization follows:

[Figure: proteomicslfq workflow]

Output structure

By default, output is written to the $NXF_WORKSPACE/results folder. It consists of the following folders, each described in more detail in the sections below:

results
- ids
  - *.idXML
- logs (extended log files for all steps)
  - *.log
- msstats
- pipeline_info (general Nextflow information)
  - …
- proteomics_lfq
- ptxqc (quality control)
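
To check quickly which of these folders a finished run actually produced, a small script along the following lines may help. It is only a sketch: the folder names are taken from the listing above, while the function name and the results path passed to it are hypothetical.

```python
from pathlib import Path
from typing import Dict

# Folder names as given in the output structure listing.
EXPECTED_DIRS = ["ids", "logs", "msstats", "pipeline_info", "proteomics_lfq", "ptxqc"]


def check_results_layout(results_dir: str) -> Dict[str, bool]:
    """Map each expected folder name to whether it exists under results_dir."""
    root = Path(results_dir)
    return {name: (root / name).is_dir() for name in EXPECTED_DIRS}
```

A missing msstats or ptxqc folder is not necessarily an error, since both outputs are conditional (see the respective sections below).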

Output description

Nextflow pipeline info

Nextflow generates several reports on the running and execution of the pipeline. They help you troubleshoot errors and provide further information such as launch commands, run times and resource usage.

Output files:

pipeline_info/
- Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
- Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.csv.
- Documentation for interpretation of results in HTML format: results_description.html.
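
As an illustration of how the trace file can be used for troubleshooting, the sketch below counts tasks per status and lists the names of tasks that did not finish successfully. It assumes that execution_trace.txt is tab-separated and contains `name` and `status` columns, and that `COMPLETED` and `CACHED` are the statuses of successful tasks; check these assumptions against your own trace file.

```python
import csv
from collections import Counter
from typing import Dict, List, TextIO

# Statuses treated as successful (assumption, adjust if your trace differs).
OK_STATUSES = {"COMPLETED", "CACHED"}


def summarise_trace(handle: TextIO) -> Dict[str, object]:
    """Count tasks per status and collect names of unsuccessful tasks."""
    reader = csv.DictReader(handle, delimiter="\t")
    status_counts: Counter = Counter()
    unsuccessful: List[str] = []
    for row in reader:
        status = row["status"]
        status_counts[status] += 1
        if status not in OK_STATUSES:
            unsuccessful.append(row["name"])
    return {"status_counts": dict(status_counts), "unsuccessful": unsuccessful}


# Example usage (path relative to the results directory):
# with open("pipeline_info/execution_trace.txt", newline="") as fh:
#     print(summarise_trace(fh))
```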

Identifications

Intermediate output for the PSM/peptide-level filtered identifications per raw/mzML file in OpenMS’ internal idXML format.
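
Since idXML is an XML format, the files can be inspected without OpenMS installed. The following sketch counts identified spectra and peptide hits using only the Python standard library. It assumes that identifications are stored as `PeptideIdentification` elements with `PeptideHit` children and that the elements carry no XML namespace; for anything beyond a quick sanity check, the OpenMS tools or pyOpenMS are the more appropriate choice.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def count_peptide_hits(idxml_source) -> Dict[str, int]:
    """Count PeptideIdentification elements and their PeptideHit children.

    idxml_source may be a file path or a file-like object.
    Element names are assumptions about the idXML layout.
    """
    root = ET.parse(idxml_source).getroot()
    n_identifications = 0
    n_hits = 0
    for pep_id in root.iter("PeptideIdentification"):
        n_identifications += 1
        n_hits += len(pep_id.findall("PeptideHit"))
    return {"identifications": n_identifications, "hits": n_hits}


# Example usage with a hypothetical file name:
# print(count_peptide_hits("ids/sample1.idXML"))
```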

ProteomicsLFQ main output

The proteomics_lfq folder contains the output of the pipeline without any statistical postprocessing. It is available in three different formats:

ConsensusXML

A consensusXML file as the closest representation of the internal data structures generated by OpenMS. Helpful for debugging and downstream processing with OpenMS tools.

MSstats-ready quantity table

A simple tsv file ready to be read by the OpenMStoMSstatsFormat function of the MSstats R package. It should hold the same quantities as the consensusXML file, rearranged into a “long” table format and supplemented with the experimental design information that MSstats needs.
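
To illustrate what the “long” format means, the sketch below pivots such a table into a protein-by-run matrix by summing intensities. The column names `ProteinName`, `Run` and `Intensity` are assumptions about the file layout, the delimiter is left configurable in case the file is comma-separated, and plain summing is only a stand-in for the summarization that MSstats performs itself.

```python
import csv
from typing import Dict, TextIO


def protein_by_run(handle: TextIO, delimiter: str = "\t") -> Dict[str, Dict[str, float]]:
    """Pivot a long-format table into {protein: {run: summed intensity}}.

    Each input row is one measurement; column names are assumed.
    """
    table: Dict[str, Dict[str, float]] = {}
    for row in csv.DictReader(handle, delimiter=delimiter):
        runs = table.setdefault(row["ProteinName"], {})
        runs[row["Run"]] = runs.get(row["Run"], 0.0) + float(row["Intensity"])
    return table
```

In the long layout every row is a single intensity value annotated with its protein, run and design columns, whereas the pivoted result has one entry per protein with one value per run.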

mzTab

A complete mzTab file ready for submission to PRIDE.
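
mzTab is a tab-separated text format in which the first column of every line names the section it belongs to (for example MTD for metadata, PRH/PRT for the protein header and rows, PSH/PSM for the PSM header and rows). A minimal sketch for a first look at such a file, with a hypothetical function name, could be:

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def read_mztab(lines: Iterable[str]) -> Tuple[Dict[str, int], List[Dict[str, str]]]:
    """Count lines per section prefix and collect protein rows as dicts."""
    counts: Counter = Counter()
    protein_header: List[str] = []
    proteins: List[Dict[str, str]] = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        prefix = fields[0]
        if not prefix:
            continue  # skip empty separator lines
        counts[prefix] += 1
        if prefix == "PRH":
            protein_header = fields[1:]
        elif prefix == "PRT" and protein_header:
            proteins.append(dict(zip(protein_header, fields[1:])))
    return dict(counts), proteins


# Example usage with a hypothetical file name:
# with open("proteomics_lfq/out.mzTab") as fh:
#     counts, proteins = read_mztab(fh)
```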

MSstats output

The msstats folder contains MSstats’ post-processed (e.g. imputation, outlier removal) quantities and statistical measures of significance for the tested contrasts of the given experimental design. It also includes basic plots of these results. These results are only available if the experimental design contains more than one condition.

MSstats mzTab

The mzTab from the proteomics_lfq folder, with its quantities replaced by the normalized and imputed quantities from MSstats. It may contain fewer quantities, since MSstats filters out proteins with too many missing values.

MSstats table

See the MSstats vignette.

MSstats plots

See the MSstats vignette for groupComparisonPlots: Heatmap, VolcanoPlot and ComparisonPlot (the latter per protein).

PTXQC output

If activated, the ptxqc folder will contain the report of the PTXQC R package, based on the mzTab output of ProteomicsLFQ.

PTXQC report

See the PTXQC vignette. The report itself already describes the calculated and visualized QC metrics in detail.

PTXQC yaml config

The default yaml config used to configure the structure of the QC report. If you need to restructure the report, edit this file and re-run PTXQC manually.
