Introduction

This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline.

The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.

Pipeline overview

The pipeline is built using Nextflow and processes data using the following steps:

  1. (optional) Conversion of spectra data to indexedMzML: using ThermoRawFileParser for Thermo Raw files, or OpenMS’ FileConverter if only the index is missing
  2. (optional) Decoy database generation for the provided DB (fasta) with OpenMS
  3. Database search with MSGF+ and/or Comet through OpenMS adapters
  4. Re-mapping potentially identified peptides to the input database for consistency and error-checking (using OpenMS’ PeptideIndexer)
  5. PSM rescoring using either PSMFeatureExtractor and Percolator, or a PeptideProphet-like distribution-fitting approach in OpenMS
  6. If multiple search engines were chosen, the results are combined with OpenMS’ ConsensusID
  7. If multiple search engines were chosen, a combined FDR is calculated
  8. Single run PSM/Peptide-level FDR filtering
  9. If localization of modifications was requested, Luciphor2 is applied via the OpenMS adapter
  10. Protein inference and label-free quantification based on spectral counting or on MS1 feature detection, alignment and integration with OpenMS’ ProteomicsLFQ. This step also applies an additional experiment-wide FDR filter at the protein level (and, if requested, at the peptide/PSM level).
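Several of the steps above filter identifications by FDR. As a rough illustration of the general target-decoy idea only, the following Python sketch estimates q-values from a list of scored PSMs. It is not the OpenMS implementation: the function names, the input layout (score, is_decoy) and the simple decoys/targets estimator are assumptions made for this example.

```python
from typing import List, Tuple

Psm = Tuple[float, bool]  # (score, is_decoy); higher score = better (assumed)


def target_decoy_qvalues(psms: List[Psm]) -> List[Tuple[float, bool, float]]:
    """Return PSMs sorted by descending score with an estimated q-value each."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # FDR estimate at each rank: decoys seen so far / targets seen so far.
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this rank or at any lower-scoring rank,
    # which makes the values monotone along the ranking.
    qvalues = [0.0] * len(ranked)
    running_min = float("inf")
    for i in range(len(ranked) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min

    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


def filter_targets(psms: List[Psm], threshold: float = 0.01) -> List[Tuple[float, float]]:
    """Keep target PSMs whose q-value is at or below the threshold."""
    return [(s, q) for s, d, q in target_decoy_qvalues(psms) if not d and q <= threshold]


if __name__ == "__main__":
    example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, False)]
    for score, qvalue in filter_targets(example, threshold=0.3):
        print(score, round(qvalue, 3))
```

The same ranking logic applies whether the scores come from a single search engine or from a combined consensus score; only the input list changes.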

A rough visualization follows:

[Figure: proteomicslfq workflow]

Output structure

Output is by default written to the $NXF_WORKSPACE/results folder. The output consists of the following folders (follow the links for a more detailed description):

results

Output description

Nextflow pipeline info

Nextflow generates several reports about the execution of the pipeline. These help you troubleshoot errors and provide further information such as launch commands, run times and resource usage.

Output files:

  • pipeline_info/
    • Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
    • Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.csv.
    • Documentation for interpretation of results in HTML format: results_description.html.
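When troubleshooting, it can be quicker to scan execution_trace.txt programmatically than to open the HTML report. The sketch below assumes the trace is tab-separated and contains Nextflow's default `name` and `status` columns; the function names are made up for this example, and a customised trace configuration may use different fields.

```python
import csv
import io
from typing import Dict, List


def read_trace(text: str) -> List[Dict[str, str]]:
    """Parse the contents of a tab-separated Nextflow trace file."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def failed_tasks(rows: List[Dict[str, str]]) -> List[str]:
    """Return the names of tasks whose status is FAILED or ABORTED."""
    return [row["name"] for row in rows if row.get("status") in {"FAILED", "ABORTED"}]


if __name__ == "__main__":
    # Path relative to the top-level results directory.
    with open("pipeline_info/execution_trace.txt", encoding="utf-8") as handle:
        for name in failed_tasks(read_trace(handle.read())):
            print(name)
```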

Identifications

Intermediate output for the PSM/peptide-level filtered identifications per raw/mzML file in OpenMS’ internal idXML format.

ProteomicsLFQ main output

The proteomics_lfq folder contains the output of the pipeline without any statistical postprocessing. It is available in three different formats:

ConsensusXML

A consensusXML file as the closest representation of the internal data structures generated by OpenMS. Helpful for debugging and downstream processing with OpenMS tools.

MSstats-ready quantity table

A simple tsv file ready to be read by the OpenMStoMSstatsFormat function of the MSstats R package. It should hold the same quantities as the consensusXML, rearranged into a “long” table format and extended with the experimental design information used by MSstats.
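To illustrate what the “long” layout means, the following sketch sums intensities per protein and run, turning the long table into a nested protein-by-run mapping. The column names `ProteinName`, `Run` and `Intensity` follow the usual MSstats input format and are assumptions here, as is the tab delimiter; check the header of your file before relying on them.

```python
import csv
import io
from collections import defaultdict
from typing import Dict


def protein_run_matrix(text: str, delimiter: str = "\t") -> Dict[str, Dict[str, float]]:
    """Sum the Intensity column per (ProteinName, Run) pair of a long table."""
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        value = row["Intensity"]
        if value in ("", "NA"):  # skip missing quantities
            continue
        totals[row["ProteinName"]][row["Run"]] += float(value)
    # Convert the nested defaultdicts into plain dicts for the caller.
    return {protein: dict(runs) for protein, runs in totals.items()}
```

Missing values are skipped rather than treated as zero, so a protein without any quantity in a run simply has no entry for that run.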

mzTab

A complete mzTab file ready for submission to PRIDE.
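mzTab is a tab-separated text format in which the first field of every line names its section: `MTD` for metadata, `PRH`/`PRT` for the protein header and rows, `PEH`/`PEP` for peptides and `PSH`/`PSM` for PSMs. The minimal sketch below splits a file into these sections; the function name and return layout are chosen for this example, and other line types such as comments or small-molecule sections are ignored.

```python
from typing import Dict, List, Tuple

# Maps each header line prefix to the prefix of its data rows.
HEADER_TO_ROW = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM"}


def split_mztab(text: str) -> Tuple[Dict[str, str], Dict[str, List[Dict[str, str]]]]:
    """Split mzTab content into metadata and protein/peptide/PSM tables."""
    metadata: Dict[str, str] = {}
    headers: Dict[str, List[str]] = {}
    tables: Dict[str, List[Dict[str, str]]] = {"PRT": [], "PEP": [], "PSM": []}

    for line in text.splitlines():
        fields = line.split("\t")
        tag = fields[0]
        if tag == "MTD" and len(fields) >= 3:
            metadata[fields[1]] = fields[2]
        elif tag in HEADER_TO_ROW:
            headers[HEADER_TO_ROW[tag]] = fields[1:]
        elif tag in tables and tag in headers:
            # Pair each value with the column name from the section header.
            tables[tag].append(dict(zip(headers[tag], fields[1:])))

    return metadata, tables
```

For example, `split_mztab(open(path).read())[1]["PRT"]` yields one dictionary per protein row, keyed by the column names of the protein header line.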

MSstats output

The msstats folder contains MSstats’ post-processed (e.g. imputation, outlier removal) quantities and statistical measures of significance for the tested contrasts of the given experimental design. It also includes basic plots of these results. The results are only available if the experimental design contains more than one condition.

MSstats mzTab

The mzTab from the proteomics_lfq folder, with its quantities replaced by the normalized and imputed quantities from MSstats. It might contain fewer quantities, since MSstats filters out proteins with too many missing values.

MSstats table

See MSstats vignette.

MSstats plots

See MSstats vignette for groupComparisonPlots (Heatmap, VolcanoPlot and ComparisonPlot (per protein)).

PTXQC output

If activated, the ptxqc folder will contain the report of the PTXQC R package based on the mzTab output of ProteomicsLFQ.

PTXQC report

See PTXQC vignette. The report itself already describes the calculated and visualized QC metrics in detail.

PTXQC yaml config

The default yaml config used to configure the structure of the QC report. If you need to restructure the report, edit this file and re-run PTXQC manually.