nf-core/mhcquant
Identify and quantify MHC eluted peptides from mass spectrometry raw data
Introduction
This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline.
The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
General
- *.mzTab
- *.tsv
The mzTab output file follows the HUPO-PSI mzTab format and combines all information on the sample-condition group extracted from the database search throughout the pipeline. A detailed explanation of the individual entries is given in the mzTab format specification. mzTab files are compatible with the PRIDE Archive proteomics data repository and can be uploaded as search files.
mzTab files contain many columns; a few of the most important ones are highlighted here:
- By default (identification only), the best_search_engine_score[1] column holds the Percolator q-value.
- If --quantify is specified, the best_search_engine_score[1] column holds the Comet XCorr of each peptide identification, and peptide quantities are annotated in the peptide_abundance_study_variable columns.
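To pull these columns out of an mzTab file in a script, a minimal stdlib-only sketch could look like the one below. It assumes the standard mzTab 1.0 layout, where the peptide section is a tab-separated "PEH" header line followed by "PEP" data lines. The function names and the toy input are hypothetical and only illustrate the column layout; they are not part of the pipeline.

```python
from __future__ import annotations


def read_mztab_peptides(text: str) -> list[dict[str, str]]:
    """Collect the PEP rows of an mzTab file as dicts keyed by the PEH header."""
    header: list[str] | None = None
    rows: list[dict[str, str]] = []
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "PEH":
            # Peptide header line: the remaining fields are the column names.
            header = fields[1:]
        elif fields[0] == "PEP" and header is not None:
            # Peptide data line: pair each value with its column name.
            rows.append(dict(zip(header, fields[1:])))
    return rows


def score_and_abundances(row: dict[str, str]) -> tuple[str | None, dict[str, str]]:
    """Return best_search_engine_score[1] and all peptide_abundance_study_variable[...] columns."""
    score = row.get("best_search_engine_score[1]")
    abundances = {
        col: value
        for col, value in row.items()
        if col.startswith("peptide_abundance_study_variable")
    }
    return score, abundances


# Toy input for illustration only (not real pipeline output).
EXAMPLE = "\n".join([
    "MTD\tmzTab-version\t1.0.0",
    "PEH\tsequence\tbest_search_engine_score[1]\tpeptide_abundance_study_variable[1]",
    "PEP\tSIINFEKL\t0.001\t12345.0",
])

if __name__ == "__main__":
    for row in read_mztab_peptides(EXAMPLE):
        print(row["sequence"], *score_and_abundances(row))
```

Keep in mind that the meaning of best_search_engine_score[1] depends on the run mode: a q-value (lower is better) for identification runs, a Comet XCorr (higher is better) with --quantify.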
The TSV output file is an alternative OpenMS output containing information similar to the mzTab output. In identification runs, the TSV output is a simple tab-delimited file holding information about the FDR-filtered peptides as well as all values currently produced by MS²Rescore. The TSV file in quantification mode (when using --quantify) is more complex and is described in more detail below.
TSV Quant
- MAP: information about the different mzML files that were provided as input
- RUN: information about the search that was performed on each run
- PROTEIN: information about the protein IDs corresponding to the detected peptides (no protein inference was performed)
- UNASSIGNEDPEPTIDE: information about PSMs that were identified but could not be quantified to a precursor feature on MS level 1
- CONSENSUS: information about precursor features that were identified in multiple runs (e.g. runs 1-3)
- PEPTIDE: information about the peptide hits that were identified and assigned to the consensus features
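Because all of these record types share one file, it can help to split the quantification TSV into one table per record type before analysis. The sketch below assumes that every data row starts with its record type (MAP, RUN, ...) in the first column and that each record type has a header row carrying the same tag prefixed with "#", as in OpenMS text exports; please check this against your own output. The function name and the toy input (including its column subset) are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict

# Record types described in the TSV Quant section.
SECTIONS = ("MAP", "RUN", "PROTEIN", "UNASSIGNEDPEPTIDE", "CONSENSUS", "PEPTIDE")


def split_quant_tsv(text: str) -> dict[str, list[dict[str, str]]]:
    """Group the rows of a quantification TSV by record type (assumed layout)."""
    headers: dict[str, list[str]] = {}
    tables: dict[str, list[dict[str, str]]] = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        tag = fields[0]
        if tag.startswith("#") and tag[1:] in SECTIONS:
            # Assumed header row, e.g. "#MAP<TAB>id<TAB>filename".
            headers[tag[1:]] = fields[1:]
        elif tag in SECTIONS:
            # Fall back to generic column names if no header row was seen.
            cols = headers.get(tag, [f"col{i}" for i in range(1, len(fields))])
            tables[tag].append(dict(zip(cols, fields[1:])))
    return dict(tables)


# Toy input for illustration only; real files contain many more columns.
TOY = "\n".join([
    "#MAP\tid\tfilename",
    "MAP\t0\trun1.mzML",
    "MAP\t1\trun2.mzML",
    "#CONSENSUS\trt_cf\tmz_cf\tintensity_cf",
    "CONSENSUS\t1200.5\t482.27\t1.5e6",
    "#PEPTIDE\trt\tmz\tscore\tsequence",
    "PEPTIDE\t1201.0\t482.27\t0.01\tSIINFEKL",
])

if __name__ == "__main__":
    for section, rows in split_quant_tsv(TOY).items():
        print(section, len(rows))
```

Each resulting list of dicts can be passed directly to a dataframe library if you prefer tabular analysis.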
See the format documentation or the HUPO-PSI documentation for more information about the annotated scores and the file format.
Intermediate results
This folder contains intermediate results from various steps of the MHCquant pipeline (e.g. unfiltered and filtered PSMs, aligned mzML files, features).
Output files
- intermediate_results/
  - alignment: Contains the trafoXML files of each run that document the retention time shift after alignment in quantification mode.
  - comet: Contains the pin files generated by Comet after the database search.
  - rescoring
    - {Sample}_{Condition}_(psm|ms2rescore).idXML: File holding extra features generated by MS²Rescore that will be used by Percolator or mokapot.
    - {Sample}_{Condition}_pout.idXML: Unfiltered Percolator output.
    - {Sample}_{Condition}_pout_filtered.idXML: FDR-filtered Percolator output.
  - global_fdr: Contains the globally FDR-filtered list of all runs in a tsv file.
  - features: Holds information on quantified features in featureXML files, produced by FeatureFinderIdentification in quantification mode.

- ion_annotations/
  - {Sample}_{Condition}_all_peaks.tsv: Contains metadata of all measured ions of peptides reported after peptide identification.
  - {Sample}_{Condition}_matching_ions.tsv: Contains ion annotations and additional metadata of peptides reported after peptide identification.
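When collecting these files in downstream scripts, the naming patterns listed above can be matched by suffix. The sketch below is a hypothetical helper, not part of the pipeline. It keeps {Sample}_{Condition} as one combined stem because either part may itself contain underscores, so splitting it from the file name alone would be ambiguous. The category labels are made up for illustration.

```python
from __future__ import annotations

import re
from pathlib import Path

# Suffix patterns derived from the file names listed above; the stem is
# {Sample}_{Condition} kept as one string.
PATTERNS = {
    "rescoring_features": re.compile(r"^(?P<stem>.+)_(?:psm|ms2rescore)\.idXML$"),
    "percolator_unfiltered": re.compile(r"^(?P<stem>.+)_pout\.idXML$"),
    "percolator_filtered": re.compile(r"^(?P<stem>.+)_pout_filtered\.idXML$"),
    "ion_all_peaks": re.compile(r"^(?P<stem>.+)_all_peaks\.tsv$"),
    "ion_matching": re.compile(r"^(?P<stem>.+)_matching_ions\.tsv$"),
}


def classify(filename: str) -> tuple[str, str] | None:
    """Return (category, sample_condition_stem) or None if no pattern matches."""
    for kind, pattern in PATTERNS.items():
        match = pattern.match(filename)
        if match:
            return kind, match.group("stem")
    return None


def index_directory(root: str) -> dict[str, list[tuple[str, str]]]:
    """Group all recognised files below `root` by their sample/condition stem."""
    index: dict[str, list[tuple[str, str]]] = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            hit = classify(path.name)
            if hit:
                index.setdefault(hit[1], []).append((hit[0], str(path)))
    return index
```

For example, index_directory("results") would group the rescoring and ion annotation files of each sample-condition pair under the chosen results directory.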
MultiQC
Output files
- multiqc/
  - multiqc_report.html: a standalone HTML file that can be viewed in your web browser.
  - multiqc_data/: directory containing parsed statistics from the different tools used in the pipeline.
  - multiqc_plots/: directory containing static images from the report in various formats.
MultiQC is a visualization tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory.
Results generated by MultiQC collate pipeline QC from the supported tools used in the pipeline. The pipeline has special steps that also report the software versions in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see http://multiqc.info.
Pipeline information
Output files
- pipeline_info/
  - Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.html.
  - Reports generated by the pipeline: software_versions.yml.
  - Reformatted samplesheet files used as input to the pipeline: samplesheet.valid.csv.
  - Parameters used by the pipeline run: params.json.
Nextflow provides excellent functionality for generating various reports on the execution of the pipeline. These reports help you troubleshoot errors in a pipeline run and also provide other information such as launch commands, run times and resource usage.
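When troubleshooting, a quick first step is to list the tasks in execution_trace.txt that did not finish successfully. The sketch below assumes the trace file is tab-separated with a header row that includes name, status and exit columns (as in Nextflow's default trace fields), and treats COMPLETED and CACHED as successful statuses. The function name and the toy trace, including its process names, are hypothetical.

```python
from __future__ import annotations

import csv
import io

# Task statuses treated as successful in this sketch.
OK_STATUSES = {"COMPLETED", "CACHED"}


def unsuccessful_tasks(trace_text: str) -> list[dict[str, str]]:
    """Return the trace rows whose status is not a successful one."""
    reader = csv.DictReader(io.StringIO(trace_text), delimiter="\t")
    return [row for row in reader if row.get("status") not in OK_STATUSES]


# Toy trace for illustration only; real traces contain more columns.
TOY_TRACE = (
    "task_id\thash\tname\tstatus\texit\n"
    "1\tab/123456\tCOMET (S1)\tCOMPLETED\t0\n"
    "2\tcd/789abc\tPERCOLATOR (S1)\tFAILED\t1\n"
    "3\tef/456def\tCOMET (S2)\tCACHED\t0\n"
)
```

To use it on a real run, read pipeline_info/execution_trace.txt into a string and pass it to unsuccessful_tasks; the exit codes of the returned rows point to the tasks whose work directories are worth inspecting first.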