nf-core/mhcquant
Identify and quantify MHC eluted peptides from mass spectrometry raw data
Version 1.3 (the latest stable release is 2.6.0).
Output
This document describes the output produced by the pipeline.
Pipeline overview
The final output of the pipeline should include the following files:
- all_features_merged_resolved.mzTab - the community standard format for sharing mass spectrometry search results
- all_features_merged_resolved.csv - aggregate csv report, containing all information about peptide identification and quantification results
- found_neoepitopes.csv - a csv listing all neoepitopes found in the mass spectrometry search, independent of binding predictions
- vcf_neoepitopes.csv - a csv listing all theoretically possible neoepitope sequences from the variants specified in the vcf
- _vcf.fasta - the fasta database including mutated proteins used for the database search
- class_1/2_binding_predictions - a folder containing the respective binding predictions of all detected peptides and all theoretically possible neoepitope sequences
- Intermediate_results - a folder containing all intermediate results from the steps in the pipeline (unfiltered and filtered PSMs, aligned mzMLs, features, etc.)
- Documentation - a folder containing summarized reports of the pipeline execution
- pipeline_info - a folder containing detailed reports on computational runtimes and workflow steps
mzTab
mzTab is a lightweight format for reporting mass spectrometry search results. It provides all important information about identified peptide hits and is compatible with the PRIDE Archive proteomics data repository:
Griss, J. et al. The mzTab Data Exchange Format: Communicating Mass-spectrometry-based Proteomics and Metabolomics Experimental Results to a Wider Audience. Mol Cell Proteomics 13, 2765–2775 (2014).
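As a rough illustration of how an mzTab file can be inspected, the sketch below collects one section into a list of dictionaries. It relies only on the general mzTab convention that lines are tab-separated and start with a section prefix (for example `PEH` for the peptide header and `PEP` for peptide rows). The function name `read_mztab_section` and the sample content are hypothetical and are not taken from the pipeline.

```python
import csv
import io


def read_mztab_section(text, header_prefix, row_prefix):
    """Return the rows of one mzTab section as a list of dicts.

    header_prefix / row_prefix are the line prefixes of the section,
    e.g. "PEH" / "PEP" for peptides or "PSH" / "PSM" for PSMs.
    """
    header = None
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue  # skip empty lines between sections
        if fields[0] == header_prefix:
            header = fields[1:]
        elif fields[0] == row_prefix and header is not None:
            rows.append(dict(zip(header, fields[1:])))
    return rows


# Hypothetical, heavily shortened mzTab content for illustration only.
example = (
    "MTD\tmzTab-version\t1.0.0\n"
    "\n"
    "PEH\tsequence\taccession\tcharge\n"
    "PEP\tSIINFEKL\tP01012\t2\n"
    "PEP\tKLGGALQAK\tQ00001\t2\n"
)

peptides = read_mztab_section(example, "PEH", "PEP")
sequences = [p["sequence"] for p in peptides]
```

To apply this to a real result, the text of all_features_merged_resolved.mzTab would be passed in place of `example`; the columns actually present depend on the pipeline run.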
csv
The csv output file is a table containing all information extracted from a database search throughout the pipeline. See the OpenMS or PSI documentation for more information about annotated scores and format.
Each row index is represented by a label describing its content:
- MAP - contains information about the mzML files that were provided initially
- RUN - contains information about the search that was performed on each run
- PROTEIN - contains information about the protein ids corresponding to the peptides that were detected (no protein inference was performed)
- UNASSIGNEDPEPTIDE - contains information about PSMs that were identified but couldn’t be assigned to a precursor feature on MS level 1
- CONSENSUS - contains information about precursor features that were identified in multiple runs (e.g. runs 1-3)
- PEPTIDE - contains information about peptide hits that were identified and correspond to the consensus features described in the preceding CONSENSUS row
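Because each row of the csv report starts with a label, the file is easier to work with once rows are grouped by that label. The sketch below shows one way to do this. It assumes that fields are tab-separated and that lines starting with `#` are header or comment lines; both are assumptions about the export format and should be checked against an actual file. The function name `group_rows_by_label` and the sample rows are hypothetical.

```python
from collections import defaultdict


def group_rows_by_label(lines, delimiter="\t"):
    """Group report rows by their leading label (MAP, RUN, PEPTIDE, ...).

    Assumption: the delimiter is a tab and lines starting with '#'
    are header/comment lines that can be skipped.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        label, *fields = line.split(delimiter)
        groups[label].append(fields)
    return dict(groups)


# Hypothetical rows for illustration only; real files contain many more columns.
sample_lines = [
    "#MAP\tid\tfilename\n",
    "MAP\t0\trun1.mzML\n",
    "MAP\t1\trun2.mzML\n",
    "CONSENSUS\t1200.5\t450.23\t2\n",
    "PEPTIDE\t1200.5\t450.23\tSIINFEKL\n",
    "UNASSIGNEDPEPTIDE\t980.1\t512.77\tKLGGALQAK\n",
]

groups = group_rows_by_label(sample_lines)
row_counts = {label: len(rows) for label, rows in groups.items()}
```

With a real report, the lines would come from an open file handle, for example `group_rows_by_label(open(path))`, where `path` points to all_features_merged_resolved.csv.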
found_neoepitopes
A csv file listing the detected neoepitope sequences.
vcf_neoepitopes
A csv file listing the theoretically possible neoepitope sequences.
class_1/2_binding_predictions
The prediction outputs are comma-separated tables (csv), one per allele, listing each peptide sequence and its corresponding predicted affinity scores.
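A per-allele prediction table can be filtered for likely binders with a few lines of Python. The sketch below is only an illustration: the column names `sequence` and `affinity`, the assumption that affinity is given in nM (lower means stronger binding), and the 500 nM cutoff are all assumptions that need to be adapted to the header and units of the actual files. The function name `likely_binders` and the sample table are hypothetical.

```python
import csv
import io


def likely_binders(csv_text, seq_col="sequence", score_col="affinity",
                   max_affinity=500.0):
    """Return peptide sequences whose predicted affinity is at or below a cutoff.

    Assumptions: the column names and the nM scale (lower = stronger binding)
    are placeholders and must match the real prediction output.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row[seq_col]
        for row in reader
        if float(row[score_col]) <= max_affinity
    ]


# Hypothetical prediction table for a single allele, for illustration only.
sample_predictions = (
    "sequence,affinity\n"
    "SIINFEKL,35.2\n"
    "KLGGALQAK,1250.0\n"
    "YLLPAIVHI,480.9\n"
)

binders = likely_binders(sample_predictions)
```

Keeping the column names and the cutoff as parameters makes it easy to reuse the same function for class 1 and class 2 prediction tables if their headers differ.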