nf-core/mhcquant
Identify and quantify MHC eluted peptides from mass spectrometry raw data
Output
This document describes the output produced by the pipeline.
Pipeline overview
The final output of the pipeline should include the following files and folders:
- all_features_merged_resolved.mzTab - the community standard format for sharing mass spectrometry search results
- all_features_merged_resolved.csv - aggregate csv report, containing all information about peptide identification and quantification results
- found_neoepitopes.csv - a csv listing all neoepitopes found in the mass spectrometry search, independent of binding predictions
- vcf_neoepitopes.csv - a csv listing all theoretically possible neoepitope sequences from the variants specified in the vcf
- _vcf.fasta - the fasta database including mutated proteins used for the database search
- class_1/2_binding_predictions - a folder containing the respective binding predictions of all detected peptides and all theoretically possible neoepitope sequences
- Intermediate_resuls - a folder containing all intermediate results from the steps in the pipeline (unfiltered and filtered PSMs, aligned mzMLs, features, etc.)
- Documentation - a folder containing summarized reports of the pipeline execution
- pipeline_info - a folder containing detailed reports on computational runtimes and workflow steps
mzTab
mzTab is a lightweight format for reporting mass spectrometry search results. It provides all important information about identified peptide hits and is compatible with the PRIDE Archive proteomics data repository:
Griss, J. et al. The mzTab Data Exchange Format: Communicating Mass-spectrometry-based Proteomics and Metabolomics Experimental Results to a Wider Audience. Mol Cell Proteomics 13, 2765–2775 (2014).
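As a rough sketch of how such a file can be read: mzTab is a tab-separated text format in which every line starts with a section prefix, for example `MTD` for metadata, `PEH`/`PEP` for the peptide header and rows, and `PSH`/`PSM` for the PSM header and rows. The helper name `read_mztab_section` and the in-memory example data below are illustrative assumptions and are not part of the pipeline.

```python
import csv
import io


def read_mztab_section(handle, header_prefix="PSH", row_prefix="PSM"):
    """Return the rows of one mzTab section as dicts keyed by that section's header."""
    header = None
    rows = []
    # QUOTE_NONE: treat quote characters as ordinary text, split on tabs only.
    for fields in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not fields:
            continue  # skip blank lines
        if fields[0] == header_prefix:
            header = fields[1:]
        elif fields[0] == row_prefix and header is not None:
            rows.append(dict(zip(header, fields[1:])))
    return rows


# Made-up miniature file, for illustration only.
example = (
    "MTD\tmzTab-version\t1.0.0\n"
    "PSH\tsequence\tcharge\n"
    "PSM\tSIINFEKL\t2\n"
)
psms = read_mztab_section(io.StringIO(example))
print(psms)
```

For a real result file, the same function could be called on an open file handle, e.g. `open("all_features_merged_resolved.mzTab", newline="")`, with `header_prefix="PEH"` and `row_prefix="PEP"` to read the peptide section instead.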
csv
The csv output file is a table containing all information extracted from a database search throughout the pipeline. See the OpenMS or PSI documentation for more information about annotated scores and format.
Each row begins with a label that describes its content:
#MAP id filename label size
MAP contains information about the different mzML files that were initially provided as input.
#RUN run_id score_type score_direction date_time search_engine_version parameters
RUN contains information about the search that was performed on each run.
#PROTEIN score rank accession protein_description coverage sequence
PROTEIN contains information about the protein ids corresponding to the peptides that were detected (no protein inference was performed).
#UNASSIGNEDPEPTIDE rt mz score rank sequence charge aa_before aa_after score_type search_identifier accessions FFId_category feature_id file_origin map_index spectrum_reference COMET:IonFrac COMET:deltCn COMET:deltLCn COMET:lnExpect COMET:lnNumSP COMET:lnRankSP MS:1001491 MS:1001492 MS:1001493 MS:1002252 MS:1002253 MS:1002254 MS:1002255 MS:1002256 MS:1002257 MS:1002258 MS:1002259 num_matched_peptides protein_references target_decoy
UNASSIGNEDPEPTIDE contains information about PSMs that were identified but could not be matched to a precursor feature on MS level 1 for quantification.
#CONSENSUS rt_cf mz_cf intensity_cf charge_cf width_cf quality_cf rt_0 mz_0 intensity_0 charge_0 width_0 rt_1 mz_1 intensity_1 charge_1 width_1 rt_2 mz_2 intensity_2 charge_2 width_2 rt_3 mz_3 intensity_3 charge_3 width_3
CONSENSUS contains information about precursor features that were identified in multiple runs (e.g. runs 1-3 in this case).
#PEPTIDE rt mz score rank sequence charge aa_before aa_after score_type search_identifier accessions FFId_category fea
PEPTIDE contains information about peptide hits that were identified and correspond to the consensus features described one row above.
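The labelled layout described above can be split into one table per label. The following is a minimal sketch under some assumptions: header lines start with `#` followed by the label (as in the listings above), data rows start with the bare label, and fields are tab-separated. The function name `split_by_label` and the example values are hypothetical; adjust `delimiter` if your file uses a different separator.

```python
import csv
import io
from collections import defaultdict


def split_by_label(handle, delimiter="\t"):
    """Group rows by their leading label, using '#LABEL' lines as column headers."""
    headers = {}
    sections = defaultdict(list)
    for fields in csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE):
        if not fields or not fields[0]:
            continue  # skip blank lines
        label = fields[0]
        if label.startswith("#"):
            # Header line: remember the column names for this label.
            headers[label[1:]] = fields[1:]
        elif label in headers:
            # Data row: pair each value with its column name.
            sections[label].append(dict(zip(headers[label], fields[1:])))
    return dict(sections)


# Made-up miniature file, for illustration only.
example = (
    "#MAP\tid\tfilename\tlabel\tsize\n"
    "MAP\t0\trun1.mzML\t\t120\n"
    "#CONSENSUS\trt_cf\tmz_cf\tintensity_cf\n"
    "#PEPTIDE\trt\tmz\tsequence\n"
    "CONSENSUS\t1200.5\t450.25\t1.0e6\n"
    "PEPTIDE\t1201.0\t450.25\tSIINFEKL\n"
)
sections = split_by_label(io.StringIO(example))
print(sections["MAP"][0]["filename"])
print(sections["PEPTIDE"][0]["sequence"])
```

Because PEPTIDE rows refer to the consensus feature one row above them, keep the original row order if you need to link peptides to their features; this sketch only groups rows by label and does not preserve that link.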
found_neoepitopes
csv file listing detected neoepitope sequences:
peptide sequence geneID
vcf_neoepitopes
csv file listing theoretically possible neoepitope sequences:
Sequence Antigen ID Variants
class_1/2_binding_predictions
The prediction outputs are comma-separated tables (csv), one for each allele, listing each peptide sequence and its corresponding predicted affinity scores:
peptide allele prediction prediction_low prediction_high prediction_percentile
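A prediction table with these columns can be filtered with the standard library. The sketch below assumes that the file has a header row with the column names listed above and that a lower `prediction_percentile` indicates a stronger predicted binder. The function name `filter_by_percentile`, the cutoff of 2.0 and the example values are illustrative assumptions, not pipeline defaults.

```python
import csv
import io


def filter_by_percentile(handle, max_percentile=2.0):
    """Return rows whose prediction_percentile is at or below max_percentile."""
    kept = []
    for row in csv.DictReader(handle):
        try:
            percentile = float(row["prediction_percentile"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric percentile
        if percentile <= max_percentile:
            kept.append(row)
    return kept


# Made-up values, for illustration only.
example = (
    "peptide,allele,prediction,prediction_low,prediction_high,prediction_percentile\n"
    "SIINFEKL,HLA-A*02:01,50.0,30.0,80.0,0.5\n"
    "AAAAAAAAA,HLA-A*02:01,20000.0,15000.0,25000.0,35.0\n"
)
strong = filter_by_percentile(io.StringIO(example))
print([row["peptide"] for row in strong])
```

All values are read as strings by `csv.DictReader`, so convert any other numeric columns with `float()` before comparing them.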