nf-core/mhcquant
Identify and quantify MHC eluted peptides from mass spectrometry raw data
Version 2.0.0. The latest stable release is 2.6.0.
Introduction
This document describes the output produced by the pipeline.
nextflow run mhcquant -profile test,<docker/singularity/institute>
The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
Pipeline overview
The pipeline is built using Nextflow and processes data using the following steps:
General
Quantification
Output files
- csv: If --skip_quantification is not specified.
The CSV output file is a table containing all information extracted from a database search throughout the pipeline. See the OpenMS or PSI documentation for more information about annotated scores and format.
MAP contains information about the different mzML files that were provided initially
#MAP id filename label size
RUN contains information about the search that was performed on each run
#RUN run_id score_type score_direction date_time search_engine_version parameters
PROTEIN contains information about the protein ids corresponding to the peptides that were detected (No protein inference was performed)
#PROTEIN score rank accession protein_description coverage sequence
UNASSIGNEDPEPTIDE contains information about PSMs that were identified but could not be assigned to a precursor feature at MS level 1
#UNASSIGNEDPEPTIDE rt mz score rank sequence charge aa_before aa_after score_type search_identifier accessions FFId_category feature_id file_origin map_index spectrum_reference COMET:IonFrac COMET:deltCn COMET:deltLCn COMET:lnExpect COMET:lnNumSP COMET:lnRankSP MS:1001491 MS:1001492 MS:1001493 MS:1002252 MS:1002253 MS:1002254 MS:1002255 MS:1002256 MS:1002257 MS:1002258 MS:1002259 num_matched_peptides protein_references target_decoy
CONSENSUS contains information about precursor features that were identified in multiple runs (e.g. runs 1-3 in this case)
#CONSENSUS rt_cf mz_cf intensity_cf charge_cf width_cf quality_cf rt_0 mz_0 intensity_0 charge_0 width_0 rt_1 mz_1 intensity_1 charge_1 width_1 rt_2 mz_2 intensity_2 charge_2 width_2 rt_3 mz_3 intensity_3 charge_3 width_3
PEPTIDE contains information about peptide hits that were identified and correspond to the consensus features described above
#PEPTIDE rt mz score rank sequence charge aa_before aa_after score_type search_identifier accessions FFId_category fea
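Because every row starts with its record type (MAP, RUN, PROTEIN, ...), the table can be split into one sub-table per type. The following is a minimal sketch, assuming the file is tab-separated, that header rows start with # followed by the record type, and that data rows start with the bare record type. The function name split_sections and the sample text are illustrative and not part of the pipeline.

```python
import csv
import io
from collections import defaultdict


def split_sections(text, delimiter="\t"):
    """Group rows of a section-tagged table by record type (MAP, RUN, ...).

    Assumption: header rows look like '#MAP<TAB>id<TAB>...' and data rows
    look like 'MAP<TAB>0<TAB>...'. Returns {record_type: [row_dict, ...]}.
    """
    headers = {}
    sections = defaultdict(list)
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row or not row[0]:
            continue
        tag = row[0]
        if tag.startswith("#"):
            # Header row: remember the column names for this record type.
            headers[tag[1:]] = row[1:]
        elif tag in headers:
            # Data row: pair each value with its column name.
            sections[tag].append(dict(zip(headers[tag], row[1:])))
    return dict(sections)


# Made-up sample with two record types, for illustration only.
sample = (
    "#MAP\tid\tfilename\tlabel\tsize\n"
    "MAP\t0\trun1.mzML\t\t120\n"
    "#RUN\trun_id\tscore_type\n"
    "RUN\tsearch_0\tq-value\n"
)
tables = split_sections(sample)
print(tables["MAP"][0]["filename"])
```

Data rows whose record type has no header yet are skipped, so the sketch relies on each header row appearing before its data rows.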
Intermediate results
Output files
Intermediate_Results/
- *merged_psm_perc_filtered.mzTab: If --refine_fdr_on_predicted_subset is specified, consists of the hits (filtered by q-value).
- *.mztab: mzTab file generated by the OpenMS MzTabExporter command; mzTab is the community standard format for sharing mass spectrometry search results.
- *.featureXML: If --skip_quantification is not specified, this file is generated by the OpenMS FeatureFinderIdentification command.
- *fdr_filtered.idXML: If --skip_quantification is not specified, this file is generated by the OpenMS IDFilter command.
- *all_ids_merged_psm_perc*.idXML: idXML files generated when --refine_fdr_on_predicted_subset is specified.
- *peptide_filtered.idXML: If --refine_fdr_on_predicted_subset is specified, this file contains the outcome of the PSM prediction.
- *perc_subset.idXML: If --refine_fdr_on_predicted_subset is specified, this file is the outcome of the second Percolator run, generated by the OpenMS PercolatorAdapter.
This folder contains the intermediate results from various steps of the MHCquant pipeline (e.g. (un)filtered PSMs, aligned mzMLs, features).
The output mzTab contains many columns annotating the most important information; a few are highlighted here:
PEP sequence accession best_search_engine_score[1] retention_time charge mass_to_charge peptide_abundance_study_variable[1]
Most importantly, in this format the q-value of each peptide identification is annotated in the best_search_engine_score[1] column and the peptide quantities in the peptide_abundance_study_variable columns.
mzTab is a light-weight format to report mass spectrometry search results. It provides all important information about identified peptide hits and is compatible with the PRIDE Archive - proteomics data repository.
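The peptide section of an mzTab file can be read with a few lines of code, since mzTab is tab-separated and each line begins with a prefix that names its type (PEH for the peptide header, PEP for peptide rows). The sketch below is illustrative only: the function names, the sample lines and the 0.01 q-value cut-off are assumptions, not pipeline defaults.

```python
def read_mztab_peptides(lines):
    """Yield one dict per PEP row, keyed by the column names of the PEH row."""
    header = None
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "PEH":
            # Peptide header row: column names follow the prefix.
            header = fields[1:]
        elif fields[0] == "PEP" and header is not None:
            yield dict(zip(header, fields[1:]))


def filter_by_qvalue(peptides, max_q=0.01):
    """Keep peptides whose best_search_engine_score[1] (q-value) is <= max_q."""
    return [
        p for p in peptides
        if float(p["best_search_engine_score[1]"]) <= max_q
    ]


# Made-up lines for illustration; a real file has many more columns.
sample = [
    "MTD\tmzTab-version\t1.0.0\n",
    "PEH\tsequence\taccession\tbest_search_engine_score[1]"
    "\tpeptide_abundance_study_variable[1]\n",
    "PEP\tSLLQHLIGL\tP1\t0.001\t15234.5\n",
    "PEP\tAAAAAAAAA\tP2\t0.2\tnull\n",
]
kept = filter_by_qvalue(read_mztab_peptides(sample))
print([p["sequence"] for p in kept])
```

For a file on disk, the open file handle can be passed directly as lines. Abundance values are left as strings here because missing quantities may be reported as null.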
VCF
Reference fasta
Output files
- *_vcf.fasta: If --include_proteins_from_vcf is specified, this FASTA file is created for the respective sample.
Neoepitopes
These CSV files list, respectively, all theoretically possible neoepitope sequences from the variants specified in the VCF and the neoepitopes found during the mass spectrometry search, independent of binding predictions.
found_neoepitopes
Output files
- *found_neoepitopes_class1.csv: Generated when --include_proteins_from_vcf and --predict_class_1 are specified.
- *found_neoepitopes_class2.csv: Generated when --include_proteins_from_vcf and --predict_class_2 are specified.
This CSV lists all neoepitopes found during the mass spectrometry search, independent of binding predictions. The format is as follows:
peptide sequence geneID
vcf_neoepitopes
Output files
- *vcf_neoepitopes_class1.csv: Generated when --include_proteins_from_vcf and --predict_class_1 are specified.
- *vcf_neoepitopes_class2.csv: Generated when --include_proteins_from_vcf and --predict_class_2 are specified.
This CSV file contains all theoretically possible neoepitope sequences from the variants specified in the VCF. The format is shown below:
Sequence Antigen ID Variants
Class prediction
Class (1|2) bindings
Output files
- *predicted_peptides_class_1.csv: If --predict_class_1 is specified, this CSV is generated.
- *predicted_peptides_class_2.csv: If --predict_class_2 is specified, this CSV is generated.
This folder contains the binding predictions of all detected class 1 or 2 peptides and all theoretically possible neoepitope sequences. The prediction output is a comma-separated table (CSV) for each allele, listing each peptide sequence and its corresponding predicted affinity scores:
peptide allele prediction prediction_low prediction_high prediction_percentile
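As an illustration of how such a table might be used, the sketch below keeps only the rows at or below a percentile cut-off and sorts them. It assumes that a lower prediction_percentile indicates a stronger predicted binder; the 2.0 threshold, the function name strong_binders and all sample values are made up for the example.

```python
import csv
import io


def strong_binders(csv_text, max_percentile=2.0):
    """Return (peptide, allele, percentile) tuples at or below max_percentile,
    sorted by ascending percentile."""
    rows = csv.DictReader(io.StringIO(csv_text))
    hits = [
        (r["peptide"], r["allele"], float(r["prediction_percentile"]))
        for r in rows
        if float(r["prediction_percentile"]) <= max_percentile
    ]
    return sorted(hits, key=lambda h: h[2])


# Made-up rows using the column names listed above.
sample = (
    "peptide,allele,prediction,prediction_low,prediction_high,"
    "prediction_percentile\n"
    "SLLQHLIGL,A*02:01,35.2,20.1,60.3,0.15\n"
    "AAAAAAAAA,A*02:01,21000.0,15000.0,30000.0,45.0\n"
    "KVLEYVIKV,A*02:01,80.4,50.0,120.9,0.60\n"
)
for peptide, allele, pct in strong_binders(sample):
    print(peptide, allele, pct)
```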
Retention time prediction
Output files
RT_prediction
- *id_RTpredicted.csv: If --predict_RT is specified, the predicted retention times of the found peptides are provided.
- *txt_RTpredicted.csv: If --predict_RT is specified, the predicted retention times of the predicted neoepitopes are provided.
Workflow reporting and documentation
Pipeline information
Output files
pipeline_info/
- Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
- Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.tsv.
- Reformatted sample-sheet files used as input to the pipeline: samplesheet.valid.csv.
Nextflow provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline and provide you with other information such as launch commands, run times and resource usage.
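When troubleshooting, execution_trace.txt can be scanned for tasks that did not finish cleanly. This is a minimal sketch, assuming the trace is tab-separated and contains the name, status and exit columns; the exact set of columns depends on the Nextflow trace configuration, and the task names in the sample are hypothetical.

```python
import csv
import io


def tasks_not_completed(trace_text):
    """Return (name, status, exit) for every task whose status is neither
    COMPLETED nor CACHED."""
    rows = csv.DictReader(io.StringIO(trace_text), delimiter="\t")
    return [
        (r["name"], r["status"], r["exit"])
        for r in rows
        if r["status"] not in ("COMPLETED", "CACHED")
    ]


# Made-up trace excerpt with hypothetical task names.
sample = (
    "task_id\thash\tname\tstatus\texit\n"
    "1\tab/12cd34\tSTEP_A (sample1)\tCOMPLETED\t0\n"
    "2\tef/56ab78\tSTEP_B (sample1)\tFAILED\t137\n"
)
print(tasks_not_completed(sample))
```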