nf-core/epitopeprediction
A bioinformatics best-practice analysis pipeline for epitope prediction and annotation
Introduction
This document describes the output produced by the pipeline. The versions of all tools used in the pipeline are summarized in a MultiQC report, which is generated at the end of the pipeline.
The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
Pipeline overview
The pipeline is built using Nextflow and processes data using the following steps:
- Epitope Prediction - Predict MHC-binding peptides
- MultiQC - Aggregate report describing results and QC from the whole pipeline
- Pipeline information - Report metrics generated during the workflow execution
Epitope Prediction
FRED2 is used to perform the epitope prediction on the given data, regardless of which prediction tools are chosen.
Output directory: merged_predictions/
[input_base_name]_prediction_report.json
- The statistics of the performed prediction in JSON format.
[input_base_name]_prediction_result.tsv
- The predicted epitopes in TSV format for further processing.
Partial results, e.g. predictions per chromosome or for individual peptide chunks, can be found in predictions/.
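The merged outputs described above can be loaded with the Python standard library alone. The following is a minimal sketch: the function names and the results_dir argument are illustrative assumptions and not part of the pipeline, and no column names are assumed for the TSV file.

```python
import csv
import json
from pathlib import Path


def result_paths(results_dir, input_base_name):
    """Build the merged output paths from the naming scheme described above."""
    merged = Path(results_dir) / "merged_predictions"
    return (
        merged / f"{input_base_name}_prediction_result.tsv",
        merged / f"{input_base_name}_prediction_report.json",
    )


def load_prediction_result(tsv_path):
    """Read a prediction result TSV into a list of dicts keyed by column name."""
    with open(tsv_path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def load_prediction_report(json_path):
    """Read a prediction report JSON file into a Python object."""
    with open(json_path) as handle:
        return json.load(handle)
```

Because each row is keyed by whatever header the TSV file provides, the sketch keeps working if the set of columns differs between predictor tools.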
An example prediction result looks like this in TSV format:
An example prediction report looks like this in JSON format:
The prediction results are given as allele-specific score and affinity values per peptide. The computation of these values depends on the applied prediction method:
- Syfpeithi:
  - Affinity: Calculated based on the score as the percentage of the maximum value of the corresponding matrix: score(peptide) divided by the maximum score of the allele/length-specific matrix * 100.
  - Score: Sum of the values given by the allele-specific position-specific scoring matrix (PSSM) for the respective peptide sequence. Peptides are considered binders if the affinity is higher than 50.
- MHCflurry, MHCnuggets and NetMHC tool family:
  - Affinity: Predicted IC50 (threshold for binders: <500 nmol/L).
  - Score: The provided score is calculated from the log-transformed predicted binding affinity and scaled to an interval of 0 to 1: 1 - log_50000(aff).
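The two conversions described above can be written out as a short Python sketch. The function and constant names are illustrative assumptions and are not taken from FRED2 or from the pipeline code. No clipping is applied, so an IC50 above 50000 nmol/L would give a negative score here.

```python
import math

SYFPEITHI_BINDER_THRESHOLD = 50.0  # percent of the maximum matrix score
IC50_BINDER_THRESHOLD = 500.0      # nmol/L
IC50_LOG_BASE = 50000.0            # base of the logarithm in the score formula


def syfpeithi_affinity(score, max_matrix_score):
    """Express a PSSM score as a percentage of the maximum matrix score."""
    return score / max_matrix_score * 100.0


def is_syfpeithi_binder(affinity):
    """A peptide counts as a binder if the affinity is higher than 50."""
    return affinity > SYFPEITHI_BINDER_THRESHOLD


def ic50_to_score(ic50):
    """Compute 1 - log_50000(aff) for a predicted IC50 given in nmol/L."""
    return 1.0 - math.log(ic50, IC50_LOG_BASE)


def is_ic50_binder(ic50):
    """A peptide counts as a binder if the predicted IC50 is below 500 nmol/L."""
    return ic50 < IC50_BINDER_THRESHOLD
```

Under this transformation, a lower IC50 (stronger predicted binding) gives a higher score, and the binder threshold of 500 nmol/L corresponds to a score of roughly 0.43.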
When the parameter --fasta_output is specified, a FASTA file will be generated containing the protein sequences that are affected by the provided genomic variants. The resulting FASTA file will contain the wild-type and mutated protein sequences.
Output directory: merged_predictions/
[input_base_name]_prediction.fasta
- The sequences of proteins, affected by provided variants, in FASTA format.
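The FASTA output can be read with a small parser such as the one sketched below. The function name parse_fasta is an illustrative assumption. The sketch makes no assumption about how the pipeline labels wild-type and mutated sequences in the header lines; it simply returns each header together with its sequence.

```python
def parse_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

A file can be passed in directly, for example `with open(path) as handle: records = list(parse_fasta(handle))`, where path points to the [input_base_name]_prediction.fasta file.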
Supported models
When running the pipeline using the --show_supported_models parameter, information about supported models for the available predictor tool versions will be written to the results folder.
Output directory: supported_models/
[tool].[version].supported_alleles.txt
- A list of all alleles supported by the corresponding predictor method.
[tool].[version].supported_lengths.txt
- A list of all peptide lengths supported by the corresponding predictor method.
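These lists can be used to check a set of requested alleles or peptide lengths before starting a run. The sketch below assumes that each file contains one entry per line, which is not confirmed by this document; the function names are illustrative as well.

```python
def read_supported(lines):
    """Return the set of non-empty entries, assuming one entry per line."""
    return {line.strip() for line in lines if line.strip()}


def unsupported(requested, supported):
    """Return the requested items that are missing from the supported set."""
    return [item for item in requested if item not in supported]
```

For example, `read_supported(open(path))` with path pointing to a [tool].[version].supported_alleles.txt file gives the set to compare against. Entries are compared as exact strings, so the requested alleles need to use the same notation as the file.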
Pipeline information
Output files
pipeline_info/
- Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
- Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.yml. The pipeline_report* files will only be present if the --email / --email_on_fail parameters are used when running the pipeline.
- Reformatted samplesheet files used as input to the pipeline: samplesheet.valid.csv.
- Parameters used by the pipeline run: params.json.
Nextflow provides excellent functionality for generating various reports relevant to the execution of the pipeline. These reports allow you to troubleshoot errors in a pipeline run and also provide other information such as launch commands, run times and resource usage.
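As a starting point for troubleshooting, the execution trace can be summarized with a few lines of Python. The sketch below assumes that execution_trace.txt is a tab-separated file with a header row that includes the name and status columns, as in the default Nextflow trace layout; the function names are illustrative.

```python
import csv
from collections import Counter


def summarize_trace(lines):
    """Count tasks per status value in a Nextflow trace file."""
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(row["status"] for row in reader)


def failed_tasks(lines):
    """Return the names of all tasks whose status is FAILED."""
    reader = csv.DictReader(lines, delimiter="\t")
    return [row["name"] for row in reader if row["status"] == "FAILED"]
```

Both functions accept an open file handle, for example `with open("pipeline_info/execution_trace.txt", newline="") as handle: print(summarize_trace(handle))`. If the trace fields were customized for a run, the column names may differ from the ones assumed here.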