nf-core/airrflow
B-cell and T-cell Adaptive Immune Receptor Repertoire (AIRR) sequencing analysis pipeline using the Immcantation framework
Version 1.0.0. The latest stable release is 4.1.0.
nf-core/bcellmagic: Output
This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline.
Pipeline overview
The pipeline is built using Nextflow and processes data using the following steps:
- Fetching databases - Fetching igblast and imgt databases
- FastQC - read quality control
- Filter sequence quality - filter sequences by quality
- Mask primers - Masking primers
- Pair mates - Pairing sequence mates.
- Cluster sets - Cluster sequences according to similarity.
- Build consensus - Build UMI consensus
- Re-pair mates - Re-pairing sequence mates.
- Assemble mates - Assemble sequence mates.
- Remove duplicates - Remove read duplicates.
- Filter sequences for at least 2 representatives - Filter out sequences that do not have at least 2 reads assigned.
- Assign genes with IgBlast
- Determining genotype and hamming distance threshold
- Defining clones - Defining clonal B-cell populations
- Reconstructing germlines - Reconstruct gene calls of germline sequences
- Alakazam - Repertoire analysis.
Fetching databases
Fetching igblast and imgt databases.
Output directory: results/dbs
If the saveDBs parameter is set, the database cache is saved in the results directory.
igblast_base
- Contains igblast database cache.
imgtdb_base
- Contains imgt database cache.
FastQC
FastQC gives general quality metrics about your reads. It provides information about the quality score distribution across your reads, the per base sequence content (%T/A/G/C). You get information about adapter contamination and other overrepresented sequences.
For further reading and documentation see the FastQC help.
NB: The FastQC plots displayed in the MultiQC report show untrimmed reads. They may contain adapter sequence and regions of potentially low quality. To see how your reads look after trimming, look at the FastQC reports in the trim_galore directory.
Output directory: results/fastqc
sample_fastqc.html
- FastQC report, containing quality metrics for your untrimmed raw fastq files
zips/sample_fastqc.zip
- zip file containing the FastQC report, tab-delimited data file and plot images
Filter sequence quality
Filters out reads that fall below a quality threshold, using the FilterSeq tool from the Presto Immcantation toolset. The default quality threshold is 20.
Output directory: results/filter_by_sequence_quality
command_log.txt
- Log of the process that will be parsed to generate a report.
fastq/*.fastq
- Fastq with only reads that passed the quality filter.
fastq/*.tab
- table containing read ID and quality.
fastq/*.log
- Log of the process.
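To make the quality filter concrete, the sketch below computes the mean Phred score of a read and keeps the read only if that mean reaches the threshold. It assumes Phred+33 encoded FASTQ quality strings and a mean-quality criterion. The function names and example reads are hypothetical, and this is an illustration of the idea, not FilterSeq's own implementation.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 is an assumption)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_quality(quality: str, threshold: float = 20.0) -> bool:
    """True if the read's mean quality is at or above the threshold."""
    return mean_phred(quality) >= threshold


# Hypothetical reads: (read ID, sequence, quality string)
reads = [
    ("read1", "ACGT", "IIII"),  # 'I' encodes Phred 40
    ("read2", "ACGT", "!!!!"),  # '!' encodes Phred 0
]
kept = [read_id for read_id, _seq, qual in reads if passes_quality(qual)]
```

With the default threshold of 20, only read1 is kept in this example.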
Mask primers
Masks primers that are provided in the C-primers and V-primers input files. It uses the tool MaskPrimers of the Presto Immcantation toolset.
Output directory: results/mask_primers
command_log.txt
- Log of the process that will be parsed to generate a report.
fastq/*.fastq
- Fastq with reads with masked primers.
fastq/*.log
- Log containing sequence identifiers and the error in masking primers.
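As a rough illustration of what primer masking involves, the sketch below compares the start of a read against each primer, picks the primer with the lowest mismatch rate, and replaces the matched region with N characters. The function name, the max_error cutoff, and the assumption that primers sit at the very start of the read are all hypothetical simplifications; MaskPrimers itself offers more options than shown here.

```python
def mask_primer(seq, primers, max_error=0.2):
    """Mask the best-matching primer at the start of a read with N characters.

    Returns (sequence, primer_name, error). If no primer matches within
    max_error (a hypothetical cutoff), the read is returned unchanged.
    """
    best = None  # (name, error, primer length)
    for name, primer in primers.items():
        if len(seq) < len(primer):
            continue
        mismatches = sum(a != b for a, b in zip(seq, primer))
        error = mismatches / len(primer)
        if best is None or error < best[1]:
            best = (name, error, len(primer))
    if best is None or best[1] > max_error:
        return seq, None, None
    name, error, length = best
    return "N" * length + seq[length:], name, error


# Hypothetical primers and reads
primers = {"P1": "ACGTA", "P2": "TTTTT"}
masked = mask_primer("ACGTTGGGG", primers)
unmatched = mask_primer("GGGGGGGGG", primers)
```

The per-read primer name and error rate returned here correspond loosely to the kind of information the masking log records.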
Pair mates
Pair read mates using PairSeq from the Presto Immcantation toolset.
Output directory: results/pair_sequences
command_log.txt
- Log of the process that will be parsed to generate a report.
fastq/*.fastq
- Fastq with reads that passed mate pairing.
Cluster sets
Clusters sequences according to similarity, using ClusterSets set. This step is included to compensate for insufficient UMI diversity.
Output directory: results/cluster_sets
command_log.txt
- Log of the process that will be parsed to generate a report.
fastq/*.fastq
- Fastq with reads and annotation in their headers of cluster group.
Build UMI consensus
Builds a consensus sequence from all sequences that were annotated with the same UMI. Uses BuildConsensus.
Output directory: results/build_consensus
command_log.txt
- Log of the process that will be parsed to generate a report.
fastq/*.fastq
- Fastq with reads that passed the build consensus step.
info/*.tab
- Parsed log containing the sequence barcodes and primers info
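The idea behind the UMI consensus step can be sketched as follows: reads sharing a UMI are grouped, and each position of the consensus is taken as the most frequent base in that group. This is a minimal sketch that assumes reads are already aligned at their start and truncates to the shortest read. The function name is hypothetical and BuildConsensus applies additional quality-aware logic not shown here.

```python
from collections import Counter, defaultdict


def build_umi_consensus(reads):
    """Group (umi, sequence) pairs by UMI and take a per-position majority vote.

    Returns {umi: (consensus_sequence, number_of_reads)}.
    """
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)

    consensus = {}
    for umi, seqs in groups.items():
        # Simplification: only consider positions covered by every read.
        length = min(len(s) for s in seqs)
        bases = []
        for i in range(length):
            counts = Counter(s[i] for s in seqs)
            bases.append(counts.most_common(1)[0][0])
        consensus[umi] = ("".join(bases), len(seqs))
    return consensus


# Hypothetical reads: (UMI, sequence)
reads = [
    ("AAAC", "ACGT"),
    ("AAAC", "ACGT"),
    ("AAAC", "ACTT"),  # one sequencing error at position 3
    ("GGGT", "TTTT"),
]
consensus = build_umi_consensus(reads)
```

In this example the single error in the third read is outvoted, so the consensus for UMI AAAC is ACGT, supported by 3 reads.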
Re-pair mates
Re-pair read mates using PairSeq from the Presto Immcantation toolset.
Output directory: results/repair_mates
command_log.txt
- Log of the process that will be parsed to generate a report.
fastq/*.fastq
- Fastq with reads that passed mate pairing.
info/*.tab
- Parsed log containing the sequence barcodes and re-pair info.
Assemble mates
Assemble read mates using AssemblePairs from the Presto Immcantation toolset.
Output directory: results/assemble_pairs
command_log.txt
- Log of the process that will be parsed to generate a report.
fastq/*.fastq
- Fastq with assembled reads.
info/*.tab
- Parsed log containing the sequence barcodes and assemble pairs information.
Remove duplicates
Remove duplicates using CollapseSeq from the Presto Immcantation toolset.
Output directory: results/deduplicates
command_log.txt
- Log of the process that will be parsed to generate a report.
fastq/*.fastq
- Fastq with de-duplicated reads.
info/*.tab
- Parsed log containing the sequence barcodes and deduplicated pairs.
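Duplicate removal can be pictured as collapsing identical sequences into one record while keeping a count of how many copies were seen. The sketch below assumes exact string identity and keeps the ID of the first occurrence. The function name is hypothetical, and CollapseSeq supports further options (for example, handling of ambiguous bases) that are not modelled here.

```python
def collapse_duplicates(records):
    """Collapse identical sequences, keeping the first ID and a duplicate count.

    records: iterable of (sequence_id, sequence).
    Returns a list of (sequence_id, sequence, count) in first-seen order.
    """
    seen = {}  # sequence -> [first ID, count]; dicts preserve insertion order
    for seq_id, seq in records:
        if seq in seen:
            seen[seq][1] += 1
        else:
            seen[seq] = [seq_id, 1]
    return [(seq_id, seq, count) for seq, (seq_id, count) in seen.items()]


# Hypothetical records: (sequence ID, sequence)
records = [
    ("s1", "ACGTACGT"),
    ("s2", "TTTTCCCC"),
    ("s3", "ACGTACGT"),  # duplicate of s1
]
collapsed = collapse_duplicates(records)
```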
Filter sequences for at least 2 representative
Filter out sequences that do not have at least 2 representative reads, using SplitSeq from the Presto Immcantation toolset.
Output directory: results/filter_representative_2
command_log.txt
- Log of the process that will be parsed to generate a report.
fastq/*.fastq
- Fastq with reads that have at least 2 representatives.
info/*.tab
- Parsed log containing the sequence barcodes and split seq information.
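A filter of this kind can be sketched as a check on a count stored in each read header. The example assumes headers of the form ID|KEY=VALUE|KEY=VALUE and a count annotation named CONSCOUNT. Both the field name and the helper functions are assumptions made for illustration, so check the headers of your own fastq files before relying on them.

```python
def parse_header(header):
    """Split a header of the form 'ID|KEY=VALUE|...' into (id, annotations)."""
    seq_id, *fields = header.split("|")
    annotations = dict(field.split("=", 1) for field in fields)
    return seq_id, annotations


def has_min_count(header, field="CONSCOUNT", minimum=2):
    """True if the (assumed) count annotation is at least `minimum`."""
    _seq_id, annotations = parse_header(header)
    return int(annotations.get(field, 0)) >= minimum


# Hypothetical headers
headers = [
    "seq1|CONSCOUNT=3|DUPCOUNT=1",
    "seq2|CONSCOUNT=1|DUPCOUNT=1",
]
passing = [h for h in headers if has_min_count(h)]
```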
Assign genes with IgBlast
Assigns genes from the IgBlast database using AssignGenes and generates a table with MakeDb. Non-functional sequences are removed with ParseDb. Sequences are additionally converted to a fasta file with the ConvertDb tool.
Output directory: results/igblast
command_log.txt
- Log of the process that will be parsed to generate a report.
fasta/*.fasta
- Blast results converted to fasta format with the genotype V-call annotated in the header.
table/*.tab
- Table in ChangeO format containing the assigned gene information and the metadata provided in the starting metadata sheet.
Determining genotype and hamming distance threshold
Determines the genotype and the Hamming distance threshold of the junction regions for clonal determination, using TIgGER and Shazam.
Output directory: results/shazam
threshold.txt
- Hamming distance threshold of the Junction regions as determined by Shazam.
Hamming_distance_threshold.pdf
- Plot of the Hamming distance distribution between junction regions displaying the threshold for clonal assignment as determined by Shazam.
genotype.pdf
- Plot representing the patient genotype assessed by TIgGER.
igh_genotyped.tab
- Table in ChangeO format, additionally containing the assigned genotype in V_CALL_GENOTYPED.
v_genotype.fasta
- Fasta file containing the full sequences for all V genes assigned to the patient.
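The distance underlying the threshold plot can be illustrated with a small sketch: for each junction, compute the length-normalized Hamming distance to its nearest neighbour among junctions of the same length. The threshold is then chosen from the distribution of these values. The function names are hypothetical and the sketch ignores V and J gene grouping, so it shows the distance calculation only, not Shazam's actual procedure.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor_distances(junctions):
    """Normalized Hamming distance from each junction to its nearest neighbour.

    Only junctions of the same length are compared; a junction with no
    same-length partner gets None.
    """
    result = []
    for i, a in enumerate(junctions):
        dists = [
            hamming(a, b) / len(a)
            for j, b in enumerate(junctions)
            if i != j and len(a) == len(b)
        ]
        result.append(min(dists) if dists else None)
    return result


# Hypothetical junction sequences
junctions = ["AAAA", "AAAT", "CCCC", "GG"]
distances = nearest_neighbor_distances(junctions)
```

Here the first two junctions are close to each other (0.25), the third is far from everything (1.0), and the last has no same-length partner.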
Defining clones
Assigning clones to the sequences obtained from IgBlast with the DefineClones Immcantation tool.
Output directory: results/define_clones
command_log.txt
- Log of the process that will be parsed to generate a report.
table/igh_genotyped_clone-pass.tab
- Table in ChangeO format containing the assigned gene information and an additional field with the clone number.
info/igh_genotyped_table.tab
- Parsed log with sequence ID, assigned gene calls, junction length and clones.
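A simplified picture of clonal assignment: sequences are first grouped by V gene, J gene and junction length, and within each group any two sequences whose junctions differ by no more than the Hamming distance threshold are linked into the same clone. The sketch below assumes single-linkage clustering and dictionary keys named v, j and junction. These names, the function names and the example threshold are hypothetical, and DefineClones has further options not covered here.

```python
from collections import defaultdict


def normalized_hamming(a, b):
    """Fraction of mismatching positions between two equal-length strings."""
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def assign_clones(records, threshold):
    """Return one clone number per record (single linkage within V/J/length groups)."""
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[(rec["v"], rec["j"], len(rec["junction"]))].append(idx)

    clone_ids = [None] * len(records)
    next_id = 1
    for members in groups.values():
        parent = {i: i for i in members}  # union-find forest for this group

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for pos, i in enumerate(members):
            for j in members[pos + 1:]:
                dist = normalized_hamming(records[i]["junction"], records[j]["junction"])
                if dist <= threshold:
                    parent[find(i)] = find(j)

        labels = {}
        for i in members:
            root = find(i)
            if root not in labels:
                labels[root] = next_id
                next_id += 1
            clone_ids[i] = labels[root]
    return clone_ids


# Hypothetical records and threshold
records = [
    {"v": "IGHV1-2", "j": "IGHJ4", "junction": "AAAAAAAAAA"},
    {"v": "IGHV1-2", "j": "IGHJ4", "junction": "AAAAAAAAAT"},
    {"v": "IGHV1-2", "j": "IGHJ4", "junction": "CCCCCCCCCC"},
    {"v": "IGHV3-23", "j": "IGHJ4", "junction": "AAAAAAAAAA"},
]
clones = assign_clones(records, threshold=0.15)
```

The first two records differ at one of ten junction positions (0.1), so they share a clone. The third is too distant, and the fourth uses a different V gene.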
Reconstructing germlines
Reconstructing the germline sequences with the CreateGermlines Immcantation tool.
Output directory: results/define_clones
command_log.txt
- Log of the process that will be parsed to generate a report.
table/igh_genotyped_clone-pass_germ-pass.tab
- Table in ChangeO format containing the assigned gene information and an additional field with the germline reconstructed gene calls.
Alakazam
Repertoire analysis with the Alakazam R package from the Immcantation toolset.
Output directory: results/alakazam
igh_genotyped_clone-pass_germ-pass.tab
- Final table in ChangeO format containing the assigned gene information and an additional field with the germline reconstructed gene calls.
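Because the final table is tab-delimited, it can be inspected with standard tooling. The sketch below counts sequences per clone using Python's csv module. The column name CLONE and the example rows are assumptions for illustration; check the header line of your own table for the actual column names.

```python
import csv
import io
from collections import Counter


def clone_sizes(handle, clone_column="CLONE"):
    """Count rows per clone in a tab-delimited table read from `handle`."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row[clone_column] for row in reader)


# Hypothetical table contents standing in for the real output file
example = io.StringIO(
    "SEQUENCE_ID\tV_CALL_GENOTYPED\tCLONE\n"
    "seq1\tIGHV1-2*02\t1\n"
    "seq2\tIGHV1-2*02\t1\n"
    "seq3\tIGHV3-23*01\t2\n"
)
sizes = clone_sizes(example)
```

For a real run, the same function can be given a file opened with open("results/alakazam/igh_genotyped_clone-pass_germ-pass.tab", newline="").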