nf-core/airrflow

B-cell and T-cell Adaptive Immune Receptor Repertoire (AIRR) sequencing analysis pipeline using the Immcantation framework

These pages are for an old version of the pipeline (1.0.0). The latest stable release is 4.0 .

nf-core/bcellmagic: Output

This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline.

Pipeline overview

The pipeline is built using Nextflow and processes data using the following steps:

Fetching databases - Fetch the IgBLAST and IMGT databases
FastQC - Read quality control
Filter sequence quality - Filter sequences by quality
Mask primers - Mask primers
Pair mates - Pair sequence mates
Cluster sets - Cluster sequences according to similarity
Build consensus - Build UMI consensus
Re-pair mates - Re-pair sequence mates
Assemble mates - Assemble sequence mates
Remove duplicates - Remove read duplicates
Filter sequences for at least 2 representatives - Remove sequences that do not have at least 2 reads assigned
Assign genes with IgBlast
Determining genotype and hamming distance threshold
Defining clones - Defining clonal B-cell populations
Reconstructing germlines - Reconstruct gene calls of germline sequences
Alakazam - Repertoire analysis

Fetching databases

Fetches the IgBLAST and IMGT databases.

Output directory: results/dbs

If the saveDBs parameter is set, the database cache is saved in the results directory.

igblast_base
- Contains the IgBLAST database cache.
imgtdb_base
- Contains the IMGT database cache.

FastQC

FastQC gives general quality metrics about your reads. It provides information about the quality score distribution across your reads and the per-base sequence content (%T/A/G/C), as well as adapter contamination and other overrepresented sequences.

For further reading and documentation see the FastQC help.

NB: The FastQC plots displayed in the MultiQC report show untrimmed reads. They may contain adapter sequence and potentially regions with low quality. To see how your reads look after trimming, look at the FastQC reports in the trim_galore directory.

Output directory: results/fastqc

sample_fastqc.html
- FastQC report, containing quality metrics for your untrimmed raw fastq files
zips/sample_fastqc.zip
- zip file containing the FastQC report, tab-delimited data file and plot images

Filter sequence quality

Removes reads that fall below a quality threshold, using the FilterSeq tool from the pRESTO Immcantation toolset. The default quality threshold is 20.

Output directory: results/filter_by_sequence_quality

command_log.txt
- Log of the process that will be parsed to generate a report.
fastq/*.fastq
- Fastq with only reads that passed the quality filter.
fastq/*.tab
- Table containing read ID and quality.
fastq/*.log
- Log of the process.
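
As a rough illustration of what a quality filter of this kind does, the sketch below keeps reads whose mean Phred score reaches the threshold of 20. It is not FilterSeq itself: the function names, the Phred+33 encoding, and the use of the mean score are assumptions made for the example.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores) if scores else 0.0


def filter_by_quality(reads, threshold=20):
    """Keep (read_id, sequence, quality) tuples whose mean quality >= threshold."""
    return [read for read in reads if mean_phred(read[2]) >= threshold]


# Hypothetical reads: 'I' encodes Phred 40 and '#' encodes Phred 2.
reads = [
    ("read1", "ACGT", "IIII"),
    ("read2", "ACGT", "####"),
]
passed = filter_by_quality(reads)
print([read_id for read_id, _, _ in passed])  # ['read1']
```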

Mask primers

Masks the primers provided in the C-primers and V-primers input files, using the MaskPrimers tool from the pRESTO Immcantation toolset.

Output directory: results/mask_primers

command_log.txt
- Log of the process that will be parsed to generate a report.
fastq/*.fastq
- Fastq with reads with masked primers.
fastq/*.log
- Log containing the sequence identifiers and the primer-masking error.
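
To make the idea of primer masking concrete, here is a minimal sketch that replaces a primer found at the start of a read with N characters. It assumes exact matching at the read start, which is a simplification; the function name and the primer names are hypothetical and do not reflect how MaskPrimers is implemented.

```python
def mask_primer(sequence, primers):
    """Replace a primer found at the start of the read with N characters.

    Assumption: exact match at the read start, no mismatches allowed.
    Returns (masked_sequence, primer_name); primer_name is None if nothing matched.
    """
    for name, primer in primers.items():
        if sequence.startswith(primer):
            return "N" * len(primer) + sequence[len(primer):], name
    return sequence, None


# Hypothetical primer set and read.
primers = {"V1": "ACGT", "V2": "GGCA"}
print(mask_primer("ACGTTTGA", primers))  # ('NNNNTTGA', 'V1')
```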

Pair mates

Pairs read mates using PairSeq from the pRESTO Immcantation toolset.

Output directory: results/pair_sequences

command_log.txt
- Log of the process that will be parsed to generate a report.
fastq/*.fastq
- Fastq with reads that passed mate pairing.
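
Conceptually, mate pairing keeps only the reads whose identifier is present in both read files. The sketch below shows this with two dictionaries keyed by read ID; the data layout and the function name are assumptions for illustration, not the PairSeq implementation.

```python
def pair_mates(r1_reads, r2_reads):
    """Return (read_id, r1_seq, r2_seq) for IDs present in both inputs, in R1 order."""
    return [
        (read_id, seq, r2_reads[read_id])
        for read_id, seq in r1_reads.items()
        if read_id in r2_reads
    ]


# Hypothetical reads: 'b' has no R2 mate and 'c' has no R1 mate.
r1 = {"a": "ACGT", "b": "TTGA"}
r2 = {"a": "GGCC", "c": "CCAA"}
print(pair_mates(r1, r2))  # [('a', 'ACGT', 'GGCC')]
```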

Cluster sets

Clusters sequences according to similarity, using the set subcommand of ClusterSets. This step compensates for UMI diversity that is too low.

Output directory: results/cluster_sets

command_log.txt
- Log of the process that will be parsed to generate a report.
fastq/*.fastq
- Fastq with reads whose headers are annotated with the cluster group.

Build UMI consensus

Builds a consensus sequence from all reads that were annotated with the same UMI. Uses BuildConsensus.

Output directory: results/build_consensus

command_log.txt
- Log of the process that will be parsed to generate a report.
fastq/*.fastq
- Fastq with reads that passed the build consensus step.
info/*.tab
- Parsed log containing the sequence barcodes and primers info.
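
The idea of a UMI consensus can be sketched as a per-position majority vote over all reads that share a UMI. The example below assumes equal-length reads and ignores quality scores, so it is a simplification rather than a description of BuildConsensus; all names and reads are hypothetical.

```python
from collections import Counter, defaultdict


def consensus(sequences):
    """Column-wise majority vote over equal-length sequences."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*sequences)
    )


def build_umi_consensus(reads):
    """Group (umi, sequence) pairs by UMI and build one consensus per group."""
    groups = defaultdict(list)
    for umi, sequence in reads:
        groups[umi].append(sequence)
    return {umi: consensus(seqs) for umi, seqs in groups.items()}


# Hypothetical reads: the third read carries one sequencing error.
reads = [
    ("AAAC", "ACGT"),
    ("AAAC", "ACGT"),
    ("AAAC", "ACGA"),
    ("GGTT", "TTTT"),
]
print(build_umi_consensus(reads))  # {'AAAC': 'ACGT', 'GGTT': 'TTTT'}
```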

Re-pair mates

Re-pairs read mates using PairSeq from the pRESTO Immcantation toolset.

Output directory: results/repair_mates

command_log.txt
- Log of the process that will be parsed to generate a report.
fastq/*.fastq
- Fastq with reads that passed mate pairing.
info/*.tab
- Parsed log containing the sequence barcodes and re-pair info.

Assemble mates

Assembles read mates using AssemblePairs from the pRESTO Immcantation toolset.

Output directory: results/assemble_pairs

command_log.txt
- Log of the process that will be parsed to generate a report.
fastq/*.fastq
- Fastq with assembled reads.
info/*.tab
- Parsed log containing the sequence barcodes and assemble pairs info.
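
As an illustration of mate assembly, the sketch below joins R1 with the reverse complement of R2 on their longest exact overlap. Real assemblers tolerate mismatches and use quality scores; the exact-match rule, the minimum overlap of 4, and the function names are assumptions made to keep the example short.

```python
def reverse_complement(sequence):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return sequence.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def assemble_pair(r1, r2, min_overlap=4):
    """Join R1 and reverse-complemented R2 on their longest exact overlap.

    Returns the assembled sequence, or None if no overlap of at least
    min_overlap bases is found.
    """
    r2_rc = reverse_complement(r2)
    for size in range(min(len(r1), len(r2_rc)), min_overlap - 1, -1):
        if r1[-size:] == r2_rc[:size]:
            return r1 + r2_rc[size:]
    return None


# Hypothetical mates overlapping by 5 bases ("CGGTT").
print(assemble_pair("ACGTACGGTT", "TCTGAACCG"))  # ACGTACGGTTCAGA
```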

Remove duplicates

Removes duplicates using CollapseSeq from the pRESTO Immcantation toolset.

Output directory: results/deduplicates

command_log.txt
- Log of the process that will be parsed to generate a report.
fastq/*.fastq
- Fastq with de-duplicated reads.
info/*.tab
- Parsed log containing the sequence barcodes and deduplicated pairs.
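
Duplicate removal can be pictured as collapsing identical sequences into one entry while recording how often each was seen. The sketch below assumes exact string identity; the function name and the reads are hypothetical and the code is not a reimplementation of CollapseSeq.

```python
from collections import Counter


def collapse_duplicates(sequences):
    """Return each unique sequence with the number of times it was seen."""
    return dict(Counter(sequences))


# Hypothetical reads: "ACGT" appears twice.
reads = ["ACGT", "ACGT", "TTGA"]
print(collapse_duplicates(reads))  # {'ACGT': 2, 'TTGA': 1}
```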

Filter sequences for at least 2 representatives

Removes sequences that do not have at least 2 reads assigned, using SplitSeq from the pRESTO Immcantation toolset.

Output directory: results/filter_representative_2

command_log.txt
- Log of the process that will be parsed to generate a report.
fastq/*.fastq
- Fastq with reads that have at least 2 representatives.
info/*.tab
- Parsed log containing the sequence barcodes and split seq information.
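
The filter described in this section amounts to keeping only sequences supported by at least 2 reads. A minimal sketch follows, assuming each record carries a read count under a hypothetical `count` key; the record layout and the function name are illustrative only.

```python
def filter_min_reads(records, min_reads=2):
    """Keep records whose read count is at least min_reads."""
    return [record for record in records if record["count"] >= min_reads]


# Hypothetical records: seq2 is supported by a single read and is dropped.
records = [
    {"id": "seq1", "count": 3},
    {"id": "seq2", "count": 1},
]
kept = filter_min_reads(records)
print([record["id"] for record in kept])  # ['seq1']
```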

Assign genes with IgBlast

Assigns genes from the IgBLAST database using AssignGenes and generates a table with MakeDb. Non-functional sequences are removed with ParseDb. Sequences are additionally converted to a fasta file with the ConvertDb tool.

Output directory: results/igblast

command_log.txt
- Log of the process that will be parsed to generate a report.
fasta/*.fasta
- IgBLAST results converted to fasta format with the genotype V-call annotated in the header.
table/*.tab
- Table in ChangeO format containing the assigned gene information and metadata provided in the starting metadata sheet.

Determining genotype and hamming distance threshold

Determines the genotype and the Hamming distance threshold of the junction regions for clonal determination, using TIgGER and SHazaM.

Output directory: results/shazam

threshold.txt
- Hamming distance threshold of the junction regions as determined by SHazaM.
Hamming_distance_threshold.pdf
- Plot of the Hamming distance distribution between junction regions displaying the threshold for clonal assignment as determined by SHazaM.
genotype.pdf
- Plot representing the patient genotype assessed by TIgGER.
igh_genotyped.tab
- Table in ChangeO format additionally containing the assigned genotype in V_CALL_GENOTYPED.
v_genotype.fasta
- Fasta file containing the full sequences for all V genes assigned to the patient.
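
To show how a Hamming distance threshold on junction regions can be applied, the sketch below computes the distance between two equal-length junctions and compares it with a threshold. The length normalization, the `<=` comparison, and the threshold value of 0.15 are assumptions chosen for the example; the actual threshold is the one written to threshold.txt.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("junctions must have equal length")
    return sum(x != y for x, y in zip(a, b))


def normalized_hamming(a, b):
    """Hamming distance divided by the junction length (assumed normalization)."""
    if not a:
        raise ValueError("junctions must not be empty")
    return hamming_distance(a, b) / len(a)


# Hypothetical junctions differing at one of nine positions.
junction_a = "TGTGCAAGA"
junction_b = "TGTGCGAGA"
threshold = 0.15  # hypothetical value, not taken from a real run

distance = normalized_hamming(junction_a, junction_b)
print(round(distance, 3), distance <= threshold)  # 0.111 True
```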

Defining clones

Assigning clones to the sequences obtained from IgBlast with the DefineClones Immcantation tool.

Output directory: results/define_clones

command_log.txt
- Log of the process that will be parsed to generate a report.
table/igh_genotyped_clone-pass.tab
- Table in ChangeO format containing the assigned gene information and an additional field with the clone number.
info/igh_genotyped_table.tab
- Parsed log with sequence ID, assigned gene calls, junction length and clones.
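
A clone table of this kind can be summarised by counting the sequences assigned to each clone. The sketch below reads tab-delimited text and tallies one column; the column names `SEQUENCE_ID` and `CLONE` and the in-memory example data are assumptions, so check the header of the actual table before reusing it.

```python
import csv
import io
from collections import Counter


def clone_sizes(handle, clone_column="CLONE"):
    """Count sequences per clone in a tab-delimited table.

    The default column name is an assumption; pass the real one if it differs.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row[clone_column] for row in reader)


# Hypothetical table contents standing in for a real clone-pass file.
example = "SEQUENCE_ID\tCLONE\nseq1\t1\nseq2\t1\nseq3\t2\n"
sizes = clone_sizes(io.StringIO(example))
print(sizes.most_common())  # [('1', 2), ('2', 1)]
```

For a real file, the same function can be called with an open file handle, for example `open(path, newline="")`.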

Reconstructing germlines

Reconstructing the germline sequences with the CreateGermlines Immcantation tool.