Description

This workflow uses the fgbio suite to identify and remove UMI tags from FASTQ reads, convert the reads to an unmapped BAM file, map them to the reference genome, and finally use the mapping information to group reads by UMI and generate a consensus read for each group.
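fgbio's FastqToBam tool finds the UMI in each read using a read structure, for example `8M+T` for an 8-base UMI followed by the template. The Python sketch below splits a read according to such a structure to illustrate the extraction step. It is a simplified stand-in, not fgbio's parser: it assumes that only the last segment may be variable length (`+`), and `parse_read_structure` and `split_read` are hypothetical names. In fgbio the extracted UMI bases end up in the `RX` tag of the unmapped BAM.

```python
import re
from typing import Dict, List, Optional, Tuple

# Segment types in fgbio-style read structures:
# T = template, M = molecular barcode (UMI), B = sample barcode, S = skipped bases.
SEGMENT_RE = re.compile(r"(\d+|\+)([TMBS])")


def parse_read_structure(structure: str) -> List[Tuple[Optional[int], str]]:
    """Parse e.g. '8M+T' into [(8, 'M'), (None, 'T')]; None means 'rest of the read'."""
    segments: List[Tuple[Optional[int], str]] = []
    pos = 0
    for match in SEGMENT_RE.finditer(structure):
        if match.start() != pos:  # reject unparseable characters between segments
            raise ValueError(f"invalid read structure: {structure!r}")
        length, kind = match.groups()
        segments.append((None if length == "+" else int(length), kind))
        pos = match.end()
    if not segments or pos != len(structure):
        raise ValueError(f"invalid read structure: {structure!r}")
    # Simplifying assumption: only the last segment may be variable length.
    if any(length is None for length, _ in segments[:-1]):
        raise ValueError("only the last segment may use '+'")
    return segments


def split_read(seq: str, structure: str) -> Dict[str, str]:
    """Split a read into UMI (M), template (T) and sample-barcode (B) bases."""
    parts = {"M": "", "T": "", "B": ""}
    pos = 0
    for length, kind in parse_read_structure(structure):
        end = len(seq) if length is None else pos + length
        if end > len(seq):
            raise ValueError("read is shorter than its read structure")
        if kind in parts:  # 'S' segments are skipped
            parts[kind] += seq[pos:end]
        pos = end
    return parts


if __name__ == "__main__":
    print(split_read("ACGTACGTTTGCAAGG", "8M+T"))
    # {'M': 'ACGTACGT', 'T': 'TTGCAAGG', 'B': ''}
```

Removing the UMI from the template bases in this way is what lets the later alignment step map only the genomic part of each read.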

Input

meta
Groovy Map containing sample information, e.g. [ id:'test' ]

reads
List of UMI-tagged reads
Pattern: [ *.{fastq.gz,fq.gz} ]

fasta_fai_dict
The reference FASTA file, its index and its sequence dictionary
Pattern: *.{fa,fasta,fna,fai,dict}

bwa_index
The BWA index files for the reference genome
Pattern: *.{fa,fasta,fna,amb,ann,bwt,pac,sa}

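groupreadsbyumi_strategy
Defines the UMI assignment strategy used by fgbio GroupReadsByUmi: Identity, Edit, Adjacency or Paired. Duplex UMIs require the Paired strategy.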

aligner
The aligner to use for mapping the reads to the reference genome. Options are bwa-mem and bwamem2.

duplex
Whether the library contains duplex UMIs.

min_reads
Minimum number of reads required to build a consensus: one integer for non-duplex data, or a string of up to three space-separated integers for duplex sequencing.

min_baseq
Minimum base quality for bases to be considered in consensus calling.

max_base_error_rate
Maximum error rate allowed for a single consensus base.
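The groupreadsbyumi_strategy parameter selects how fgbio GroupReadsByUmi merges similar UMIs. As a rough illustration of the Edit strategy, the Python sketch below links any two distinct UMIs that differ at no more than `max_edits` positions and gives each connected cluster one group id. It is a simplification: the function names are hypothetical, UMIs are assumed to have equal length, and fgbio itself first groups reads by mapping position and also offers other strategies (Adjacency, for instance, additionally weighs UMI counts).

```python
from itertools import combinations
from typing import Dict, List


def mismatches(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length UMIs."""
    if len(a) != len(b):
        raise ValueError("UMIs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def group_umis_by_edits(umis: List[str], max_edits: int = 1) -> Dict[str, int]:
    """Map each distinct UMI to a group id.

    Two UMIs are linked when they differ at <= max_edits positions; UMIs linked
    directly or through a chain of links share a group (connected components).
    With max_edits=0 only identical UMIs are grouped together.
    """
    distinct = sorted(set(umis))
    parent = {u: u for u in distinct}  # union-find forest

    def find(u: str) -> str:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving keeps trees shallow
            u = parent[u]
        return u

    for a, b in combinations(distinct, 2):
        if mismatches(a, b) <= max_edits:
            parent[find(a)] = find(b)

    # Number components in sorted order of their smallest UMI.
    group_of_root: Dict[str, int] = {}
    assignment: Dict[str, int] = {}
    for u in distinct:
        root = find(u)
        if root not in group_of_root:
            group_of_root[root] = len(group_of_root)
        assignment[u] = group_of_root[root]
    return assignment
```

Because linking is transitive, `AAAA` and `AATT` fall into the same group through `AAAT` even though they differ at two positions; this chaining is the main practical difference between edit-distance clustering and plain identity grouping.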
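For duplex data, min_reads may hold up to three values. The Python sketch below normalises the parameter into explicit thresholds. Its expansion rule (a single value applies to the duplex total and to both single-strand consensuses; with two values, the second applies to both strands) is assumed to follow fgbio's documented `--min-reads` convention and should be checked against the installed fgbio version. `expand_min_reads` is a hypothetical helper, not part of the subworkflow.

```python
from typing import Tuple, Union


def expand_min_reads(min_reads: Union[int, str], duplex: bool) -> Tuple[int, ...]:
    """Turn the min_reads parameter into explicit thresholds.

    Non-duplex: exactly one integer, returned as a 1-tuple.
    Duplex: one to three space-separated integers, returned as
    (duplex total, AB strand, BA strand).
    """
    values = [int(v) for v in str(min_reads).split()]
    if not values or any(v < 0 for v in values):
        raise ValueError(f"invalid min_reads: {min_reads!r}")
    if not duplex:
        if len(values) != 1:
            raise ValueError("non-duplex data takes exactly one min_reads value")
        return (values[0],)
    if len(values) > 3:
        raise ValueError("duplex data takes at most three min_reads values")
    if len(values) == 1:
        return (values[0],) * 3  # one value applies everywhere
    if len(values) == 2:
        return (values[0], values[1], values[1])  # second value covers both strands
    return (values[0], values[1], values[2])
```

Validating the string early like this surfaces a malformed value before any consensus-calling job starts.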
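min_baseq controls which input bases take part in consensus calling. fgbio's consensus callers use a probabilistic model over base qualities; the Python sketch below swaps that for a plain per-position majority vote, only to show where the threshold acts: bases below min_baseq are discarded before voting, and positions with no remaining bases or a tied vote become `N`. Reads are assumed to be equal-length and already aligned position by position, and `simple_consensus` is a hypothetical name.

```python
from collections import Counter
from typing import List, Tuple


def simple_consensus(reads: List[Tuple[str, List[int]]], min_baseq: int) -> str:
    """Majority-vote consensus over the reads of one UMI group.

    Each read is (bases, per-base qualities); all reads must have the same length.
    Bases with quality < min_baseq are ignored. Positions with no remaining
    bases, or where the top two bases tie, become 'N'.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0][0])
    if any(len(seq) != length or len(quals) != length for seq, quals in reads):
        raise ValueError("reads and qualities must all have the same length")
    consensus = []
    for i in range(length):
        votes = Counter(seq[i] for seq, quals in reads if quals[i] >= min_baseq)
        top = votes.most_common(2)
        if not top or (len(top) == 2 and top[0][1] == top[1][1]):
            consensus.append("N")
        else:
            consensus.append(top[0][0])
    return "".join(consensus)


if __name__ == "__main__":
    group = [("ACGT", [30, 30, 30, 30]), ("ACGA", [30, 30, 30, 10]), ("TCGA", [5, 30, 30, 10])]
    print(simple_consensus(group, min_baseq=20))  # ACGT
    print(simple_consensus(group, min_baseq=0))   # ACGA
```

With min_baseq=20 the two low-quality `A` calls at the last position are dropped and the single high-quality `T` wins; with min_baseq=0 they outvote it.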

Output

ubam
Unmapped BAM file
Pattern: *.bam

groupbam
Mapped BAM file in which reads are grouped by UMI
Pattern: *.bam

consensusbam
Mapped BAM file of consensus reads, each built from the reads belonging to the same UMI group
Pattern: *.bam

mappedconsensusbam
Mapped BAM file of consensus reads, each built from the reads belonging to the same UMI group, filtered for minimum base quality and maximum base error rate
Pattern: *.bam
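The filtering applied to mappedconsensusbam can be pictured one base at a time. The Python sketch below masks a consensus base to `N` when its quality is below a minimum or when its error rate (the fraction of contributing input bases that disagreed with the consensus, computed as errors / depth) exceeds a maximum. It is a simplified illustration with a hypothetical function name and plain lists as input; the parameter names mirror the subworkflow inputs, and fgbio's FilterConsensusReads also applies read-level filters that are not modelled here.

```python
from typing import List


def mask_consensus(
    bases: str,
    quals: List[int],
    depths: List[int],
    errors: List[int],
    min_baseq: int,
    max_base_error_rate: float,
) -> str:
    """Mask (set to 'N') consensus bases with low quality or a high error rate.

    error rate = errors / depth, i.e. the fraction of input bases at that
    position that disagreed with the consensus base.
    """
    if not len(bases) == len(quals) == len(depths) == len(errors):
        raise ValueError("all per-base arrays must have the same length")
    masked = []
    for base, qual, depth, err in zip(bases, quals, depths, errors):
        # A position with zero depth carries no evidence, so treat it as all error.
        error_rate = err / depth if depth > 0 else 1.0
        if qual < min_baseq or error_rate > max_base_error_rate:
            masked.append("N")
        else:
            masked.append(base)
    return "".join(masked)


if __name__ == "__main__":
    print(mask_consensus("ACGT", [40, 15, 40, 40], [10, 10, 10, 10], [0, 0, 3, 1],
                         min_baseq=20, max_base_error_rate=0.1))  # ANNT
```

In this example the second base fails the quality threshold, the third has an error rate of 0.3, and the last base is kept because its rate of 0.1 does not exceed the maximum.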