Description

Build a recalibration model to score variant quality for filtering purposes. Following the GATK best practices is highly recommended when using this module: the Gaussian mixture model needs a large number of samples to produce optimal results, for example 30 samples for exome data. For more details, see https://gatk.broadinstitute.org/hc/en-us/articles/4402736812443-Which-training-sets-arguments-should-I-use-for-running-VQSR-

Input

Name (Type)
Description
Pattern

meta (map)

Groovy Map containing sample information
e.g. [ id:'test' ]

vcf (file)

Input VCF file containing the variants to be recalibrated

*.vcf.gz

tbi (file)

Tabix index (tbi) file matching the input VCF file

*.vcf.gz.tbi

resource_vcf (file)

All resource VCF files that are used with the corresponding '--resource' label

*.vcf.gz

resource_tbi (file)

All resource tbi files that are used with the corresponding '--resource' label

*.vcf.gz.tbi

labels (string)

Required arguments for GATK VariantRecalibrator, written so that each label directly matches one of the resources provided. More information can be found at https://gatk.broadinstitute.org/hc/en-us/articles/5358906115227-VariantRecalibrator

fasta (file)

The reference fasta file

*.fasta

fai (file)

Index of reference fasta file

*.fasta.fai

dict (file)

GATK sequence dictionary

*.dict
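
To illustrate how the labels input lines up with the resource files, the following Python sketch assembles a VariantRecalibrator command line from the inputs listed above. It is only an approximation of what a call might look like: the helper names (resource_label, build_command), the file names, the annotation list and the prior values are assumptions chosen for the example, not part of the module. The command is printed, not executed.

```python
import shlex


def resource_label(name, vcf, known, training, truth, prior):
    """Format one label in the GATK4 '--resource:name,key=value,... file' style."""
    flags = (
        f"known={str(known).lower()},"
        f"training={str(training).lower()},"
        f"truth={str(truth).lower()}"
    )
    return f"--resource:{name},{flags},prior={prior} {vcf}"


def build_command(vcf, prefix, fasta, labels, annotations, mode="SNP"):
    """Return the argument list for a VariantRecalibrator call (hypothetical helper)."""
    args = [
        "gatk", "VariantRecalibrator",
        "--variant", vcf,
        "--reference", fasta,
        "--output", f"{prefix}.recal",
        "--tranches-file", f"{prefix}.tranches",
        "--mode", mode,
    ]
    for annotation in annotations:
        args += ["-an", annotation]
    # Each label string carries both the --resource tag and its file name.
    for label in labels:
        args += shlex.split(label)
    return args


# Example values are placeholders; pick resources and priors per GATK guidance.
labels = [
    resource_label("hapmap", "hapmap.vcf.gz", False, True, True, 15.0),
    resource_label("dbsnp", "dbsnp.vcf.gz", True, False, False, 2.0),
]
command = build_command("test.vcf.gz", "test", "genome.fasta", labels, ["QD", "FS"])
print(shlex.join(command))
```

Each entry in labels names exactly one file from resource_vcf, so the two inputs need to be kept in step; the matching resource_tbi index files must also be present alongside them.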

Output

Name (Type)
Description
Pattern

recal (file)

Output recal file used by ApplyVQSR

*.recal

idx (file)

Index file for the recal output file

*.idx

tranches (file)

Output tranches file used by ApplyVQSR

*.tranches

plots (file)

Optional output R script file to aid in visualization of the input data and learned model

*plots.R

version (file)

File containing software versions

*.versions.yml

Tools

gatk4

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.