Description

Build a recalibration model to score variant quality for filtering purposes. It is highly recommended to follow the GATK best practices when using this module: the Gaussian mixture model needs a large number of samples to produce optimal results, for example at least 30 samples for exome data. For more details see https://gatk.broadinstitute.org/hc/en-us/articles/4402736812443-Which-training-sets-arguments-should-I-use-for-running-VQSR-
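Because the recommendation above depends on cohort size, it can help to check how many samples a callset contains before running the module. The following is a minimal Python sketch, assuming a bgzipped VCF (bgzip output can be read with Python's standard `gzip` module). The 30-sample threshold is the exome example from the text and is not a limit that the module or GATK enforces. `count_vcf_samples` and `meets_exome_guideline` are hypothetical helper names.

```python
import gzip


def count_vcf_samples(path: str) -> int:
    """Count sample columns in a (b)gzipped VCF by reading its #CHROM header line."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # Fixed columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT; samples follow.
                # A sites-only VCF has no FORMAT/sample columns, hence the floor at 0.
                return max(len(line.rstrip("\n").split("\t")) - 9, 0)
            if not line.startswith("#"):
                break  # reached data lines without seeing a #CHROM header
    raise ValueError(f"no #CHROM header line found in {path}")


def meets_exome_guideline(path: str, minimum: int = 30) -> bool:
    """True if the callset has at least `minimum` samples (30 = exome example above)."""
    return count_vcf_samples(path) >= minimum
```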

Input

name:type
description
pattern

meta:map

Groovy Map containing sample information e.g. [ id:'test' ]

vcf:file

input vcf file containing the variants to be recalibrated

*.vcf.gz

tbi:file

Tabix index (.tbi) of the input VCF file

*.vcf.gz.tbi

resource_vcf:file

All resource VCF files referenced by the corresponding '--resource' labels

*.vcf.gz

resource_tbi:file

Tabix indices (.tbi) of all resource VCF files referenced by the corresponding '--resource' labels

*.vcf.gz.tbi

labels:string

Resource arguments required by GATK VariantRecalibrator. Each label must correspond directly to one of the provided resource files. More information can be found at https://gatk.broadinstitute.org/hc/en-us/articles/5358906115227-VariantRecalibrator

fasta:file

The reference fasta file

*.fasta

fai:file

Index of the reference fasta file

*.fasta.fai

dict:file

GATK sequence dictionary

*.dict
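The `labels`, `resource_vcf` and `resource_tbi` inputs have to agree with one another. The following minimal Python sketch builds the labels and checks that agreement. It assumes each `labels` entry is a complete GATK4 argument of the form `--resource:<name>,known=...,training=...,truth=...,prior=... <file>` that the module splices into the VariantRecalibrator command line; check the module's `main.nf` for the exact form it expects. The `Resource` type, the helper functions, the file names and the prior values (taken from commonly used GATK examples) are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One VariantRecalibrator training resource (hypothetical helper type)."""
    name: str       # label name, e.g. "hapmap"
    vcf: str        # file name as staged via resource_vcf
    known: bool
    training: bool
    truth: bool
    prior: float


def resource_label(res: Resource) -> str:
    """Format one resource as a '--resource:<name>,<key>=<value>,... <file>' argument."""
    keys = (
        f"known={str(res.known).lower()},"
        f"training={str(res.training).lower()},"
        f"truth={str(res.truth).lower()},"
        f"prior={res.prior:.1f}"
    )
    return f"--resource:{res.name},{keys} {res.vcf}"


def build_labels(resources: list[Resource], resource_vcf: list[str],
                 resource_tbi: list[str]) -> list[str]:
    """Return one label per resource after checking that every labelled VCF
    was staged through resource_vcf and has a matching *.vcf.gz.tbi."""
    staged, indexed = set(resource_vcf), set(resource_tbi)
    for res in resources:
        if res.vcf not in staged:
            raise ValueError(f"{res.name}: {res.vcf} not passed as resource_vcf")
        if f"{res.vcf}.tbi" not in indexed:
            raise ValueError(f"{res.name}: no {res.vcf}.tbi in resource_tbi")
    return [resource_label(res) for res in resources]


# Illustrative usage with hypothetical file names.
resources = [
    Resource("hapmap", "hapmap.vcf.gz", known=False, training=True, truth=True, prior=15.0),
    Resource("dbsnp", "dbsnp.vcf.gz", known=True, training=False, truth=False, prior=2.0),
]
labels = build_labels(resources,
                      ["hapmap.vcf.gz", "dbsnp.vcf.gz"],
                      ["hapmap.vcf.gz.tbi", "dbsnp.vcf.gz.tbi"])
```

Because the check runs on the same three lists that are passed to the module, a label pointing at a file that was never staged, or a resource VCF without its index, fails early rather than inside the GATK task.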

Output

name:type
description
pattern

recal

meta:map

Groovy Map containing sample information e.g. [ id:'test' ]

*.recal:file

Output recal file used by ApplyVQSR

*.recal

idx

meta:map

Groovy Map containing sample information e.g. [ id:'test' ]

*.idx:file

Index file for the recal output file

*.idx

tranches

meta:map

Groovy Map containing sample information e.g. [ id:'test' ]

*.tranches:file

Output tranches file used by ApplyVQSR

*.tranches

plots

meta:map

Groovy Map containing sample information e.g. [ id:'test' ]

*plots.R:file

Optional output R script to aid in visualizing the input data and the learned model.

*plots.R

versions

versions.yml:file

File containing software versions

versions.yml
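The `recal`, `idx` and `tranches` outputs are consumed together by GATK ApplyVQSR. The following minimal Python sketch assembles such a command. It assumes the outputs are named `<prefix>.recal` and `<prefix>.tranches`, with the `.idx` index left next to the `.recal` file so that GATK can locate it. The 99.0 truth-sensitivity level and the output file name are illustrative choices, not module defaults. `apply_vqsr_argv` is a hypothetical helper.

```python
import shlex


def apply_vqsr_argv(prefix: str, vcf: str, mode: str = "SNP",
                    ts_filter_level: float = 99.0) -> list[str]:
    """Assemble a `gatk ApplyVQSR` argument vector from this module's outputs.

    Assumes <prefix>.recal (with its .idx alongside) and <prefix>.tranches.
    """
    if mode not in {"SNP", "INDEL", "BOTH"}:
        raise ValueError(f"unsupported mode: {mode!r}")
    return [
        "gatk", "ApplyVQSR",
        "--variant", vcf,                                # the VCF that was recalibrated
        "--recal-file", f"{prefix}.recal",               # recal output (its *.idx must sit alongside)
        "--tranches-file", f"{prefix}.tranches",         # tranches output
        "--truth-sensitivity-filter-level", f"{ts_filter_level:.1f}",
        "--mode", mode,                                  # should match the VariantRecalibrator run
        "--output", f"{prefix}.filtered.vcf.gz",         # hypothetical output name
    ]


argv = apply_vqsr_argv("sample", "sample.vcf.gz")
print(shlex.join(argv))
```

Building an argument list rather than a single string avoids quoting problems. `shlex.join` is used only to display the command.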

Tools

gatk4

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.