Description

Summarizes counts of reads that support the reference, alternate, and other alleles at given sites. The resulting table can be used as input to CalculateContamination. Requires a file of common germline variant sites, such as one derived from gnomAD.

Input

Name (Type)
Description
Pattern

meta (map)

Groovy Map containing sample information
e.g. [ id:'test' ]

input (file)

BAM/CRAM file to be summarized.

*.{bam,cram}

input_index (file)

BAM/CRAM file index.

*.{bai,crai}

intervals (file)

File listing the sites to summarize. If it is not provided, the variants file is used as the sites file instead.

*.interval_list

meta2 (map)

Groovy Map containing reference information
e.g. [ id:'genome' ]

fasta (file)

The reference fasta file

*.fasta

meta3 (map)

Groovy Map containing reference information
e.g. [ id:'genome' ]

fai (file)

Index of reference fasta file

*.fasta.fai

meta4 (map)

Groovy Map containing reference information
e.g. [ id:'genome' ]

dict (file)

GATK sequence dictionary

*.dict

variants (file)

Population VCF of germline variants, containing allele frequencies. It is also used as the sites file if no separate intervals file is provided.

*.vcf.gz

variants_tbi (file)

Index file for the germline resource.

*.vcf.gz.tbi
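
The fallback described for the intervals input can be sketched in shell. This is a minimal illustration, not the module's actual script: the helper name build_pileup_cmd and all file names are hypothetical, and the helper only prints the command rather than running it. The long-form options shown (--input, --variant, --intervals, --output) are GATK's standard ones for GetPileupSummaries.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the GetPileupSummaries command it would run.
# Arguments: <bam|cram> <variants.vcf.gz> <output table> [intervals file]
build_pileup_cmd() {
    local input="$1" variants="$2" out="$3" intervals="${4:-}"
    # When no intervals file is given, fall back to the variants file as the sites list.
    local sites="${intervals:-$variants}"
    echo "gatk GetPileupSummaries --input $input --variant $variants --intervals $sites --output $out"
}

# Without an intervals file: the variants file doubles as the sites file.
build_pileup_cmd sample.bam gnomad.vcf.gz sample.pileups.table
# With an explicit intervals file.
build_pileup_cmd sample.bam gnomad.vcf.gz sample.pileups.table sites.interval_list
```

For CRAM input a reference would typically also be passed (GATK's --reference option); that is left out here to keep the sketch short.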

Output

Name (Type)
Description
Pattern

pileup (file)

File containing the pileup summary table.

*.pileups.table

versions (file)

File containing software versions

versions.yml
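
As a rough sketch of how the pileup summary table might be inspected, the helper below totals the reference and alternate read counts. It rests on assumptions not stated in this page: that the table is tab-separated, that metadata lines start with '#', and that the header row names ref_count and alt_count columns. The helper name summarize_pileups is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total ref/alt read counts in a pileup summary table.
# Assumes a tab-separated table whose header row includes ref_count and alt_count.
summarize_pileups() {
    awk -F'\t' '
        /^#/ { next }                        # skip metadata/comment lines
        !seen_header {                       # first remaining line names the columns
            for (i = 1; i <= NF; i++) col[$i] = i
            seen_header = 1
            next
        }
        {
            ref += $(col["ref_count"])       # look columns up by name, not position
            alt += $(col["alt_count"])
            n++
        }
        END { printf "sites=%d ref=%d alt=%d\n", n, ref, alt }
    ' "$1"
}
```

It would be called as summarize_pileups sample.pileups.table, where the file name is a placeholder for the module's pileup output.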

Tools

gatk4
Apache-2.0

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.