Description

Detecting and estimating inter-sample DNA contamination has become a crucial quality assessment step for ensuring high-quality sequence reads and reliable downstream analysis.

Input

Name (Type)
Description
Pattern

meta (map)

Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]

bam (file)

A sorted, indexed, base-quality-recalibrated, and duplicate-marked BAM file.
It must also contain “@RG” header lines that annotate the different read groups (sequencing runs and lanes).
The SM tag in each “@RG” header line should match one of the genotyped samples.

*.bam

bai (file)

BAM index (BAI) file

*.bai

refvcf (file)

Input VCF file containing
(1) external genotype information and/or
(2) allele frequency information, given as an AF entry or as AC/AN entries in the INFO field.

*.{vcf,vcf.gz}
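
The input requirements above can be checked before running the module. The following is a minimal sketch, assuming the BAM header is already available as SAM header text (for example, exported with a header-printing tool) and that a VCF INFO column is passed in as a plain string. The function names `read_group_samples` and `allele_frequency`, the sample name `sampleA`, and the read-group IDs are hypothetical and not part of the module.

```python
from typing import Dict, Optional


def read_group_samples(sam_header: str) -> Dict[str, str]:
    """Map each @RG ID to its SM tag, given SAM header text."""
    samples = {}
    for line in sam_header.splitlines():
        if not line.startswith("@RG"):
            continue
        # After the record type, header fields are tab-separated TAG:VALUE pairs.
        fields = line.split("\t")[1:]
        tags = dict(f.split(":", 1) for f in fields if ":" in f)
        if "ID" in tags:
            samples[tags["ID"]] = tags.get("SM", "")
    return samples


def allele_frequency(info: str) -> Optional[float]:
    """Return the allele frequency from a VCF INFO string.

    Uses AF when present, otherwise falls back to AC/AN.
    For multi-allelic sites only the first ALT allele is considered.
    """
    entries = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        entries[key] = value
    try:
        if "AF" in entries:
            return float(entries["AF"].split(",")[0])
        if "AC" in entries and "AN" in entries:
            an = int(entries["AN"])
            if an == 0:
                return None
            return int(entries["AC"].split(",")[0]) / an
    except ValueError:
        return None
    return None


# Hypothetical header with two read groups belonging to one sample.
header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@RG\tID:run1.lane1\tSM:sampleA\tPL:ILLUMINA\n"
    "@RG\tID:run1.lane2\tSM:sampleA\n"
)
print(read_group_samples(header))
print(allele_frequency("AC=5;AN=20"))
```

Comparing the returned SM values against the sample names in the VCF is one way to catch a mismatch early. A read group with an empty SM value here means the tag was missing from that header line.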

Output

Name (Type)
Description
Pattern

meta (map)

Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]

versions (file)

File containing software versions

versions.yml

log (file)

Detailed summary of the verifyBamID result.

*.log

selfsm (file)

Per-sample statistics describing how well the sample matches the annotated sample.

*.selfSM

depthsm (file)

The depth distribution of the sequence reads per sample.

*.depthSM

selfrg (file)

Per-readGroup statistics describing how well each lane matches the annotated sample (available only without the --ignoreRG option).

*.selfRG

depthrg (file)

The depth distribution of the sequence reads per readGroup (available only without the --ignoreRG option).

*.depthRG

bestsm (file)

Per-sample best-match statistics, reporting the best-matching sample among the genotyped samples (available only with the --best option).

*.bestSM

bestrg (file)

Per-readGroup best-match statistics, reporting the best-matching sample among the genotyped samples (available only with the --best option and without the --ignoreRG option).

*.bestRG
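
The per-sample statistics in the `*.selfSM` output can be read programmatically. The sketch below assumes the file is tab-separated with a header row that includes `#SEQ_ID` and `FREEMIX` columns. That layout is an assumption about verifyBamID output and is not stated on this page. The helper names, the example rows, and the `0.02` cut-off are hypothetical placeholders, so choose a threshold appropriate for your data.

```python
import csv
import io
from typing import Dict, List


def read_selfsm(text: str) -> List[Dict[str, str]]:
    """Parse tab-separated selfSM text into one dict per row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


def flag_contaminated(rows: List[Dict[str, str]], threshold: float = 0.02) -> List[str]:
    """Return the IDs of rows whose FREEMIX value reaches the threshold.

    Assumes '#SEQ_ID' and 'FREEMIX' column names; rows where FREEMIX
    is missing or not numeric are skipped.
    """
    flagged = []
    for row in rows:
        try:
            freemix = float(row["FREEMIX"])
        except (KeyError, TypeError, ValueError):
            continue
        if freemix >= threshold:
            flagged.append(row.get("#SEQ_ID", ""))
    return flagged


# Illustrative content only; these values are not real results.
example = (
    "#SEQ_ID\tRG\tFREEMIX\n"
    "sampleA\tALL\t0.001\n"
    "sampleB\tALL\t0.051\n"
    "sampleC\tALL\tNA\n"
)
rows = read_selfsm(example)
print(flag_contaminated(rows))
```

In a pipeline, the text would come from reading the emitted `*.selfSM` file rather than from an inline string.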

Tools

verifybamid
GPL v3

verifyBamID is a software tool that verifies whether the reads in a particular file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples.