Description

Compute the squared correlation (r²) between imputed dosages and highly confident genotype calls from a high-coverage dataset, stratified by minor allele frequency (MAF) bin.
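
To make the per-bin computation concrete, here is a minimal Python sketch of a squared Pearson correlation taken within MAF bins. It is an illustration only, not GLIMPSE's implementation: the function names (`pearson_r2`, `r2_by_maf_bin`), the flat list layout (one entry per site/sample observation) and the half-open bin convention `(lo, hi]` are all assumptions made for this example.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation, or None when it is undefined."""
    n = len(x)
    if n < 2:
        return None
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # no variance in one of the vectors
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def r2_by_maf_bin(dosages, truth, mafs, bin_edges):
    """Group observations by MAF bin and compute r2 in each bin.

    dosages   : imputed dosages in [0, 2]
    truth     : validation genotypes coded as 0, 1 or 2
    mafs      : MAF of the site each observation belongs to
    bin_edges : increasing bin boundaries, e.g. [0, 0.1, 0.5]
    Bins are treated as (lo, hi]; this convention is an assumption.
    """
    results = {}
    for lo, hi in zip(bin_edges, bin_edges[1:]):
        idx = [i for i, maf in enumerate(mafs) if lo < maf <= hi]
        results[(lo, hi)] = pearson_r2(
            [dosages[i] for i in idx],
            [truth[i] for i in idx],
        )
    return results


# Toy data, invented for illustration: three rare-variant and
# three common-variant observations.
example = r2_by_maf_bin(
    dosages=[0.1, 0.9, 1.8, 0.0, 1.0, 2.0],
    truth=[0, 1, 2, 0, 1, 2],
    mafs=[0.05, 0.05, 0.05, 0.3, 0.3, 0.3],
    bin_edges=[0, 0.1, 0.5],
)
```

Bins with fewer than two observations, or with no variance in either vector, return `None` rather than a misleading value.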

Input

Name (Type)
Description
Pattern

meta (map)

Groovy Map containing sample information
e.g. [ id:'test', single_end ]

region (string)

Target region used for imputation, including left and right buffers (e.g. chr20:1000000-2000000).

chrXX:leftBufferPosition-rightBufferPosition

freq (file)

File containing allele frequencies at each site.

*.{vcf,bcf,vcf.gz,bcf.gz}

truth (file)

Validation dataset called at the same positions as the imputed file.

*.{vcf,bcf,vcf.gz,bcf.gz}

estimate (file)

Imputed data.

*.{vcf,bcf,vcf.gz,bcf.gz}

min_prob (float)

Minimum posterior probability P(G|R) in validation data.

min_dp (integer)

Minimum coverage in validation data.
If FORMAT/DP is missing and --minDP > 0, the program exits with an error.

bins (string)

Allele frequency bins used for r-squared computations.
By default, they should be MAF bins within [0-0.5];
if --use-ref-alt is used, they should span the full range [0-1].
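
The constraints on `region`, `bins`, `min_prob` and `min_dp` described above can be checked before launching the module. The following Python sketch shows one way to do that. Everything in it is hypothetical helper code, not part of GLIMPSE or nf-core: the function names, the assumption that `bins` is a whitespace-separated list of increasing boundaries, and the choice of inclusive comparisons for the thresholds are all assumptions made for illustration.

```python
import re

# chrXX:leftBufferPosition-rightBufferPosition
_REGION = re.compile(r"^([^:\s]+):(\d+)-(\d+)$")


def parse_region(region):
    """Split 'chrXX:start-end' into (chrom, start, end)."""
    match = _REGION.match(region)
    if match is None:
        raise ValueError(f"malformed region: {region!r}")
    chrom = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"region start exceeds end: {region!r}")
    return chrom, start, end


def parse_bins(bins, use_ref_alt=False):
    """Parse bin boundaries and check them against the allowed range.

    The range is [0, 0.5] for MAF bins and [0, 1] when the
    --use-ref-alt option is in effect.
    """
    edges = [float(token) for token in bins.split()]
    upper = 1.0 if use_ref_alt else 0.5
    if len(edges) < 2:
        raise ValueError("need at least two bin edges")
    if any(b <= a for a, b in zip(edges, edges[1:])):
        raise ValueError("bin edges must be strictly increasing")
    if edges[0] < 0.0 or edges[-1] > upper:
        raise ValueError(f"bin edges must lie in [0, {upper}]")
    return edges


def keep_truth_call(max_gp, dp, min_prob, min_dp):
    """Decide whether a validation genotype is confident enough.

    max_gp : largest genotype posterior P(G|R) for the call
    dp     : FORMAT/DP value, or None when the field is absent
    A missing DP with min_dp > 0 is treated as an error, mirroring
    the behaviour described for min_dp.
    """
    if dp is None:
        if min_dp > 0:
            raise ValueError("FORMAT/DP is missing but min_dp > 0")
    elif dp < min_dp:
        return False
    return max_gp >= min_prob
```

For example, `parse_region("chr20:1000000-2000000")` returns `("chr20", 1000000, 2000000)`, and `parse_bins("0 0.5 1")` raises a `ValueError` unless `use_ref_alt=True` is passed.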

Output

Name (Type)
Description
Pattern

meta (map)

Groovy Map containing sample information
e.g. [ id:'test', single_end ]

versions (file)

File containing software versions

versions.yml

errors_cal (file)

Calibration errors between imputed dosages (in MAF bins) and highly confident genotype calls.

*.errors.cal.txt.gz

errors_grp (file)

Per-group errors between imputed dosages (in MAF bins) and highly confident genotype calls.

*.errors.grp.txt.gz

errors_spl (file)

Per-sample errors between imputed dosages (in MAF bins) and highly confident genotype calls.

*.errors.spl.txt.gz

rsquared_grp (file)

Per-group r-squared correlation between imputed dosages (in MAF bins) and highly confident genotype calls.

*.rsquare.grp.txt.gz

rsquared_spl (file)

Per-sample r-squared correlation between imputed dosages (in MAF bins) and highly confident genotype calls.

*.rsquare.spl.txt.gz
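
All five result files are gzip-compressed text, so they can be inspected with the Python standard library alone. The sketch below is a generic reader and makes no claim about the column layout of any particular file: it only splits each line on whitespace. The helper name `read_table` and the decision to skip blank lines and lines starting with `#` are assumptions for this example.

```python
import gzip


def read_table(path):
    """Return the rows of a gzipped whitespace-delimited text file.

    Each row is a list of strings; interpreting the columns is left
    to the caller, since the layout differs between output files.
    """
    rows = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            # Skipping blank and '#' lines is an assumption, not a
            # documented property of the output format.
            if not line or line.startswith("#"):
                continue
            rows.append(line.split())
    return rows
```

A call such as `read_table("test.rsquare.grp.txt.gz")` (file name hypothetical) would return one list of string fields per line.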

Tools

glimpse
MIT

GLIMPSE is a phasing and imputation method for large-scale low-coverage sequencing studies.