Description

Program to compute the genotyping error rate at the sample or marker level.
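To illustrate what an error rate "at the sample or marker level" means, the sketch below computes a plain discordance rate between truth genotypes and imputed best-guess genotypes. It aggregates the rate once per sample and once per marker. It is a simplified stand-in, not GLIMPSE2's implementation. It assumes genotypes are coded as alternate-allele counts (0, 1, 2) and that missing calls are skipped. The function name `error_rates` is hypothetical.

```python
from typing import List, Optional

# Assumption: genotypes are alternate-allele counts (0, 1, 2); None marks a missing call.
Genotype = Optional[int]


def error_rates(truth: List[List[Genotype]], estimate: List[List[Genotype]]):
    """Return (per_sample, per_marker) discordance rates (hypothetical helper).

    Both matrices are indexed [marker][sample]. A pair is compared only when
    both genotypes are present; each rate is mismatches / compared pairs,
    or None when nothing could be compared.
    """
    n_markers = len(truth)
    n_samples = len(truth[0]) if n_markers else 0
    spl_err, spl_tot = [0] * n_samples, [0] * n_samples
    mrk_err, mrk_tot = [0] * n_markers, [0] * n_markers
    for m in range(n_markers):
        for s in range(n_samples):
            t, e = truth[m][s], estimate[m][s]
            if t is None or e is None:
                continue  # skip pairs with a missing call
            mismatch = int(t != e)
            spl_err[s] += mismatch
            spl_tot[s] += 1
            mrk_err[m] += mismatch
            mrk_tot[m] += 1

    def rate(err: List[int], tot: List[int]) -> List[Optional[float]]:
        return [x / n if n else None for x, n in zip(err, tot)]

    return rate(spl_err, spl_tot), rate(mrk_err, mrk_tot)
```

Summing the same mismatch counts along the two axes gives both views of the error rate in a single pass.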

Input

Name (Type)
Description
Pattern

meta (map)

Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]

region (string)

Target region used for imputation, including left and right buffers (e.g. chr20:1000000-2000000). Can also be a list of such regions.

chrXX:leftBufferPosition-rightBufferPosition

freq (file)

File containing allele frequencies at each site.

*.{vcf,bcf,vcf.gz,bcf.gz}

truth (file)

Validation dataset called at the same positions as the imputed file.

*.{vcf,bcf,vcf.gz,bcf.gz}

estimate (file)

Imputed dataset obtained after phasing.

*.{vcf,bcf,vcf.gz,bcf.gz}

samples (file)

List of samples to process, one sample ID per line.

*.{txt,tsv}

groups (file)

Alternative to frequency bins: user-defined group bins, provided in a file.

*.{txt,tsv}

bins (string)

Allele frequency bins used for rsquared computations.
By default they should be MAF bins [0-0.5], while
they should span the full range [0-1] if --use-ref-alt is used.

0 0.01 0.05 ... 0.5

ac_bins (string)

User-defined allele count bins used for rsquared computations.

1 2 5 10 20 ... 100000

allele_counts (string)

Default allele count bins used for rsquared computations.
AN field must be defined in the frequency file.

min_val_gl (float)

Minimum genotype likelihood probability P(G|R) in validation data.
Set to zero to disable the filter, or when using --gt-validation.

min_val_dp (integer)

Minimum coverage in validation data.
If FORMAT/DP is missing and --min_val_dp > 0, the program exits with an error.
Set to zero to disable the filter, or when using --gt-validation.
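The `min_val_gl` and `min_val_dp` filters can be pictured as a per-genotype gate on the validation data. This is a minimal sketch under stated assumptions:

- the likelihood filter compares the highest genotype probability P(G|R) against `min_val_gl`;
- both thresholds are inclusive;
- the hypothetical `keep_validation_call` receives FORMAT values that have already been parsed.

GLIMPSE2's internal logic may differ in these details.

```python
from typing import Optional, Sequence


def keep_validation_call(gp: Sequence[float], dp: Optional[int],
                         min_val_gl: float, min_val_dp: int) -> bool:
    """Decide whether one validation genotype is confident enough to score.

    Hypothetical helper. gp holds P(G|R) for hom-ref, het and hom-alt;
    dp is the FORMAT/DP value, or None when the field is absent.
    """
    # A depth threshold cannot be applied without FORMAT/DP.
    if dp is None and min_val_dp > 0:
        raise ValueError("FORMAT/DP is missing but min_val_dp > 0")
    # Assumption: a threshold of zero disables the corresponding filter.
    if min_val_gl > 0 and max(gp) < min_val_gl:
        return False
    if min_val_dp > 0 and dp < min_val_dp:
        return False
    return True
```

Raising instead of silently skipping mirrors the documented behaviour: a non-zero depth threshold with no FORMAT/DP is treated as an error.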

Output

Name (Type)
Description
Pattern

meta (map)

Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]

versions (file)

File containing software versions.

versions.yml

errors_cal (file)

Calibration correlation errors between imputed dosages (in MAF bins) and highly confident genotypes.

*.errors.cal.txt.gz

errors_grp (file)

Per-group correlation errors between imputed dosages (in MAF bins) and highly confident genotypes.

*.errors.grp.txt.gz

errors_spl (file)

Per-sample correlation errors between imputed dosages (in MAF bins) and highly confident genotypes.

*.errors.spl.txt.gz

rsquare_grp (file)

Per-group r-squared correlation between imputed dosages (in MAF bins) and highly confident genotypes.

*.rsquare.grp.txt.gz

rsquare_spl (file)

Per-sample r-squared correlation between imputed dosages (in MAF bins) and highly confident genotypes.

*.rsquare.spl.txt.gz

rsquare_per_site (file)

Per-variant r-squared correlation between imputed dosages (in MAF bins) and highly confident genotypes.

_r2_sites.txt.gz
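The r-squared outputs compare imputed dosages with validation genotypes within allele frequency bins. The sketch below pools both within each bin and then correlates them. Bins are given as edges, like the `bins` input pattern.

The sketch makes several assumptions:

- bins are left-open, (lo, hi];
- allele frequencies are folded to MAF;
- the statistic is the squared Pearson correlation;
- the helper names `maf_bin`, `r2` and `r2_by_bin` are hypothetical.

GLIMPSE2 may bin or aggregate differently.

```python
import bisect
from typing import Dict, Iterable, List, Sequence, Tuple


def maf_bin(af: float, edges: Sequence[float]) -> int:
    """Index i of the bin (edges[i], edges[i+1]] containing min(af, 1 - af).

    Returns -1 when the MAF falls at or below the first edge or above the last.
    """
    maf = min(af, 1.0 - af)
    i = bisect.bisect_left(edges, maf) - 1
    return i if 0 <= i < len(edges) - 1 else -1


def r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation; NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy * sxy / (sxx * syy)


def r2_by_bin(sites: Iterable[Tuple[float, List[float], List[int]]],
              edges: Sequence[float]) -> Dict[int, float]:
    """sites yields (allele frequency, dosages, truth genotypes) per variant."""
    pooled: Dict[int, Tuple[List[float], List[float]]] = {}
    for af, dosages, truth in sites:
        b = maf_bin(af, edges)
        if b < 0:
            continue  # outside every bin (e.g. monomorphic)
        d, t = pooled.setdefault(b, ([], []))
        d.extend(dosages)
        t.extend(truth)
    return {b: r2(d, t) for b, (d, t) in sorted(pooled.items())}
```

Pooling all genotypes in a bin before correlating is why a frequency file is needed: each site's bin depends on its allele frequency, not on the imputed data alone.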

Tools

glimpse2
MIT

GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.