Description

Program to compute the genotyping error rate at the sample or marker level.

Input

name:type
description
pattern

meta:map

Groovy Map containing sample information e.g. [ id:'test', single_end:false ]

estimate:file

Imputed dataset file obtained after phasing.

*.{vcf,bcf,vcf.gz,bcf.gz}

estimate_index:file

Index file for the imputed dataset file.

truth:file

Validation dataset called at the same positions as the imputed file.

*.{vcf,bcf,vcf.gz,bcf.gz}

truth_index:file

Index file for the truth file.

freq:file

File containing allele frequencies at each site.

*.{vcf,bcf,vcf.gz,bcf.gz}

freq_index:file

Index file for the allele frequencies file.

samples:file

List of samples to process, one sample ID per line.

*.{txt,tsv}

region:string

Target region used for imputation, including left and right buffers (e.g. chr20:1000000-2000000). Can also be a list of such regions.

chrXX:leftBufferPosition-rightBufferPosition

meta2:map

Groovy Map containing sample information e.g. [ id:'test', single_end:false ]

groups:file

Alternative to frequency bins: user-defined group bins, provided in a file.

*.{txt,tsv}

bins:string

Allele frequency bins used for rsquared computations. By default they should be given as MAF bins [0-0.5]; they should span the full range [0-1] if --use-ref-alt is used.

0 0.01 0.05 ... 0.5

ac_bins:string

User-defined allele count bins used for rsquared computations.

1 2 5 10 20 ... 100000

allele_counts:string

Default allele count bins used for rsquared computations. AN field must be defined in the frequency file.

min_val_gl:float

Minimum genotype likelihood probability P(G|R) in validation data. Set to zero to disable the filter or if using --gt-validation.

min_val_dp:integer

Minimum coverage in validation data. If FORMAT/DP is missing and --min_val_dp > 0, the program exits with an error. Set to zero to disable the filter or if using --gt-validation.
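
The string and list inputs above are easy to get subtly wrong, so a small pre-flight check may help. The following is a minimal Python sketch and not part of the module: the helper names `parse_region`, `format_bins` and `write_samples` are hypothetical, and the checks encode only what the descriptions above state (a `chrom:start-end` region, MAF bins within [0, 0.5] or within [0, 1] when ref/alt frequencies are used, one sample ID per line). The ascending-order check on bins is an assumption based on the example pattern.

```python
import re
from pathlib import Path

# Matches a single region such as "chr20:1000000-2000000".
REGION_RE = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(region: str) -> tuple[str, int, int]:
    """Split a 'chrom:start-end' string and check that start <= end."""
    match = REGION_RE.match(region.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end region: {region!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"region start exceeds end: {region!r}")
    return match["chrom"], start, end


def format_bins(bins: list[float], use_ref_alt: bool = False) -> str:
    """Return a space-separated bins string after range and order checks."""
    # Upper bound follows the description: 0.5 for MAF bins, 1 for ref/alt.
    upper = 1.0 if use_ref_alt else 0.5
    if any(b < 0 or b > upper for b in bins):
        raise ValueError(f"bins must lie within [0, {upper}]")
    # Assumption: bins are listed in ascending order, as in the example pattern.
    if sorted(bins) != list(bins):
        raise ValueError("bins must be in ascending order")
    return " ".join(f"{b:g}" for b in bins)


def write_samples(sample_ids: list[str], path: Path) -> None:
    """Write one sample ID per line."""
    path.write_text("".join(f"{sample}\n" for sample in sample_ids))
```

For example, `parse_region("chr20:1000000-2000000")` yields the chromosome name and the two buffer positions as integers, and `format_bins([0, 0.01, 0.05, 0.5])` produces a string suitable for the `bins` input.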

Output

name:type
description
pattern

errors_cal

meta:map

Groovy Map containing sample information e.g. [ id:'test', single_end:false ]

*.error.cal.txt.gz:file

Calibration correlation errors between imputed dosages (in MAF bins) and highly-confident genotype.

*.error.cal.txt.gz

errors_grp

meta:map

Groovy Map containing sample information e.g. [ id:'test', single_end:false ]

*.error.grp.txt.gz:file

Groups correlation errors between imputed dosages (in MAF bins) and highly-confident genotype.

*.error.grp.txt.gz

errors_spl

meta:map

Groovy Map containing sample information e.g. [ id:'test', single_end:false ]

*.error.spl.txt.gz:file

Samples correlation errors between imputed dosages (in MAF bins) and highly-confident genotype.

*.error.spl.txt.gz

rsquare_grp

meta:map

Groovy Map containing sample information e.g. [ id:'test', single_end:false ]

*.rsquare.grp.txt.gz:file

Groups r-squared correlation between imputed dosages (in MAF bins) and highly-confident genotype.

*.rsquare.grp.txt.gz

rsquare_spl

meta:map

Groovy Map containing sample information e.g. [ id:'test', single_end:false ]

*.rsquare.spl.txt.gz:file

Samples r-squared correlation between imputed dosages (in MAF bins) and highly-confident genotype.

*.rsquare.spl.txt.gz

rsquare_per_site

meta:map

Groovy Map containing sample information e.g. [ id:'test', single_end:false ]

*_r2_sites.txt.gz:file

Variant r-squared correlation between imputed dosages (in MAF bins) and highly-confident genotype.

*_r2_sites.txt.gz

versions

versions.yml:file

File containing software versions.

versions.yml
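
To inspect any of the gzipped text outputs listed above, something like the following may be enough. This is a sketch only: the function name `read_table` is hypothetical, and because the column layout is not documented here, the code simply splits each non-empty line on whitespace and makes no assumption about headers or column meaning.

```python
import gzip
from pathlib import Path


def read_table(path: Path) -> list[list[str]]:
    """Read a gzipped text file into rows of whitespace-separated fields."""
    rows = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if fields:  # skip blank lines
                rows.append(fields)
    return rows
```

A call such as `read_table(Path("test.rsquare.grp.txt.gz"))` (an example file name, not one guaranteed by the module) returns a list of rows that can then be passed to whatever plotting or summary code is in use.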

Tools

glimpse2
MIT

GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.