Program to compute the genotyping error rate at the sample or marker level.
Input
name:type
description
pattern
meta
:map
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
estimate
:file
Imputed dataset file obtained after phasing.
*.{vcf,bcf,vcf.gz,bcf.gz}
estimate_index
:file
Index file for the imputed dataset file.
truth
:file
Validation dataset called at the same positions as the imputed file.
*.{vcf,bcf,vcf.gz,bcf.gz}
truth_index
:file
Index file for the truth file.
freq
:file
File containing allele frequencies at each site.
*.{vcf,bcf,vcf.gz,bcf.gz}
freq_index
:file
Index file for the allele frequencies file.
samples
:file
List of samples to process, one sample ID per line.
*.{txt,tsv}
region
:string
Target region used for imputation, including left and right buffers (e.g. chr20:1000000-2000000). Can also be a list of such regions.
chrXX:leftBufferPosition-rightBufferPosition
meta2
:map
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
groups
:file
Alternative to frequency bins: user-defined group bins, provided in a file.
*.{txt,tsv}
bins
:string
Allele frequency bins used for r-squared computations.
By default they should be MAF bins [0-0.5], while
they should span the full range [0-1] if --use-ref-alt is used.
0 0.01 0.05 ... 0.5
ac_bins
:string
User-defined allele count bins used for r-squared computations.
1 2 5 10 20 ... 100000
allele_counts
:string
Default allele count bins used for r-squared computations.
The AN field must be defined in the frequency file.
min_val_gl
:float
Minimum genotype likelihood probability P(G|R) in validation data.
Set to zero to disable the filter or when using --gt-validation.
min_val_dp
:integer
Minimum coverage in validation data.
If FORMAT/DP is missing and --min_val_dp > 0, the program exits with an error.
Set to zero to disable the filter or when using --gt-validation.
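The `region` and `bins` inputs are plain strings, so it can help to check them before they reach the module. Below is a minimal sketch of such a check. The helper names `parse_region` and `parse_bins` are hypothetical and are not part of the module or of the underlying program. The requirement that bin edges be strictly increasing is an assumption made for this illustration; the range limits ([0-0.5] by default, [0-1] with --use-ref-alt) follow the description above.

```python
import re

# Hypothetical helpers for illustration only; not part of the module.
# Matches the documented form chrXX:leftBufferPosition-rightBufferPosition.
REGION_RE = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(region: str) -> tuple[str, int, int]:
    """Split a region such as 'chr20:1000000-2000000' into (chrom, start, end)."""
    match = REGION_RE.match(region.strip())
    if match is None:
        raise ValueError(f"Region is not in chrXX:start-end form: {region!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"Region start exceeds end: {region!r}")
    return match["chrom"], start, end


def parse_bins(bins: str, use_ref_alt: bool = False) -> list[float]:
    """Parse a space-separated list of bin edges and check range and ordering."""
    values = [float(token) for token in bins.split()]
    # MAF bins stop at 0.5; the full [0, 1] range applies with --use-ref-alt.
    upper = 1.0 if use_ref_alt else 0.5
    if len(values) < 2:
        raise ValueError("At least two bin edges are needed")
    # Assumption for this sketch: edges must be strictly increasing.
    if any(b <= a for a, b in zip(values, values[1:])):
        raise ValueError("Bin edges must be strictly increasing")
    if values[0] < 0.0 or values[-1] > upper:
        raise ValueError(f"Bin edges must lie within [0, {upper}]")
    return values
```

For example, `parse_region("chr20:1000000-2000000")` returns the chromosome name and the two buffer positions as integers, and `parse_bins("0 0.5 1")` raises a `ValueError` unless `use_ref_alt=True` is passed.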
Output
name:type
description
pattern
errors_cal
meta
:map
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
*.error.cal.txt.gz
:file
Calibration correlation errors between imputed dosages (in MAF bins) and high-confidence genotypes.
*.error.cal.txt.gz
errors_grp
meta
:map
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
*.error.grp.txt.gz
:file
Per-group correlation errors between imputed dosages (in MAF bins) and high-confidence genotypes.
*.error.grp.txt.gz
errors_spl
meta
:map
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
*.error.spl.txt.gz
:file
Per-sample correlation errors between imputed dosages (in MAF bins) and high-confidence genotypes.
*.error.spl.txt.gz
rsquare_grp
meta
:map
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
*.rsquare.grp.txt.gz
:file
Per-group r-squared correlation between imputed dosages (in MAF bins) and high-confidence genotypes.
*.rsquare.grp.txt.gz
rsquare_spl
meta
:map
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
*.rsquare.spl.txt.gz
:file
Per-sample r-squared correlation between imputed dosages (in MAF bins) and high-confidence genotypes.
*.rsquare.spl.txt.gz
rsquare_per_site
meta
:map
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
*_r2_sites.txt.gz
:file
Per-variant r-squared correlation between imputed dosages (in MAF bins) and high-confidence genotypes.
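To make the r-squared outputs more concrete, the sketch below computes a squared Pearson correlation between imputed dosages and truth genotypes within allele frequency bins. It illustrates the general idea only and is not the program's implementation: the function names `pearson_r2` and `r2_by_maf_bin`, the `(maf, dosage, truth)` record layout, and the half-open bin convention (each bin includes its lower edge, the last bin also includes its upper edge) are all assumptions made for this example.

```python
from bisect import bisect_right
from collections import defaultdict


def pearson_r2(xs, ys):
    """Squared Pearson correlation, or None when it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return None  # no variance on one side, correlation undefined
    return (sxy * sxy) / (sxx * syy)


def r2_by_maf_bin(records, bin_edges):
    """Group (maf, dosage, truth) records by bin and compute r-squared per bin.

    Assumed convention: bins are [lower, upper), except the last bin,
    which also includes its upper edge. Records outside the edges are skipped.
    """
    grouped = defaultdict(lambda: ([], []))
    for maf, dosage, truth in records:
        if maf < bin_edges[0] or maf > bin_edges[-1]:
            continue
        # bisect_right finds the first edge above maf; clamp the top edge
        # into the last bin.
        idx = min(bisect_right(bin_edges, maf) - 1, len(bin_edges) - 2)
        grouped[idx][0].append(dosage)
        grouped[idx][1].append(truth)
    return {
        (bin_edges[i], bin_edges[i + 1]): pearson_r2(dosages, truths)
        for i, (dosages, truths) in sorted(grouped.items())
    }
```

A call such as `r2_by_maf_bin(records, [0.0, 0.05, 0.5])` returns a dictionary keyed by `(lower, upper)` bin edges, with `None` for bins where the correlation cannot be computed.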