Description

Tool for imputation and phasing from a VCF file or directly from BAM/CRAM files.

Input

Name (Type)
Description
Pattern

meta (map)

Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]

input (file)

Either one or more BAM/CRAM files (in an array) containing low-coverage sequencing reads, or a single VCF/BCF file containing genotype likelihoods.
When BAM/CRAM files are used, the name of each file is used as the sample name.

*.{bam,cram,vcf,vcf.gz,bcf,bcf.gz}
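The two accepted input shapes can be told apart from file extensions alone. The sketch below is a hypothetical helper, not part of the module; it assumes the sample name is the file name with its final extension stripped, which is one reading of "the name of the file is used as the sample name".

```python
from pathlib import PurePath

# Extensions taken from the input pattern *.{bam,cram,vcf,vcf.gz,bcf,bcf.gz}
READ_EXTS = (".bam", ".cram")
GL_EXTS = (".vcf", ".vcf.gz", ".bcf", ".bcf.gz")


def classify_inputs(paths):
    """Return ("reads", sample_names) or ("gl", None) for a list of input paths.

    Hypothetical helper. Assumption: a BAM/CRAM sample name is the file name
    with its extension removed.
    """
    if not paths:
        raise ValueError("at least one input file is required")
    names = [PurePath(p).name for p in paths]
    if all(n.endswith(READ_EXTS) for n in names):
        # One sample per BAM/CRAM file; strip the extension to get its name.
        samples = [n.rsplit(".", 1)[0] for n in names]
        return "reads", samples
    if len(names) == 1 and names[0].endswith(GL_EXTS):
        # A single VCF/BCF of genotype likelihoods; samples come from its header.
        return "gl", None
    raise ValueError("expected BAM/CRAM file(s) or exactly one VCF/BCF file")
```

Mixing read files with a VCF/BCF, or passing several VCF/BCF files, is rejected, mirroring the "either ... or one VCF/BCF file" wording above.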

input_index (file)

Index file of the input BAM/CRAM/VCF/BCF file.

*.{bam.bai,cram.crai,vcf.gz.csi,bcf.gz.csi}
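As a small illustration of the index pattern, the hypothetical helper below derives the index file name expected for a given input by appending the suffix listed in the pattern. It covers only the four combinations the pattern names; uncompressed `.vcf` and `.bcf` inputs are deliberately left unmatched because the pattern lists no index for them.

```python
# Index suffixes taken from the input_index pattern
# *.{bam.bai,cram.crai,vcf.gz.csi,bcf.gz.csi}
INDEX_SUFFIX = {
    ".bam": ".bai",
    ".cram": ".crai",
    ".vcf.gz": ".csi",
    ".bcf.gz": ".csi",
}


def expected_index(path: str) -> str:
    """Return the index file name expected alongside `path` (hypothetical helper)."""
    for ext, suffix in INDEX_SUFFIX.items():
        if path.endswith(ext):
            return path + suffix
    raise ValueError(f"no index pattern listed for {path!r}")
```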

samples_file (file)

File with sample names and ploidy information.
One sample per line, with a mandatory second column giving the ploidy (1 or 2).
Samples not listed in the file are assumed to be diploid (ploidy 2).
GLIMPSE does NOT accept sex (M/F) in place of ploidy.

*.{txt,tsv}
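The rules for the samples file can be checked before a run. The sketch below is a hypothetical validator, not GLIMPSE code; it assumes columns are separated by whitespace (tabs or spaces), rejects anything other than 1 or 2 in the ploidy column (so M/F fails), and falls back to ploidy 2 for unlisted samples.

```python
def parse_samples_file(text: str) -> dict[str, int]:
    """Parse 'sample ploidy' lines into a dict (hypothetical validator).

    Assumption: columns are whitespace-separated.
    """
    ploidy = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: missing mandatory ploidy column")
        name, value = fields[0], fields[1]
        if value not in ("1", "2"):
            # Sex codes such as M/F are rejected; only 1 or 2 is valid.
            raise ValueError(f"line {lineno}: ploidy must be 1 or 2, got {value!r}")
        ploidy[name] = int(value)
    return ploidy


def ploidy_of(sample: str, table: dict[str, int]) -> int:
    # Samples absent from the file are treated as diploid.
    return table.get(sample, 2)
```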

input_region (string)

Target region used for imputation, including left and right buffers (e.g. chr20:1000000-2000000).
Optional if the reference panel is in bin format.

chrXX:leftBufferPosition-rightBufferPosition

output_region (string)

Target imputed region, excluding left and right buffers (e.g. chr20:1000000-2000000).
Optional if the reference panel is in bin format.

chrXX:startPosition-endPosition
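The input region is the output region widened by a buffer on each side, so the output region must lie inside the input region on the same contig. The sketch below illustrates that relationship; the helper names and the `buffer_bp` parameter are hypothetical, the left edge is clamped at position 1, and the right edge is not clamped to the contig length.

```python
import re

REGION_RE = re.compile(r"^([^:]+):(\d+)-(\d+)$")


def parse_region(region: str) -> tuple[str, int, int]:
    """Split 'chr20:1000000-2000000' into (contig, start, end)."""
    m = REGION_RE.match(region)
    if m is None:
        raise ValueError(f"not a chr:start-end region: {region!r}")
    contig, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    if start > end:
        raise ValueError(f"start after end in {region!r}")
    return contig, start, end


def buffered_input_region(output_region: str, buffer_bp: int) -> str:
    """Pad an output region by buffer_bp on each side (hypothetical helper)."""
    contig, start, end = parse_region(output_region)
    # Clamp the left buffer at position 1; the right edge is not clamped.
    return f"{contig}:{max(1, start - buffer_bp)}-{end + buffer_bp}"


def contains(input_region: str, output_region: str) -> bool:
    """True if the output region lies inside the buffered input region."""
    c_in, s_in, e_in = parse_region(input_region)
    c_out, s_out, e_out = parse_region(output_region)
    return c_in == c_out and s_in <= s_out and e_out <= e_in
```

For example, `buffered_input_region("chr20:1000000-2000000", 200000)` gives `chr20:800000-2200000`, and `contains` confirms the pair is consistent.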

meta2 (map)

Groovy Map containing genomic map information
e.g. [ map:'GRCh38' ]

reference (file)

Reference panel of haplotypes in VCF/BCF format.

*.{vcf.gz,bcf.gz}

reference_index (file)

Index file of the reference panel file.

*.{vcf.gz.csi,bcf.gz.csi}

map (file)

File containing the genetic map.
Optional if the reference panel is in bin format.

*.gmap

fasta_reference (file)

Faidx-indexed reference sequence file in the appropriate genome build.
Necessary for CRAM files.

*.fasta

fasta_reference_index (file)

Faidx index of the reference sequence file in the appropriate genome build.
Necessary for CRAM files.

*.fai

Output

Name (Type)
Description
Pattern

meta (map)

Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]

versions (file)

File containing software versions

versions.yml

phased_variants (file)

Output VCF/BCF file containing genotype probabilities (GP field), imputed dosages (DS field), best-guess genotypes (GT field), haplotypes sampled in the last main iterations (up to 16; HS field) and the info score.

*.{vcf,bcf,vcf.gz,bcf.gz}
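For a biallelic diploid site, the GP values are ordered 0/0, 0/1, 1/1 (VCF convention), and the dosage is the expected number of ALT alleles, DS = P(0/1) + 2 * P(1/1). The sketch below shows that arithmetic and a most-probable-genotype pick; it is an illustration of how GP, DS and a best-guess genotype relate, not GLIMPSE2's own code. It handles diploid biallelic sites only and returns an unphased genotype, whereas the GT field in the output is phased.

```python
def dosage_from_gp(gp: tuple[float, float, float]) -> float:
    """Expected ALT allele count at a diploid biallelic site: P(0/1) + 2 * P(1/1)."""
    _p_ref, p_het, p_alt = gp
    return p_het + 2.0 * p_alt


def best_guess_gt(gp: tuple[float, float, float]) -> str:
    """Most probable unphased diploid genotype from the three GP values."""
    labels = ("0/0", "0/1", "1/1")
    return labels[max(range(3), key=lambda i: gp[i])]
```

With `gp = (0.1, 0.6, 0.3)`, the dosage is 1.2 and the best guess is `0/1`.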

stats_coverage (file)

Optional coverage statistics file, created when BAM/CRAM files are used as input.

*.txt.gz

Tools

glimpse2
MIT

GLIMPSE2 is a phasing and imputation method for large-scale low-coverage sequencing studies.