Description

Hap.py is a tool for comparing diploid genotypes at the haplotype level. Rather than comparing VCF records row by row, hap.py generates and matches alternate sequences within a superlocus. A superlocus is a small region of the genome (between 1 and roughly 1000 bp) that contains one or more variants.
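The difference between row-by-row and haplotype-level comparison can be illustrated with a minimal Python sketch. It is not hap.py's implementation: it handles a single haplotype only (no diploid genotypes), and `apply_variants`, the superlocus sequence, and the calls are all hypothetical names and values chosen for illustration.

```python
def apply_variants(reference, variants):
    """Return the alternate sequence obtained by applying variants to reference.

    Each variant is a (pos, ref, alt) tuple with a 0-based position.
    Variants must not overlap.
    """
    pieces = []
    cursor = 0
    for pos, ref, alt in sorted(variants):
        if pos < cursor:
            raise ValueError("overlapping variants")
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError(f"REF allele does not match reference at {pos}")
        pieces.append(reference[cursor:pos])  # unchanged bases before the variant
        pieces.append(alt)                    # substituted allele
        cursor = pos + len(ref)
    pieces.append(reference[cursor:])         # unchanged tail
    return "".join(pieces)


# Hypothetical superlocus and calls, for illustration only.
superlocus = "GATTACA"
truth_calls = [(2, "TT", "CG")]               # one multi-nucleotide record
query_calls = [(2, "T", "C"), (3, "T", "G")]  # the same change as two SNPs

# Row-by-row comparison sees different records...
same_records = sorted(truth_calls) == sorted(query_calls)
# ...but both call sets produce the same alternate sequence.
same_haplotype = (
    apply_variants(superlocus, truth_calls)
    == apply_variants(superlocus, query_calls)
)
```

Here `same_records` is false while `same_haplotype` is true, which is the kind of representation difference that sequence-level matching is meant to absorb.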

Input

Name (Type)
Description
Pattern

meta (map)

Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]

query_vcf (file)

VCF/GVCF file to query

*.{gvcf,vcf}.gz

truth_vcf (file)

Gold standard (truth) VCF file

*.{gvcf,vcf}.gz

regions_bed (file)

Sparse regions to restrict the analysis to

*.bed

targets_bed (file)

Dense regions to restrict the analysis to

*.bed

fasta (file)

FASTA file of the reference genome

*.{fa,fasta}

fasta_fai (file)

The index of the reference FASTA

*.fai

false_positives_bed (file)

False positive / confident call regions. Calls outside these regions will be labelled as UNK.

*.{bed,bed.gz}

stratification_tsv (file)

Stratification file list in TSV format

*.tsv

stratification_beds (file(s))

One or more BED files used for stratification (these should be referenced in the stratification TSV)

*.bed
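Because the stratification BED files should be referenced in the stratification TSV, it can help to check the two inputs against each other before running the module. The sketch below assumes a two-column, tab-separated layout (region name, then BED file name); that layout, the helper names, and the file names are assumptions for illustration, not something this page specifies.

```python
import csv
import io


def build_stratification_tsv(regions):
    """Render a mapping of region name -> BED file name as TSV text.

    Assumes a two-column layout: name, then file name.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for name, bed in regions.items():
        writer.writerow([name, bed])
    return buffer.getvalue()


def referenced_beds(tsv_text):
    """Return the BED file names listed in the second TSV column."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return [row[1] for row in reader if row]


def missing_beds(tsv_text, staged_files):
    """Return BED files referenced in the TSV but not among staged_files."""
    staged = set(staged_files)
    return [bed for bed in referenced_beds(tsv_text) if bed not in staged]


# Hypothetical region names and file names.
tsv_text = build_stratification_tsv(
    {"exons": "exons.bed", "low_complexity": "low_complexity.bed"}
)
unstaged = missing_beds(tsv_text, ["exons.bed"])
```

An empty result from `missing_beds` means every BED file named in the TSV was also supplied as `stratification_beds`.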

Output

Name (Type)
Description
Pattern

meta (map)

Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]

summary_csv (file)

A CSV file containing the summary of the benchmarking

*.summary.csv

roc_all_csv (file)

A CSV file containing ROC values for all variants

*.roc.all.csv.gz

roc_indel_locations_csv (file)

A CSV file containing ROC values for all indels

*.roc.Locations.INDEL.csv.gz

roc_indel_locations_pass_csv (file)

A CSV file containing ROC values for all indels that passed all filters

*.roc.Locations.INDEL.PASS.csv.gz

roc_snp_locations_csv (file)

A CSV file containing ROC values for all SNPs

*.roc.Locations.SNP.csv.gz

roc_snp_locations_pass_csv (file)

A CSV file containing ROC values for all SNPs that passed all filters

*.roc.Locations.SNP.PASS.csv.gz

extended_csv (file)

A CSV file containing extended info of the benchmarking

*.extended.csv

json (file)

A JSON file containing the run info

*.runinfo.json

runinfo (file)

A JSON file containing the benchmarking metrics

*.metrics.json.gz

vcf (file)

An annotated VCF

*.vcf.gz

tbi (file)

The index of the annotated VCF

*.tbi

versions (file)

File containing software versions

versions.yml
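As a rough illustration of consuming the `summary_csv` output, the following Python sketch indexes the rows by variant type and filter and recomputes an F1 score from precision and recall. The column names (`Type`, `Filter`, `METRIC.Recall`, `METRIC.Precision`) are assumed from typical hap.py summaries and should be checked against a real file; the sample rows contain placeholder values, not real benchmarking results.

```python
import csv
import io


def read_summary(csv_text):
    """Index summary rows by (Type, Filter), e.g. ("SNP", "PASS")."""
    rows = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows[(row["Type"], row["Filter"])] = row
    return rows


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Placeholder content standing in for a *.summary.csv file (assumed columns,
# made-up values).
sample = (
    "Type,Filter,METRIC.Recall,METRIC.Precision\n"
    "SNP,PASS,0.5,0.5\n"
    "INDEL,PASS,0.25,0.75\n"
)

summary = read_summary(sample)
snp = summary[("SNP", "PASS")]
snp_f1 = f1_score(float(snp["METRIC.Precision"]), float(snp["METRIC.Recall"]))
```

For a real run, the text would come from the file matching `*.summary.csv` instead of the inline sample.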