Description

Annotate a VCF with ensemblvep and/or snpeff, then bgzip and tabix-index the resulting VCF file. This subworkflow uses the scatter-gather method to run VEP/snpEff in parallel and increase throughput: the input VCF is split into smaller VCFs with a fixed number of variants each, these are annotated separately, and the results are concatenated back into a single output file per sample. Only VCF/BCF outputs are currently supported.
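
The scatter-gather idea described above can be sketched in a few lines of Python. This is only an illustration on in-memory VCF text, not the subworkflow's actual implementation. The function names (split_vcf, annotate_stub, gather) are hypothetical, and annotate_stub is a placeholder for a real VEP or snpEff run.

```python
from typing import List


def split_vcf(lines: List[str], sites_per_chunk: int) -> List[List[str]]:
    """Scatter: split VCF lines into chunks of at most `sites_per_chunk` records.

    Every chunk repeats the header so that it is a valid VCF on its own.
    """
    header = [line for line in lines if line.startswith("#")]
    records = [line for line in lines if not line.startswith("#")]
    if not records:
        # Nothing to scatter: keep a single header-only chunk.
        return [header]
    return [
        header + records[start:start + sites_per_chunk]
        for start in range(0, len(records), sites_per_chunk)
    ]


def annotate_stub(chunk: List[str]) -> List[str]:
    """Placeholder annotator (assumption): tags the INFO column of each record."""
    annotated = []
    for line in chunk:
        if line.startswith("#"):
            annotated.append(line)
            continue
        fields = line.split("\t")
        fields[7] = fields[7] + ";ANN=stub"  # column 8 of a VCF record is INFO
        annotated.append("\t".join(fields))
    return annotated


def gather(chunks: List[List[str]]) -> List[str]:
    """Gather: keep the header of the first chunk, then append all records in order."""
    if not chunks:
        return []
    merged = [line for line in chunks[0] if line.startswith("#")]
    for chunk in chunks:
        merged.extend(line for line in chunk if not line.startswith("#"))
    return merged


# Toy input: a two-line header and five records on chr1.
vcf_lines = ["##fileformat=VCFv4.2", "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"]
vcf_lines += [f"chr1\t{pos}\t.\tA\tG\t.\tPASS\tDP=10" for pos in range(100, 105)]

chunks = split_vcf(vcf_lines, sites_per_chunk=2)
result = gather([annotate_stub(chunk) for chunk in chunks])
```

In this sketch each chunk could be handed to a separate worker, which is where the throughput gain comes from. The gather step relies on the chunks being concatenated in their original order.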

Input

name:type
description
pattern

ch_vcf

vcf file to annotate
Structure: [ val(meta), path(vcf), path(tbi) ]

ch_fasta

Reference genome fasta file (optional)
Structure: [ val(meta2), path(fasta) ]

val_vep_genome :string

genome to use for ensemblvep

val_vep_species :string

species to use for ensemblvep

val_vep_cache_version :integer

cache version to use for ensemblvep

ch_vep_cache

the root cache folder for ensemblvep (optional)
Structure: [ path(cache) ]

ch_vep_extra_files

any extra files needed by plugins for ensemblvep (optional)
Structure: [ path(file1), path(file2)… ]

val_snpeff_db :string

database to use for snpeff, usually consists of the genome and the database version
e.g. WBcel235.105

ch_snpeff_cache

the root cache folder for snpeff (optional)
Structure: [ path(cache) ]

val_tools_to_use :list

The tools to use. Options => ["ensemblvep", "snpeff"]

val_sites_per_chunk :integer

The number of variants per scattered VCF.
Set this value to null, [] or false to disable scattering.
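
As a rough illustration of how val_sites_per_chunk determines the number of scattered VCFs, the Python sketch below treats null (None in Python), [] and false as "scattering disabled". The helper name n_chunks is hypothetical, and rejecting non-positive values is an assumption of this sketch, not documented behaviour of the subworkflow.

```python
import math


def n_chunks(n_variants: int, sites_per_chunk) -> int:
    """Return how many VCFs would be annotated for a given chunk size."""
    # None, [] and False all mean "do not scatter": annotate the input as one file.
    if sites_per_chunk is None or sites_per_chunk is False or sites_per_chunk == []:
        return 1
    if sites_per_chunk <= 0:
        # Assumption of this sketch: a non-positive chunk size is an error.
        raise ValueError("sites_per_chunk must be a positive integer")
    # The last chunk may hold fewer variants, hence the ceiling.
    return max(1, math.ceil(n_variants / sites_per_chunk))
```

For example, under these assumptions 10 variants with a chunk size of 4 give three chunks (4, 4 and 2 variants), while any of the three "disabled" values gives a single unsplit file.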

Output

name:type
description
pattern

vcf_tbi

Compressed vcf file + tabix index
Structure: [ val(meta), path(vcf), path(tbi) ]

vep_reports :file

html reports generated by Ensembl VEP

*.html

snpeff_reports

csv reports generated by snpeff
Structure: [ val(meta), path(csv) ]

snpeff_html

html reports generated by snpeff
Structure: [ val(meta), path(html) ]

snpeff_genes

Tab-separated txt file with the number of variants
affecting each transcript and gene
Structure: [ val(meta), path(txt) ]

versions :file

File containing software versions

versions.yml