Description

Compare k-mer frequencies in the reads and the assembly to derive the metrics K* and QV*

Input

name:type
description
pattern

meta:map

Groovy Map containing sample information e.g. [ id:'sample1', single_end:false ]

fasta_assembly:file

Genome assembly in FASTA format, either uncompressed or gzip-compressed [REQUIRED]

*.{fasta, fasta.gz}

meta1:map

Groovy Map containing sample read information e.g. [ id:'sample1', single_end:false ]

meryl_db_reads:file

K-mer database produced from raw reads using Meryl [REQUIRED]

*.{meryl_db}

lookup_table:file

Input vector of k-mer probabilities (obtained by genomescope2 with parameter --fitted_hist) [OPTIONAL]

lookup_table.txt

seqmers:file

Pre-built meryl db of the assembly sequence. By default, the sequence meryl db is generated from the input genome assembly [OPTIONAL]

*.{meryl_db}

peak:float

K-mer multiplicity that corresponds to a single copy. It hard-sets copy 1 and is used to convert read k-mer multiplicity into copy number. Can be calculated using genomescope2 [REQUIRED]
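
As a rough illustration of how a peak value can turn read k-mer multiplicity into a copy number, here is a minimal sketch. It assumes the simplest rule, dividing the multiplicity by the peak and rounding to the nearest integer. The function name `expected_copy_number` is hypothetical, and Merfin's own implementation may differ, particularly when a lookup table is supplied.

```python
import math


def expected_copy_number(read_multiplicity: int, peak: float) -> int:
    """Estimate copy number from read k-mer multiplicity.

    Assumption: copy number = multiplicity / peak, rounded half up.
    This is an illustrative rule, not Merfin's confirmed behaviour.
    """
    if peak <= 0:
        raise ValueError("peak must be positive")
    if read_multiplicity < 0:
        raise ValueError("read_multiplicity must be non-negative")
    # floor(x + 0.5) rounds half up and avoids Python's banker's rounding
    return math.floor(read_multiplicity / peak + 0.5)
```

With a peak of 30, a k-mer seen 62 times in the reads would be treated as a 2-copy k-mer under this rule, and one seen 10 times as a 0-copy (likely erroneous) k-mer.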

Output

name:type
description
pattern

hist

meta:map

Groovy Map containing sample information e.g. [ id:'sample1', single_end:false ]

*.hist:file

The generated 0-centered K* histogram for sequences in <fasta_assembly.fasta>. Positive K* values indicate expected collapsed copies. Negative K* values indicate expected expanded copies. Values closer to 0 mean that the expected and found k-mers are well balanced, 1:1.

*.{hist}
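
The sign convention described for the histogram can be sketched with a small function. It assumes that K* for a k-mer is the difference between the copy number expected from the reads and the copy number found in the assembly, scaled by the smaller of the two. This definition is consistent with the description above but is stated here as an assumption. The name `k_star` is hypothetical.

```python
def k_star(expected_copies: float, found_copies: float) -> float:
    """Illustrative K* for one k-mer (assumed definition).

    expected_copies: copy number predicted from the reads
    found_copies:    copy number observed in the assembly
    Positive result -> assembly has too few copies (collapsed).
    Negative result -> assembly has too many copies (expanded).
    Zero            -> expected and found copies are balanced, 1:1.
    """
    if expected_copies <= 0 or found_copies <= 0:
        raise ValueError("both copy numbers must be positive in this sketch")
    return (expected_copies - found_copies) / min(expected_copies, found_copies)
```

For example, a k-mer expected twice but found once gives a positive value (collapsed), while a k-mer expected once but found twice gives a negative value (expanded).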

log_stderr

meta:map

Groovy Map containing sample information e.g. [ id:'sample1', single_end:false ]

*.hist.stderr.log:file

Log (stderr) of the hist tool execution. The QV and QV* metrics are reported at the end.

*.{hist.stderr.log}

versions

versions.yml:file

File containing software versions

versions.yml

Tools

merfin
Apache-2.0

Merfin (k-mer based finishing tool) is a suite of subtools for variant filtering, assembly evaluation and polishing via k-mer validation. The subtool -hist estimates the QV (the quality value defined by [Merqury](https://github.com/marbl/merqury)) for each scaffold/contig, together with genome-wide averages. In addition, Merfin produces a QV* estimate, which also accounts for k-mers that are seen in excess with respect to their expected multiplicity predicted from the reads.
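
To make the QV calculation concrete, the following is a minimal sketch of a k-mer based quality value in the style of Merqury: the fraction of assembly k-mers counted as errors is converted to a per-base error rate and then to a Phred-scaled score. The function name `kmer_qv` and its parameters are hypothetical. For QV, the error k-mers would be those found in the assembly but absent from the reads. For QV*, per the description above, k-mers seen in excess would be counted as well. How Merfin tallies them exactly is not shown here.

```python
import math


def kmer_qv(error_kmers: int, total_kmers: int, k: int) -> float:
    """Phred-scaled consensus quality from k-mer counts (illustrative).

    error_kmers: assembly k-mers counted as errors
    total_kmers: all k-mers in the assembly
    k:           k-mer length
    """
    if total_kmers <= 0 or k <= 0:
        raise ValueError("total_kmers and k must be positive")
    if not 0 <= error_kmers <= total_kmers:
        raise ValueError("error_kmers must lie between 0 and total_kmers")
    if error_kmers == 0:
        return math.inf  # no detectable errors
    # Probability that a single base is correct, given that a k-mer
    # is correct only if all k of its bases are correct
    base_correct = (1 - error_kmers / total_kmers) ** (1 / k)
    error_rate = 1 - base_correct
    return -10 * math.log10(error_rate)
```

Because QV* counts additional k-mers as errors, it would be equal to or lower than QV for the same assembly under this sketch.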