Description

Cluster genome FASTA files by average nucleotide identity

Input

name:type
description
pattern

meta:map

Groovy Map containing sample information e.g. [ id:'test', single_end:false ]

bins:file

A list of fasta-formatted genomes for dereplication

*.{fa,fna,fa.gz, etc}

qc_table:file

(optional) A summary TSV from either CheckM [https://nf-co.re/modules/checkm_lineagewf] or CheckM2 [https://nf-co.re/modules/checkm2_predict/], or a CSV in drep-style format [https://github.com/MrOlm/drep] with three columns: genome,completeness,contamination. In all cases the first column should contain the names of the input genome files, minus the last file extension (e.g. if the genome is gzipped, the genome name should retain the .fasta extension).

*.{csv,tsv}

qc_format:string

Defines the type of input table in qc_table, if specified.

checkm|checkm2|genome_info
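
The naming rule for the first column of qc_table (file name minus the last extension only) can be sketched in Python as below. This is a minimal illustration, not part of the module: the helper names genome_name and write_genome_info are hypothetical, and writing a header row with the three column names is an assumption about the drep-style format.

```python
import csv
from pathlib import Path


def genome_name(path):
    """Return the file name with only its last extension removed."""
    # Path.stem drops a single suffix, so "bin1.fasta.gz" becomes "bin1.fasta".
    return Path(path).stem


def write_genome_info(rows, out_path):
    """Write (genome_path, completeness, contamination) tuples as a three-column CSV."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        # Assumption: the table starts with a header naming the three columns.
        writer.writerow(["genome", "completeness", "contamination"])
        for path, completeness, contamination in rows:
            writer.writerow([genome_name(path), completeness, contamination])
```

For example, genome_name("bins/bin1.fasta.gz") gives "bin1.fasta", while genome_name("bin2.fa") gives "bin2". A table written this way would be passed with qc_format set to genome_info.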

Output

name:type
description
pattern

tsv

meta:map

Groovy Map containing sample information e.g. [ id:'test', single_end:false ]

*.tsv:file

TSV file in the format representative_genome \t member_genome

*.tsv

dereplicated_bins

meta:map

Groovy Map containing sample information e.g. [ id:'test', single_end:false ]

${prefix}/*:file

The representative genomes following dereplication by galah.

*

versions

versions.yml:file

File containing software versions

versions.yml
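
The tsv output described above (representative_genome \t member_genome, one pair per line) can be grouped into clusters with a short Python sketch. The function name read_clusters is hypothetical, and the code assumes the file has exactly two tab-separated columns and no header row.

```python
from collections import defaultdict


def read_clusters(lines):
    """Group member genomes under their representative genome."""
    clusters = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Assumption: two tab-separated columns, representative first.
        representative, member = line.split("\t", 1)
        clusters[representative].append(member)
    return dict(clusters)
```

Any iterable of lines works as input, including an open file handle, so the result maps each representative genome to the list of genomes it represents.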

Tools

galah
GPL v3

Galah aims to be a more scalable metagenome-assembled genome (MAG) dereplication method.