Description

Cluster genome FASTA files by average nucleotide identity

Input

Name (Type)
Description
Pattern

meta (map)

Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]

bins (file)

A list of fasta-formatted genomes for dereplication

*.{fa,fna,fa.gz, etc}

qc_table (file)

(optional) Either a CheckM (https://nf-co.re/modules/checkm_lineagewf) summary TSV containing
information on the completeness and contamination of the input genomes (13 columns),
or a 3-column CSV with the header genome,completeness,contamination.
In both cases the first column should contain the names of the input genome files,
minus the last file extension
(i.e. if the genome is gzipped, the genome name should retain the .fasta extension).

*.{csv,tsv}

qc_format (string)

Defines the type of input table in qc_table, if specified.

checkm|genome_info
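
The naming rule for the 3-column qc_table can be illustrated with a short Python sketch. It is not part of the module: the helper names (genome_name, write_genome_info), the file paths, and the quality values are hypothetical, and the scale expected for completeness and contamination is not stated here, so check the galah documentation before relying on the numbers.

```python
import csv
import io
from pathlib import Path


def genome_name(path):
    """Return the file name minus its last extension only.

    'bins/bin1.fasta.gz' becomes 'bin1.fasta'; 'bins/bin2.fna' becomes 'bin2'.
    """
    return Path(path).stem


def write_genome_info(rows, handle):
    """Write (path, completeness, contamination) tuples as a 3-column CSV."""
    writer = csv.writer(handle, lineterminator="\n")
    # Header required for the genome_info format described above.
    writer.writerow(["genome", "completeness", "contamination"])
    for path, completeness, contamination in rows:
        writer.writerow([genome_name(path), completeness, contamination])


# Hypothetical genomes and quality values, for illustration only.
rows = [
    ("bins/bin1.fasta.gz", 98.2, 1.1),
    ("bins/bin2.fna", 75.0, 4.3),
]
buffer = io.StringIO()
write_genome_info(rows, buffer)
print(buffer.getvalue())
```

A table written this way would be passed as qc_table with qc_format set to genome_info.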

Output

Name (Type)
Description
Pattern

meta (map)

Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]

tsv (file)

TSV file in the format representative_genome \t member_genome

*.tsv

dereplicated_bins (file)

The representative genomes following dereplication by galah.

*

versions (file)

File containing software versions

versions.yml
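
As a rough illustration of how the tsv output might be consumed downstream, the sketch below groups member genomes under their representative. It assumes the file has no header line and exactly the two tab-separated columns described above; the function name read_clusters and the example genome names are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_clusters(handle):
    """Map each representative genome to the list of its member genomes."""
    clusters = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        representative, member = row[0], row[1]
        clusters[representative].append(member)
    return dict(clusters)


# Hypothetical cluster table, for illustration only.
example = io.StringIO(
    "bin1.fa\tbin1.fa\n"
    "bin1.fa\tbin3.fa\n"
    "bin2.fa\tbin2.fa\n"
)
clusters = read_clusters(example)
for representative, members in clusters.items():
    print(representative, len(members), members)
```

The keys of the resulting dictionary correspond to the genomes that would be expected in dereplicated_bins.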

Tools

galah
GPL v3

Galah aims to be a more scalable metagenome-assembled genome (MAG) dereplication method.