Description

GECCO is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).

Input

Name (Type)
Description
Pattern

meta (map)

Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]

input (file)

A genomic file containing one or more sequences as input. Any format supported by Biopython is accepted (FASTA, GenBank, etc.).

*

hmm (file)

Alternative HMM file(s) to use in HMMER format

*.hmm

model_dir (directory)

Path to an alternative CRF (Conditional Random Fields) model to use
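
As a rough illustration of how the optional `hmm` and `model_dir` inputs relate to the required `input` file, the sketch below assembles an argument list for `gecco run`. The helper `build_gecco_command` is hypothetical, and the flag names (`--genome`, `-o`, `--hmm`, `--model`) are assumptions that should be checked against `gecco run --help` for the installed version; this is not the module's actual script.

```python
from pathlib import Path
from typing import List, Optional


def build_gecco_command(
    genome: Path,
    out_dir: Path,
    hmm: Optional[Path] = None,
    model_dir: Optional[Path] = None,
) -> List[str]:
    """Assemble an argument list for `gecco run` (hypothetical helper)."""
    # Flag names are assumptions; verify them with `gecco run --help`.
    cmd = ["gecco", "run", "--genome", str(genome), "-o", str(out_dir)]
    # Optional inputs are only appended when the caller supplies them.
    if hmm is not None:
        cmd += ["--hmm", str(hmm)]
    if model_dir is not None:
        cmd += ["--model", str(model_dir)]
    return cmd
```

The returned list could be passed to `subprocess.run` on a machine where GECCO is installed; leaving `hmm` and `model_dir` as `None` corresponds to running with the defaults.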

Output

Name (Type)
Description
Pattern

meta (map)

Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]

versions (file)

File containing software versions

versions.yml

genes (file)

TSV file containing detected/predicted genes with BGC probability scores. Will not be generated if no hits are found.

*.genes.tsv

features (file)

TSV file containing identified domains

*.features.tsv

clusters (file)

TSV file containing coordinates of predicted clusters and BGC types. Will not be generated if no hits are found.

*.clusters.tsv

gbk (file)

Per-cluster GenBank file containing the cluster sequence with annotations. Will not be generated if no hits are found.

*.gbk

json (file)

AntiSMASH v6 sideload JSON file (if --antismash-sideload is supplied). Will not be generated if no hits are found.

*.json
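
Because several outputs are not generated when no hits are found, downstream code should not assume the files exist. The following is a minimal sketch of one way to gather whatever was produced, using the glob patterns from the Pattern column. The names `OUTPUT_PATTERNS`, `collect_outputs` and `read_tsv` are hypothetical, the `*.json` pattern for the sideload file is an assumption, and no TSV column names are assumed.

```python
import csv
from pathlib import Path
from typing import Dict, List

# Patterns from the output listing; "*.json" is an assumption.
OUTPUT_PATTERNS = {
    "genes": "*.genes.tsv",
    "features": "*.features.tsv",
    "clusters": "*.clusters.tsv",
    "gbk": "*.gbk",
    "json": "*.json",
}


def collect_outputs(out_dir: Path) -> Dict[str, List[Path]]:
    """Map each output name to the files found (empty list if none)."""
    return {
        name: sorted(out_dir.glob(pattern))
        for name, pattern in OUTPUT_PATTERNS.items()
    }


def read_tsv(path: Path) -> List[Dict[str, str]]:
    """Read a tab-separated file into a list of row dictionaries."""
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

An empty list for `clusters` then signals a run without predicted clusters, so the caller can skip cluster-level steps instead of failing on a missing file.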

Tools

gecco
GPL v3

Biosynthetic Gene Cluster prediction with Conditional Random Fields.