Description

GECCO is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).

Input

name:type
description
pattern

meta:map

Groovy Map containing sample information, e.g. [ id:'test', single_end:false ]

input:file

A genomic file containing one or more sequences. Any format supported by Biopython is accepted (FASTA, GenBank, etc.).

*

hmm:file

Alternative HMM file(s) to use in HMMER format

*.hmm

model_dir:directory

Path to a directory containing an alternative CRF (Conditional Random Fields) model to use

Output

name:type
description
pattern

genes

meta:map

Groovy Map containing sample information, e.g. [ id:'test', single_end:false ]

*.genes.tsv:file

TSV file containing detected/predicted genes with BGC probability scores. Will not be generated if no hits are found.

*.genes.tsv

features

meta:map

Groovy Map containing sample information, e.g. [ id:'test', single_end:false ]

*.features.tsv:file

TSV file containing identified domains

*.features.tsv

clusters

meta:map

Groovy Map containing sample information, e.g. [ id:'test', single_end:false ]

*.clusters.tsv:file

TSV file containing coordinates of predicted clusters and BGC types. Will not be generated if no hits are found.

*.clusters.tsv

gbk

meta:map

Groovy Map containing sample information, e.g. [ id:'test', single_end:false ]

*_cluster_*.gbk:file

Per-cluster GenBank file containing the cluster sequence with annotations. Will not be generated if no hits are found.

*.gbk

json

meta:map

Groovy Map containing sample information, e.g. [ id:'test', single_end:false ]

*.json:file

AntiSMASH v6 sideload JSON file (only if --antismash-sideload is supplied). Will not be generated if no hits are found.

*.json

versions

versions.yml:file

File containing software versions

versions.yml
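
Because several of the outputs above are skipped when no hits are found, downstream code should treat them as optional. The following is a minimal Python sketch of that idea, not part of the module itself: it looks for files in a results directory using the file name patterns listed in the Output section and returns an empty result when the clusters table is absent. The function names (collect_outputs, read_tsv, load_clusters) and the results directory layout are assumptions made for illustration, and no particular TSV column names are assumed.

```python
import csv
from pathlib import Path

# File name patterns taken from the Output section; the dictionary itself
# is an illustrative helper, not something the module provides.
PATTERNS = {
    "genes": "*.genes.tsv",
    "features": "*.features.tsv",
    "clusters": "*.clusters.tsv",
    "gbk": "*_cluster_*.gbk",
    "json": "*.json",
}


def collect_outputs(outdir):
    """Map each output name to a sorted list of matching files (possibly empty)."""
    root = Path(outdir)
    return {name: sorted(root.glob(pattern)) for name, pattern in PATTERNS.items()}


def read_tsv(path):
    """Read a tab-separated file with a header row into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def load_clusters(outdir):
    """Return all cluster rows, or an empty list if no clusters file was written."""
    rows = []
    for path in collect_outputs(outdir)["clusters"]:
        rows.extend(read_tsv(path))
    return rows
```

Calling load_clusters on a (hypothetical) results directory gives one dict per predicted cluster, keyed by whatever header the TSV carries, and an empty list for samples without hits, so callers do not need a separate existence check.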

Tools

gecco
GPL v3

Biosynthetic Gene Cluster prediction with Conditional Random Fields.