Description

GECCO is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).

Input

name:type
description
pattern

meta

:map

Groovy Map containing sample information e.g. [ id:'test', single_end:false ]

input

:file

A genomic file containing one or more input sequences, in any format supported by Biopython (FASTA, GenBank, etc.)

*

hmm

:file

Alternative HMM file(s) to use, in HMMER format

*.hmm

model_dir

:directory

Path to an alternative CRF (Conditional Random Fields) model to use
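
As a rough illustration of how the inputs above could map onto a GECCO invocation, the following Python sketch assembles an argument list for `gecco run`. The helper name `build_gecco_command` and its parameters are hypothetical, and the option names (`--genome`, `--output-dir`, `--jobs`, `--model`, `--hmm`) and the choice to repeat `--hmm` once per file are assumptions that should be checked against `gecco run --help` for the installed version. It does not reproduce the module's actual script.

```python
from typing import List, Optional, Sequence


def build_gecco_command(
    genome: str,
    hmm: Optional[Sequence[str]] = None,
    model_dir: Optional[str] = None,
    cpus: int = 1,
    out_dir: str = ".",
) -> List[str]:
    """Assemble an argument list for `gecco run` (hypothetical helper).

    The option names below are assumptions; verify them with
    `gecco run --help` before relying on this.
    """
    cmd = [
        "gecco", "run",
        "--jobs", str(cpus),
        "--output-dir", out_dir,
        "--genome", genome,
    ]
    # Optional alternative CRF model directory (the `model_dir` input).
    if model_dir:
        cmd += ["--model", model_dir]
    # Optional alternative HMM file(s) (the `hmm` input); passing the
    # flag once per file is an assumption.
    for hmm_file in hmm or []:
        cmd += ["--hmm", hmm_file]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # even when GECCO is not installed.
    print(" ".join(build_gecco_command("sample.fasta", cpus=4)))
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting.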

Output

name:type
description
pattern

genes

meta

:map

Groovy Map containing sample information e.g. [ id:'test', single_end:false ]

*.genes.tsv

:file

TSV file containing detected/predicted genes with BGC probability scores. Will not be generated if no hits are found.

*.genes.tsv

features

meta

:map

Groovy Map containing sample information e.g. [ id:'test', single_end:false ]

*.features.tsv

:file

TSV file containing identified domains

*.features.tsv

clusters

meta

:map

Groovy Map containing sample information e.g. [ id:'test', single_end:false ]

*.clusters.tsv

:file

TSV file containing coordinates of predicted clusters and BGC types. Will not be generated if no hits are found.

*.clusters.tsv

gbk

meta

:map

Groovy Map containing sample information e.g. [ id:'test', single_end:false ]

*_cluster_*.gbk

:file

Per-cluster GenBank file containing the cluster sequence with annotations. Will not be generated if no hits are found.

*.gbk

json

meta

:map

Groovy Map containing sample information e.g. [ id:'test', single_end:false ]

*.json

:file

antiSMASH v6 sideload JSON file (if --antismash-sideload is supplied). Will not be generated if no hits are found.

*.json

versions

versions.yml

:file

File containing software versions

versions.yml
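
Because several of the outputs above are only written when GECCO finds hits, downstream code may want to treat them as optional. The following Python sketch, with hypothetical helper names `collect_gecco_outputs` and `read_tsv_rows`, gathers files by the file-name patterns listed for each output channel and returns an empty list for anything missing. It makes no assumptions about the column names inside the TSV files.

```python
import csv
from pathlib import Path
from typing import Dict, List

# File-name patterns taken from the output channels described above.
OUTPUT_PATTERNS = {
    "genes": "*.genes.tsv",
    "features": "*.features.tsv",
    "clusters": "*.clusters.tsv",
    "gbk": "*_cluster_*.gbk",
    "json": "*.json",
}


def collect_gecco_outputs(out_dir: str) -> Dict[str, List[Path]]:
    """Map each output channel name to the matching files in out_dir.

    Channels with no matching file (e.g. when no hits were found)
    map to an empty list rather than raising an error.
    """
    root = Path(out_dir)
    return {
        name: sorted(root.glob(pattern))
        for name, pattern in OUTPUT_PATTERNS.items()
    }


def read_tsv_rows(path: Path) -> List[dict]:
    """Read a tab-separated file into a list of dicts keyed by its header row."""
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


if __name__ == "__main__":
    outputs = collect_gecco_outputs(".")
    for channel, files in outputs.items():
        status = ", ".join(p.name for p in files) if files else "not generated"
        print(f"{channel}: {status}")
```

Checking for an empty list before reading, for example `if outputs["clusters"]:`, avoids failures on samples where no clusters were predicted.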

Tools

gecco
GPL v3

Biosynthetic Gene Cluster prediction with Conditional Random Fields.