Description
GECCO is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).
Input
A genomic file containing one or more sequences. Any format supported by Biopython is accepted (FASTA, GenBank, etc.).
*
Output
TSV file containing detected/predicted genes with BGC probability scores. Will not be generated if no hits are found.
*.genes.tsv
TSV file containing coordinates of predicted clusters and BGC types. Will not be generated if no hits are found.
*.clusters.tsv
Per-cluster GenBank file containing the cluster sequence with annotations. Will not be generated if no hits are found.
*.gbk
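Because none of the output files is written when no hits are found, downstream code should treat a missing file as "no clusters" rather than as a failure. The following is a minimal sketch of that handling, not part of GECCO itself: the function name `load_gecco_tables` and the output directory argument are hypothetical, and the rows are returned keyed by whatever header the TSV files carry, since no column names are assumed here.

```python
import csv
from pathlib import Path


def load_gecco_tables(outdir):
    """Read *.genes.tsv and *.clusters.tsv from outdir (hypothetical helper).

    Returns {"genes": rows, "clusters": rows}. A table that was not
    generated (no hits) yields an empty list instead of an error.
    """
    outdir = Path(outdir)
    tables = {}
    for kind in ("genes", "clusters"):
        rows = []
        # Match the documented output patterns *.genes.tsv / *.clusters.tsv.
        for path in sorted(outdir.glob(f"*.{kind}.tsv")):
            with path.open(newline="") as handle:
                # Each row becomes a dict keyed by the file's own header line.
                rows.extend(csv.DictReader(handle, delimiter="\t"))
        tables[kind] = rows
    return tables
```

An empty `tables["clusters"]` list then simply means that no BGC was predicted for that input.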