Description

Cluster sequences using a single-pass, greedy centroid-based clustering algorithm.
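
As a rough illustration of what "single-pass, greedy centroid-based" means, here is a minimal Python sketch. It is not VSEARCH's implementation: the `identity` function is a toy stand-in built on `difflib`, the `threshold` parameter and all names are hypothetical, and the real tool relies on its own heuristics and alignment-based identity.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Toy similarity in [0, 1]; a stand-in for an alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs: dict[str, str], threshold: float = 0.9) -> dict[str, list[str]]:
    """Visit each sequence once, in input order.

    A sequence joins the first existing centroid it matches at or above
    the threshold; otherwise it becomes the centroid of a new cluster.
    """
    centroids: dict[str, str] = {}       # centroid label -> sequence
    clusters: dict[str, list[str]] = {}  # centroid label -> member labels
    for label, seq in seqs.items():
        for c_label, c_seq in centroids.items():
            if identity(seq, c_seq) >= threshold:
                clusters[c_label].append(label)
                break
        else:
            # No centroid was close enough: start a new cluster.
            centroids[label] = seq
            clusters[label] = [label]
    return clusters


example = {
    "s1": "ACGTACGTACGTACGTACGT",
    "s2": "ACGTACGTACGTACGTACGA",  # differs from s1 at the last base
    "s3": "TTTTTGGGGGCCCCCAAAAA",  # unrelated to s1
}
result = greedy_cluster(example, threshold=0.9)
print(result)
```

Because each sequence is compared only against centroids chosen so far, the input order affects the result, which is why greedy clustering tools typically sort the input before clustering.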

Input

name:type
description
pattern

meta:map

Groovy Map containing sample information, e.g. [ id:'test' ]

fasta:file

Sequences to cluster in FASTA format

*.{fasta,fa,fasta.gz,fa.gz}
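
The input pattern above uses a brace group, which lists four accepted suffixes. The sketch below shows which file names it covers; it assumes the pattern is read as an ordinary shell-style glob, and the helper name `matches_input_pattern` is hypothetical. Whether the module enforces the pattern or only documents it is not stated here.

```python
from fnmatch import fnmatchcase

# The brace group {fasta,fa,fasta.gz,fa.gz} is expanded by hand because
# Python's fnmatch module does not understand {a,b} alternatives.
INPUT_PATTERNS = ["*.fasta", "*.fa", "*.fasta.gz", "*.fa.gz"]


def matches_input_pattern(filename: str) -> bool:
    """Return True if the file name matches any expanded alternative."""
    return any(fnmatchcase(filename, pattern) for pattern in INPUT_PATTERNS)


for name in ["reads.fasta", "reads.fa.gz", "reads.fastq", "reads.fna"]:
    print(name, matches_input_pattern(name))
```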

Output

name:type
description
pattern

aln

meta:map

Groovy Map containing sample information, e.g. [ id:'test' ]

*.aln.gz:file

Results in pairwise alignment format

*.aln.gz

biom

meta:map

Groovy Map containing sample information, e.g. [ id:'test' ]

*.biom.gz:file

Results as an OTU table in BIOM version 1.0 format

*.biom.gz

mothur

meta:map

Groovy Map containing sample information, e.g. [ id:'test' ]

*.mothur.tsv.gz:file

Results as an OTU table in the mothur 'shared' tab-separated plain-text format

*.mothur.tsv.gz

otu

meta:map

Groovy Map containing sample information, e.g. [ id:'test' ]

*.otu.tsv.gz:file

Results as an OTU table in the classic tab-separated plain-text format

*.otu.tsv.gz

bam

meta:map

Groovy Map containing sample information, e.g. [ id:'test' ]

*.bam:file

Results in BAM format

*.bam

out

meta:map

Groovy Map containing sample information, e.g. [ id:'test' ]

*.out.tsv.gz:file

Results in tab-separated format, with columns defined by the user

*.out.tsv.gz

blast

meta:map

Groovy Map containing sample information, e.g. [ id:'test' ]

*.blast.tsv.gz:file

Tab-delimited results in BLAST-like tabular format

*.blast.tsv.gz

uc

meta:map

Groovy Map containing sample information, e.g. [ id:'test' ]

*.uc.tsv.gz:file

Tab-delimited results in a UCLUST-like format with 10 columns

*.uc.tsv.gz

centroids

meta:map

Groovy Map containing sample information, e.g. [ id:'test' ]

*.centroids.fasta.gz:file

Centroid sequences in FASTA format

*.centroids.fasta.gz

clusters

meta:map

Groovy Map containing sample information, e.g. [ id:'test' ]

*.clusters.fasta*.gz:file

Clustered sequences in FASTA format

*.clusters.fasta*.gz

profile

meta:map

Groovy Map containing sample information, e.g. [ id:'test' ]

*.profile.txt.gz:file

Profile of the clustering results

*.profile.txt.gz

msa

meta:map

Groovy Map containing sample information, e.g. [ id:'test' ]

*.msa.fasta.gz:file

Multiple sequence alignment of the centroids

*.msa.fasta.gz

versions

versions.yml:file

File containing software versions

versions.yml
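
The uc output listed above has 10 tab-separated columns per record. The sketch below shows one way to turn such a file into a mapping from centroid to members. It rests on assumptions taken from the UCLUST-like format as described in the VSEARCH manual and not from this page: column 1 holds the record type (`S` for a centroid, `H` for a hit, `C` for a cluster summary), column 9 the query label, and column 10 the centroid label for hits. The function name `read_uc_clusters` is hypothetical.

```python
import gzip
from collections import defaultdict


def read_uc_clusters(path: str) -> dict[str, list[str]]:
    """Map each centroid label to the labels of its members (centroid included)."""
    clusters: dict[str, list[str]] = defaultdict(list)
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 10:
                continue  # skip blank or malformed lines
            record_type, query, target = fields[0], fields[8], fields[9]
            if record_type == "S":
                # Centroid record: the query starts its own cluster.
                clusters[query].append(query)
            elif record_type == "H":
                # Hit record: the query was assigned to centroid `target`.
                clusters[target].append(query)
            # "C" (cluster summary) records repeat information and are ignored.
    return dict(clusters)
```

A call such as `read_uc_clusters("sample.uc.tsv.gz")` (file name chosen for illustration) returns a plain dictionary that can be compared against the centroid and cluster FASTA outputs.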

Tools

vsearch
GPL v3-or-later OR BSD-2-clause

VSEARCH is a versatile open-source tool for microbiome analysis and an alternative to USEARCH. Its functions include chimera detection, clustering, dereplication and rereplication, extraction, FASTA/FASTQ/SFF file processing, masking, orienting, pairwise alignment, restriction site cutting, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicon sequences for metagenomics, genomics, and population genetics.