A tool to quickly download assemblies from NCBI’s Assembly database
Input
| name:type | description | pattern |
| --- | --- | --- |
| meta:map | Groovy Map containing sample information, e.g. [ id:'test', single_end:false ] | |
| accessions:file | List of accessions (one per line) to download | *.txt |
| taxids:file | List of taxids (one per line) to download | *.txt |
| groups:string | NCBI taxonomic groups to download. Can be a comma-separated list. Options are ['all', 'archaea', 'bacteria', 'fungi', 'invertebrate', 'metagenomes', 'plant', 'protozoa', 'vertebrate_mammalian', 'vertebrate_other', 'viral'] | |
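As a rough illustration of the input formats described above, the following Python sketch checks a comma-separated groups string against the listed options and renders a list of identifiers as one-per-line text. The helper names `parse_groups` and `format_id_list` and the placeholder accessions are hypothetical; they are not part of the module or of the underlying tool.

```python
# Options copied from the description of the groups input above.
VALID_GROUPS = {
    "all", "archaea", "bacteria", "fungi", "invertebrate", "metagenomes",
    "plant", "protozoa", "vertebrate_mammalian", "vertebrate_other", "viral",
}


def parse_groups(groups):
    """Split a comma-separated groups string and reject unknown names."""
    names = [g.strip() for g in groups.split(",") if g.strip()]
    unknown = [g for g in names if g not in VALID_GROUPS]
    if unknown:
        raise ValueError(f"Unknown group(s): {', '.join(unknown)}")
    return names


def format_id_list(ids):
    """Render accessions or taxids as text with one entry per line."""
    return "".join(f"{i}\n" for i in ids)


if __name__ == "__main__":
    print(parse_groups("bacteria,viral"))
    # Placeholder accessions, for illustration only.
    print(format_id_list(["GCF_000000000.1", "GCF_000000001.1"]), end="")
```

Writing the text returned by `format_id_list` to a `.txt` file would give a file in the shape expected for the accessions or taxids input.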
Output
| channel | name:type | description | pattern |
| --- | --- | --- | --- |
| gbk | meta:map | Groovy Map containing sample information, e.g. [ id:'test', single_end:false ] | |
| gbk | *_genomic.gbff.gz:file | GenBank format of the genomic sequence(s) in the assembly | *_genomic.gbff.gz |
| fna | meta:map | Groovy Map containing sample information, e.g. [ id:'test', single_end:false ] | |
| fna | *_genomic.fna.gz:file | FASTA format of the genomic sequence(s) in the assembly | *_genomic.fna.gz |
| rm | meta:map | Groovy Map containing sample information, e.g. [ id:'test', single_end:false ] | |
| rm | *_rm.out.gz:file | RepeatMasker output for eukaryotes | *_rm.out.gz |
| features | meta:map | Groovy Map containing sample information, e.g. [ id:'test', single_end:false ] | |
| features | *_feature_table.txt.gz:file | Tab-delimited text file reporting locations and attributes for a subset of annotated features | *_feature_table.txt.gz |
| gff | meta:map | Groovy Map containing sample information, e.g. [ id:'test', single_end:false ] | |
| gff | *_genomic.gff.gz:file | Annotation of the genomic sequence(s) in GFF3 format | *_genomic.gff.gz |
| faa | meta:map | Groovy Map containing sample information, e.g. [ id:'test', single_end:false ] | |
| faa | *_protein.faa.gz:file | FASTA format of the accessioned protein products annotated on the genome assembly | *_protein.faa.gz |
| gpff | meta:map | Groovy Map containing sample information, e.g. [ id:'test', single_end:false ] | |
| gpff | *_protein.gpff.gz:file | GenPept format of the accessioned protein products annotated on the genome assembly | *_protein.gpff.gz |
| wgs_gbk | meta:map | Groovy Map containing sample information, e.g. [ id:'test', single_end:false ] | |
| wgs_gbk | *_wgsmaster.gbff.gz:file | GenBank flat file format of the WGS master for the assembly | *_wgsmaster.gbff.gz |
| cds | meta:map | Groovy Map containing sample information, e.g. [ id:'test', single_end:false ] | |
| cds | *_cds_from_genomic.fna.gz:file | FASTA format of the nucleotide sequences corresponding to all CDS features annotated on the assembly | *_cds_from_genomic.fna.gz |
| rna | meta:map | Groovy Map containing sample information, e.g. [ id:'test', single_end:false ] | |
| rna | *_rna.fna.gz:file | FASTA format of accessioned RNA products annotated on the genome assembly | *_rna.fna.gz |
| rna_fna | meta:map | Groovy Map containing sample information, e.g. [ id:'test', single_end:false ] | |
| rna_fna | *_rna_from_genomic.fna.gz:file | FASTA format of the nucleotide sequences corresponding to all RNA features annotated on the assembly | *_rna_from_genomic.fna.gz |
| report | meta:map | Groovy Map containing sample information, e.g. [ id:'test', single_end:false ] | |
| report | *_assembly_report.txt:file | Tab-delimited text file reporting the name, role and sequence accession.version for objects in the assembly | *_assembly_report.txt |
| stats | meta:map | Groovy Map containing sample information, e.g. [ id:'test', single_end:false ] | |
| stats | *_assembly_stats.txt:file | Tab-delimited text file reporting statistics for the assembly | |
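To show how the output patterns listed above behave as globs, here is a minimal Python sketch that uses the standard library's `fnmatch` to find which patterns a file name matches. The `OUTPUT_PATTERNS` mapping covers only a subset of the outputs, and the helper `matching_channels` and the file names are hypothetical. This only illustrates glob matching in Python; it makes no claim about how the module itself assigns files to outputs.

```python
from fnmatch import fnmatchcase

# Output name -> glob pattern, copied from a subset of the outputs listed above.
OUTPUT_PATTERNS = {
    "gbk": "*_genomic.gbff.gz",
    "fna": "*_genomic.fna.gz",
    "faa": "*_protein.faa.gz",
    "cds": "*_cds_from_genomic.fna.gz",
    "rna_fna": "*_rna_from_genomic.fna.gz",
    "report": "*_assembly_report.txt",
}


def matching_channels(filename):
    """Return every output name whose glob pattern matches the file name."""
    return [
        name
        for name, pattern in OUTPUT_PATTERNS.items()
        if fnmatchcase(filename, pattern)
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for fname in (
        "GCF_000000000.1_example_protein.faa.gz",
        "GCF_000000000.1_example_cds_from_genomic.fna.gz",
    ):
        print(fname, matching_channels(fname))
```

Under plain glob semantics, a file name ending in `_cds_from_genomic.fna.gz` also matches `*_genomic.fna.gz`, so it is worth keeping in mind that the more specific patterns are subsets of the general one when filtering downloaded files by name.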