nf-core/ncrnannotator
nf-core pipeline for genome-level ncRNA annotation using Infernal
Introduction
ncrnannotator annotates non-coding RNA in genome assemblies using Infernal and the Rfam database. It produces annotation files in GTF, GFF3, and BED formats.
Quick start
nextflow run nf-core/ncrnannotator \
--fasta genome.fa \
--mode ensembl-vertebrates \
--rfam_cm Rfam.cm \
--rfam_seed Rfam.seed \
--outdir results \
-profile docker
Required inputs
| Parameter | Description |
|---|---|
| --fasta | Genome assembly in FASTA format (uncompressed or .gz) |
| --mode | Annotation mode (see below) |
| --rfam_cm | Rfam covariance model file (Rfam.cm) |
| --rfam_seed | Rfam seed alignment file (Rfam.seed) |
| --outdir | Output directory |
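A missing input file or a mistyped mode is cheaper to catch before Nextflow starts. The following is a minimal sketch of a pre-flight check; `check_inputs` is a hypothetical helper, not something the pipeline provides, and it only tests that the three files are non-empty and that the mode is one of the four documented below.

```shell
# Hypothetical pre-flight helper (not part of the pipeline).
# Usage: check_inputs <fasta> <mode> <rfam_cm> <rfam_seed>
check_inputs() {
    # The three input files must exist and be non-empty.
    for f in "$1" "$3" "$4"; do
        [ -s "$f" ] || { echo "missing or empty file: $f" >&2; return 1; }
    done
    # The mode must be one of the four documented annotation modes.
    case "$2" in
        ensembl-vertebrates|ensembl-invertebrates|mgnify-assembly|full) return 0 ;;
        *) echo "unknown mode: $2" >&2; return 1 ;;
    esac
}

# Example: check_inputs genome.fa ensembl-vertebrates Rfam.cm Rfam.seed
```

The function returns a non-zero status on the first problem it finds, so it can be chained in front of the `nextflow run` command with `&&`.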
Obtaining Rfam files
Download the latest Rfam release from the Rfam FTP:
wget https://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/Rfam.cm.gz
wget https://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/Rfam.seed.gz
gunzip Rfam.cm.gz Rfam.seed.gz
Annotation modes
ensembl-vertebrates
Annotates vertebrate genomes using a curated subset of Rfam families (snRNA, snoRNA, rRNA, SRP RNA, Y RNA, RNase P, Vault RNA). Uses 1 Mbp genome chunks.
nextflow run nf-core/ncrnannotator \
--fasta Homo_sapiens.GRCh38.fa \
--mode ensembl-vertebrates \
--rfam_cm Rfam.cm \
--rfam_seed Rfam.seed \
--outdir results \
-profile docker
ensembl-invertebrates
Annotates invertebrate genomes using a curated subset of Rfam families. Uses 100 kbp genome chunks for higher sensitivity on compact genomes.
nextflow run nf-core/ncrnannotator \
--fasta Caenorhabditis_elegans.WBcel235.fa \
--mode ensembl-invertebrates \
--rfam_cm Rfam.cm \
--rfam_seed Rfam.seed \
--outdir results \
-profile docker
mgnify-assembly
Annotates metagenomic assemblies using the full Rfam database (no clade filtering). Includes prokaryotic rRNA (bacterial, archaeal). Uses 50 Mbp genome chunks.
nextflow run nf-core/ncrnannotator \
--fasta metagenome_assembly.fa \
--mode mgnify-assembly \
--rfam_cm Rfam.cm \
--rfam_seed Rfam.seed \
--outdir results \
-profile docker
full
Annotates any genome using the complete Rfam database (no clade filtering) with standard Infernal covariance model scoring. Eukaryotic output only (prokaryotic rRNA excluded). Uses 1 Mbp genome chunks.
nextflow run nf-core/ncrnannotator \
--fasta genome.fa \
--mode full \
--rfam_cm Rfam.cm \
--rfam_seed Rfam.seed \
--outdir results \
-profile docker
Optional parameters
| Parameter | Description | Default |
|---|---|---|
| --chunk_size | Override genome chunk size (bp) | Mode-dependent |
| --rfam_accessions | Custom Rfam accession list (overrides bundled lists) | Bundled per mode |
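To judge whether a chunk size override is worthwhile, it can help to estimate how many chunks a genome will produce. The sketch below restates the mode-dependent defaults from the Annotation modes section and does the arithmetic. The helper names are hypothetical, and the estimate assumes non-overlapping chunks that ignore sequence boundaries, so the pipeline's real chunk count may differ.

```shell
# Default chunk size (bp) per mode, as documented under "Annotation modes".
default_chunk_size() {
    case "$1" in
        ensembl-vertebrates|full) echo 1000000 ;;
        ensembl-invertebrates)    echo 100000 ;;
        mgnify-assembly)          echo 50000000 ;;
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Total number of bases in an uncompressed FASTA file (header lines skipped).
genome_size() {
    awk '!/^>/ { n += length($0) } END { printf "%.0f\n", n }' "$1"
}

# Rough chunk count: ceil(total_bp / chunk_bp), using integer arithmetic.
# Assumption: chunks do not overlap and sequence boundaries are ignored.
num_chunks() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Example: num_chunks "$(genome_size genome.fa)" "$(default_chunk_size full)"
```

Under these assumptions, halving the chunk size roughly doubles the number of chunks while each one covers half as much sequence.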
Custom chunk size
For very large genomes or limited memory, override the chunk size:
# Smaller chunks use less memory per cmsearch job
--chunk_size 500000
Running on a laptop
When running locally, use a custom config to cap resource usage:
nextflow run nf-core/ncrnannotator \
--fasta genome.fa \
--mode ensembl-invertebrates \
--rfam_cm Rfam.cm \
--rfam_seed Rfam.seed \
--outdir results \
-c conf/local.config \
-profile conda
The bundled conf/local.config caps resource usage at 20 GB of memory and 8 CPUs. Edit it to match your machine.
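If you would rather write your own resource config than edit the bundled one, the sketch below shows one possible shape. It is an assumption, not a copy of conf/local.config: the file name `my_laptop.config` and the limits are illustrative, and the `resourceLimits` process directive requires Nextflow 24.04 or later.

```shell
# Write a hypothetical resource-capping config (not the bundled conf/local.config).
cat > my_laptop.config <<'EOF'
// Cap what any single process may request; adjust to your machine.
process {
    resourceLimits = [
        cpus: 4,
        memory: 8.GB
    ]
}
EOF
```

Pass it with `-c my_laptop.config` in place of `-c conf/local.config`.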
Using a params file
Rather than specifying all parameters on the command line, you can use a YAML params file:
nextflow run nf-core/ncrnannotator -params-file params.yaml -profile docker
An example params file is provided at assets/params_example.yaml.
Do not use -c <file> to specify pipeline parameters; use -c only for resource tuning, and -params-file for parameter overrides.
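As a rough illustration, the Quick start command translates into a params file like the one written below. The contents are an assumption built from the required parameters listed above, and may differ from the bundled assets/params_example.yaml. Each key is the parameter name without the leading `--`.

```shell
# Write a params file equivalent to the Quick start command line.
cat > params.yaml <<'EOF'
fasta: genome.fa
mode: ensembl-vertebrates
rfam_cm: Rfam.cm
rfam_seed: Rfam.seed
outdir: results
EOF
```

Nextflow options with a single dash, such as `-profile` and `-resume`, are not pipeline parameters and stay on the command line.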
Resuming a run
Add -resume to restart from the last successful step:
nextflow run nf-core/ncrnannotator \
--fasta genome.fa \
--mode ensembl-vertebrates \
--rfam_cm Rfam.cm \
--rfam_seed Rfam.seed \
--outdir results \
-profile docker \
-resume
Running on an HPC cluster
SLURM example
nextflow run nf-core/ncrnannotator \
--fasta genome.fa \
--mode ensembl-vertebrates \
--rfam_cm Rfam.cm \
--rfam_seed Rfam.seed \
--outdir results \
-profile singularity \
-resume
Run Nextflow itself in a SLURM job to avoid timeouts on the head node:
#!/bin/bash
#SBATCH --job-name=ncrnannotator
#SBATCH --time=24:00:00
#SBATCH --mem=8G
#SBATCH --cpus-per-task=2
nextflow run nf-core/ncrnannotator \
-params-file params.yaml \
-profile singularity \
-resume
Reproducibility
Pin the pipeline version with -r:
nextflow run nf-core/ncrnannotator -r 1.0.0 \
--fasta genome.fa \
--mode ensembl-vertebrates \
--rfam_cm Rfam.cm \
--rfam_seed Rfam.seed \
--outdir results \
-profile docker
Core Nextflow arguments
-profile
Select a software environment profile:
- docker: Docker containers (recommended)
- singularity: Singularity containers (recommended for HPC)
- conda: Conda environments (fallback when containers are unavailable)
- podman, apptainer, charliecloud: Alternative container runtimes
- test: Minimal test run (no input files needed)
Example with multiple profiles:
-profile test,docker
-resume
Restart from cached intermediate results. See the Nextflow resume docs.
-c
Provide a custom config for resource tuning only:
-c conf/local.config
Nextflow memory requirements
Limit Nextflow’s own JVM memory usage (add to ~/.bashrc):
export NXF_OPTS='-Xms1g -Xmx4g'