Define where the pipeline should find input data and save output data.

Path to the genome assembly.

required
type: string
pattern: ^\S+\.fn?a(sta)?$

This is the assembly you wish to annotate.
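The file name pattern above can be checked before launching a run. The following is a minimal sketch, assuming that Python's `re` module interprets this particular pattern the same way the pipeline's schema validator does; the helper name `is_valid_fasta_path` is hypothetical and not part of the pipeline.

```python
import re

# Pattern copied verbatim from the parameter definition above.
FASTA_PATTERN = re.compile(r"^\S+\.fn?a(sta)?$")


def is_valid_fasta_path(path: str) -> bool:
    """Return True if the path matches the schema pattern (hypothetical helper)."""
    return FASTA_PATTERN.match(path) is not None


if __name__ == "__main__":
    for candidate in ["genome.fa", "genome.fna", "genome.fasta",
                      "genome.fa.gz", "my genome.fa"]:
        print(f"{candidate!r}: {is_valid_fasta_path(candidate)}")
```

Under this reading, compressed files (`genome.fa.gz`) and paths containing whitespace do not match the pattern.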

The output directory where the results will be saved. When writing to storage on cloud infrastructure, you must use absolute paths.

required
type: string

Email address for completion summary.

type: string
pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (~/.nextflow/config) then you don't need to specify this on the command line for every run.
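The e-mail pattern above is fairly strict. The sketch below, which assumes Python's `re` module matches the pattern the same way the schema validator does, shows which kinds of addresses it accepts; the helper name `is_accepted_email` and the example addresses are illustrative only.

```python
import re

# Pattern copied verbatim from the parameter definition above.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def is_accepted_email(address: str) -> bool:
    """Return True if the address matches the schema pattern (hypothetical helper)."""
    return EMAIL_PATTERN.match(address) is not None


if __name__ == "__main__":
    for address in ["user@example.org", "first.last@sub.example.com",
                    "user+tag@example.org", "user@example.museum"]:
        print(f"{address!r}: {is_accepted_email(address)}")
```

Note that addresses containing a `+` or a top-level domain longer than five letters would be rejected by this pattern, even though they can be valid e-mail addresses.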

MultiQC report title. Printed as page header, used for filename if not otherwise specified.

type: string

Path to samplesheet for RNAseq data.

type: string
pattern: ^\S+\.csv$

If you wish to include RNAseq data, you will need to create a samplesheet in CSV format. Use this parameter to specify its location. It has to be a comma-separated file with 4 columns, and a header row.
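A quick layout check can catch malformed samplesheets early. The sketch below only verifies what is stated above (a header row and 4 comma-separated columns per line); it does not know the column names the pipeline expects, so the header in the example is a placeholder, and the function name `check_samplesheet` is hypothetical.

```python
import csv
import io


def check_samplesheet(text: str, expected_columns: int = 4) -> list[str]:
    """Return a list of layout problems; an empty list means none were found."""
    problems = []
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) < 2:
        problems.append("expected a header row plus at least one data row")
    for line_no, row in enumerate(rows, start=1):
        if len(row) != expected_columns:
            problems.append(
                f"line {line_no}: {len(row)} columns, expected {expected_columns}"
            )
    return problems


if __name__ == "__main__":
    # Placeholder header: the real column names are defined by the pipeline.
    sheet = "col1,col2,col3,col4\nvalue1,value2,value3,value4\n"
    print(check_samplesheet(sheet))
```

The same function could be called with `expected_columns=3` for a samplesheet that requires three columns.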

Path to a fasta file with proteins

type: string
pattern: ^\S+\.fn?a(sta)?$

Specify a fasta-formatted file with proteins from related organisms. Typical sources are Uniprot, EnsEMBL or Refseq.

Path to a fasta file with proteins from your organism of interest

type: string
pattern: ^\S+\.fn?a(sta)?$

Specify a fasta-formatted file with proteins from your organism of interest. Typical sources are Uniprot, EnsEMBL or Refseq.

Path to a fasta file with transcripts/ESTs

type: string
pattern: ^\S+\.fn?a(sta)?$

Specify a fasta-formatted file with transcripts/ESTs from your organism of interest. Typical sources are ENA and dbEST.

Path to a fasta file with known repeat sequences for this organism

type: string
pattern: ^\S+\.fn?a(sta)?$

Specify a fasta-formatted file with repeat sequences for this organism. Typical sources are databases (NCBI, GRINST) or RepeatModeler.

Path to samplesheet for Reference genomes and annotations.

type: string
pattern: ^\S+\.csv$

If you wish to include annotations from related species (lift-over), you will need to create a samplesheet in CSV format. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row.

Options that control pipeline behavior

Chunk size for splitting the assembly.

type: integer
default: 200000000

The assembly will be split into pieces of this size, in bp, to increase parallelization.

Maximum length of expected introns in bp.

type: integer

This option specifies the longest expected intron in base pairs. Setting this too low will result in broken gene models. Conversely, setting this too high may create unreasonable gene models and increase run time.

Minimum size of contig to consider

type: integer
default: 5000

Small contigs will typically not add anything to the annotation, but can increase run time or trigger crashes. This value determines the cutoff for contig inclusion.
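To illustrate the effect of the cutoff, here is a minimal sketch that filters FASTA records by length. It is not the pipeline's implementation: the function name `filter_contigs` is hypothetical, and treating the cutoff as inclusive (contigs of exactly the minimum size are kept) is an assumption.

```python
def filter_contigs(fasta_text: str, min_size: int = 5000) -> dict[str, str]:
    """Parse FASTA text and keep sequences of at least min_size bp."""
    parts_by_name: dict[str, list[str]] = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the full header line (without '>') as the record name.
            name = line[1:].strip()
            parts_by_name[name] = []
        elif name is not None:
            parts_by_name[name].append(line)
    sequences = {n: "".join(parts) for n, parts in parts_by_name.items()}
    # Assumption: the cutoff is inclusive.
    return {n: seq for n, seq in sequences.items() if len(seq) >= min_size}


if __name__ == "__main__":
    example = ">contig_1\nACGT\nACGT\n>contig_2\nAC\n"
    # A tiny cutoff is used here so the toy example stays readable.
    print(filter_contigs(example, min_size=5))
```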

Taxonomic group to guide repeat masking.

type: string

Use this taxonomic group or species to identify and mask repeats. Valid names follow the NCBI taxonomy nomenclature and can, in most cases, be guessed. This option draws from data included in Dfam 3.2, which contains HMM profiles for over 273,000 repeat families from 347 species.

A database of curated repeats in HDF5 (h5) format.

type: string
default: https://www.dfam.org/releases/Dfam_3.5/families/Dfam_curatedonly.h5.gz
pattern: ^\S+\.gz$

This option points to the Dfam database (h5 format) of curated repeats for RepeatMasker. By default, the pipeline downloads it on the fly from the Dfam server. Alternatively, you can pre-download the file (.gz) and provide its location via this option.

Name of a BUSCO taxonomic group to evaluate the completeness of annotated gene set(s).

type: string

Use this to provide the name of a BUSCO taxonomic group against which to evaluate the resulting gene builds. Format should be taxgroup_odb10 (i.e. without the date).

Path to the local BUSCO data.

type: string

Use this to provide the path to a local copy of the busco database (usually /path/to/busco_downloads). For details, see the BUSCO documentation.

A placeholder gff file to help trigger certain processes.

type: string
default: PIPELINE_BASE/assets/empty.gff3

Options that control gene finding with AUGUSTUS

AUGUSTUS species model to use.

type: string

Specify which model AUGUSTUS will run with. A full list is available here: https://github.com/Gaius-Augustus/Augustus/blob/master/docs/RUNNING-AUGUSTUS.md

Options to pass to AUGUSTUS.

type: string
default: --alternatives-from-evidence=on --minexonintronprob=0.08 --minmeanexonintronprob=0.4 --maxtracks=3

AUGUSTUS has many options that are not specifically available as pipeline options. Instead, you can pass them through this flag.

Location of the AUGUSTUS config directory within the docker container

type: string
default: /usr/local/config

This option specifies where to find the AUGUSTUS config directory inside the Docker container. Normally, you should not change this!

A custom config directory for AUGUSTUS

type: string

Use this to point to a custom AUGUSTUS config directory, for example if you have trained a new model outside of GENOMEANNOTATOR. Must be compatible with AUGUSTUS 3.4.

Custom AUGUSTUS extrinsic config file path

type: string

Provide a custom extrinsic config file to AUGUSTUS, specifying the weight of different types of evidence. We suggest you start with our built-in base version.

Length of annotation chunks in AUGUSTUS

type: integer
default: 3000000

This value determines the length of a region worked on by each AUGUSTUS subprocess. The overlap between neighboring chunks is 1/6 of the chunk length. The default value should be adequate for most scenarios.
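With the default chunk length of 3000000 bp, an overlap of 1/6 corresponds to 500000 bp. The sketch below shows one way such overlapping windows could be laid out along a sequence. The exact placement used by AUGUSTUS or the pipeline may differ; the function name `chunk_windows` and the 0-based, half-open coordinates are choices made for this illustration.

```python
def chunk_windows(seq_len: int, chunk_len: int = 3_000_000) -> list[tuple[int, int]]:
    """Return (start, end) windows covering seq_len with an overlap of chunk_len / 6."""
    if seq_len <= 0:
        return []
    overlap = chunk_len // 6      # 500,000 bp for the default chunk length
    step = chunk_len - overlap    # distance between consecutive window starts
    windows = []
    start = 0
    while True:
        end = min(start + chunk_len, seq_len)
        windows.append((start, end))
        if end >= seq_len:
            break
        start += step
    return windows


if __name__ == "__main__":
    for window in chunk_windows(7_000_000):
        print(window)
```

For a 7 Mb sequence this yields three windows, each sharing 500000 bp with its neighbor.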

Enable training of a new AUGUSTUS profile.

type: boolean

This option enables training of a new AUGUSTUS prediction profile. You must provide either a full (!) species-specific proteome via --proteins_targeted or a sufficiently comprehensive set of transcripts/RNA-seq data. When both are provided, proteins will be preferred.

Priority for protein-derived hints for gene building.

type: integer
default: 3

This value determines the priority protein-derived hints are given during AUGUSTUS gene finding. The higher the value, the more important the hint (1-5).

Priority for targeted protein evidences

type: integer
default: 5

A value to determine the weight of this type of evidence (1-5). A higher value means this type of evidence is given more consideration.

Priority for transcript evidences

type: integer
default: 4

A value to determine the weight of this type of evidence (1-5). A higher value means this type of evidence is given more consideration.

Priority for RNAseq splice junction evidences

type: integer
default: 4

A value to determine the weight of this type of evidence (1-5). A higher value means this type of evidence is given more consideration.

Priority for RNAseq exon coverage evidences

type: integer
default: 2

A value to determine the weight of this type of evidence (1-5). A higher value means this type of evidence is given more consideration.

Priority for trans-mapped gene model evidences

type: integer
default: 4

A value to determine the weight of this type of evidence (1-5). A higher value means this type of evidence is given more consideration.
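The default priorities listed above can be summarised and range-checked as follows. This is only an illustrative sketch: the dictionary keys are descriptive labels chosen here, not the pipeline's parameter names, and the function name `out_of_range` is hypothetical.

```python
# Default priorities as listed above; keys are descriptive labels,
# not actual pipeline parameter names.
DEFAULT_PRIORITIES = {
    "protein": 3,
    "targeted protein": 5,
    "transcript": 4,
    "RNAseq splice junction": 4,
    "RNAseq exon coverage": 2,
    "trans-mapped gene model": 4,
}


def out_of_range(priorities: dict[str, int]) -> list[str]:
    """Return the labels whose priority is not an integer between 1 and 5."""
    return [
        label
        for label, value in priorities.items()
        if not (isinstance(value, int) and 1 <= value <= 5)
    ]


if __name__ == "__main__":
    ranked = sorted(DEFAULT_PRIORITIES, key=DEFAULT_PRIORITIES.get, reverse=True)
    for label in ranked:
        print(f"{DEFAULT_PRIORITIES[label]}  {label}")
    print("out of range:", out_of_range(DEFAULT_PRIORITIES))
```

With the defaults, targeted protein evidence ranks highest and RNAseq exon coverage lowest.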

Evidence label for transcriptome data

type: string
default: E

A label for a given type of evidence - corresponds to labels in the AUGUSTUS extrinsic config file. Should not be changed.

Evidence label for protein data

type: string
default: P

A label for a given type of evidence - corresponds to labels in the AUGUSTUS extrinsic config file. Should not be changed.

Evidence label for RNAseq data

type: string
default: E

A label for a given type of evidence - corresponds to labels in the AUGUSTUS extrinsic config file. Should not be changed.

Options that control processing of protein evidences

Taxon model to use for SPALN protein alignments.

type: string

This option specifies which SPALN alignment model to use. For a full list of available models, see: https://github.com/ogotoh/spaln/blob/master/table/gnm2tab

SPALN custom options.

type: string
default: -M

Users can pass custom options to the SPALN alignment process. Normally, this will not be necessary!

SPALN id threshold for aligning.

type: integer
default: 60

Users can pass custom id threshold to the SPALN alignment process. Normally, this will not be necessary!

Minimum size of a protein sequence to be included.

type: integer
default: 35

Protein databases often contain fragmented protein sequences. Use this option to filter out very small proteins from your evidence set.

Number of proteins per alignment job.

type: integer
default: 200

Specifies the number of proteins per alignment job. This option controls parallelism: the higher this number, the fewer jobs are created and the longer each job runs. Only increase it if you have a very large number of proteins to process. The default value should be fine in most cases.
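The trade-off can be made concrete with a small calculation. This sketch assumes proteins are simply divided into batches of the given size, with a final smaller batch for the remainder; the function name `alignment_jobs` is hypothetical.

```python
import math


def alignment_jobs(n_proteins: int, proteins_per_job: int = 200) -> int:
    """Estimate the number of alignment jobs for a given batch size."""
    if proteins_per_job < 1:
        raise ValueError("proteins_per_job must be at least 1")
    return math.ceil(n_proteins / proteins_per_job)


if __name__ == "__main__":
    for batch_size in (200, 1000):
        print(batch_size, alignment_jobs(50_000, batch_size))
```

For example, 50000 proteins produce 250 jobs at the default of 200 proteins per job, but only 50 jobs at 1000 proteins per job.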

Q value for the SPALN alignment algorithm.

type: integer
default: 5

ID threshold for targeted protein alignments.

type: integer
default: 90

Options that control the PASA transcriptome annotation pipeline

Number of PASA models to select for AUGUSTUS training.

type: integer
default: 1000

Built-in config file for PASA.

type: string
default: PIPELINE_BASE/assets/pasa/alignAssembly.config

Options that control the EvidenceModeler pipeline

Weights file for EVM.

type: string
default: None

Number of EVM jobs per chunk.

type: integer
default: 10

Options that control individual tool behavior

Activate the trinity assembly sub-pipeline

type: boolean

Assemble short reads into transcripts using Trinity.

Activate the PASA sub-pipeline

type: boolean

Assemble transcripts into gene models using PASA.

Activate the EvidenceModeler sub-pipeline

type: boolean

Perform consensus gene building using EvidenceModeler.

Activate search for ncRNAs with Rfam/Infernal

type: boolean

Perform prediction of non-coding RNAs using CM profiles from Rfam release 14.

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

hidden
type: string
default: master

Base directory for Institutional configs.

hidden
type: string
default: https://raw.githubusercontent.com/nf-core/configs/master

If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.

Institutional config name.

hidden
type: string

Institutional config description.

hidden
type: string

Institutional config contact information.

hidden
type: string

Institutional config URL link.

hidden
type: string

Set the top limit for requested resources for any single job.

Maximum number of CPUs that can be requested for any single job.

hidden
type: integer
default: 16

Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. --max_cpus 1

Maximum amount of memory that can be requested for any single job.

hidden
type: string
default: 128.GB
pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$

Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. --max_memory '8.GB'

Maximum amount of time that can be requested for any single job.

hidden
type: string
default: 240.h
pattern: ^(\d+\.?\s*(s|m|h|day)\s*)+$

Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. --max_time '2.h'
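The memory and time patterns above accept a few different spellings. The following sketch, which assumes Python's `re` module interprets both patterns the same way the schema validator does, can be used to test a value before a run; the helper names are hypothetical.

```python
import re

# Patterns copied verbatim from the parameter definitions above.
MEMORY_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")
TIME_PATTERN = re.compile(r"^(\d+\.?\s*(s|m|h|day)\s*)+$")


def is_valid_memory(value: str) -> bool:
    """Return True if the value matches the memory pattern."""
    return MEMORY_PATTERN.match(value) is not None


def is_valid_time(value: str) -> bool:
    """Return True if the value matches the time pattern."""
    return TIME_PATTERN.match(value) is not None


if __name__ == "__main__":
    for value in ["128.GB", "8 GB", "1.5GB", "8G", "8.gb"]:
        print(f"memory {value!r}: {is_valid_memory(value)}")
    for value in ["240.h", "2.h", "1day 12h", "2 hours", "90min"]:
        print(f"time   {value!r}: {is_valid_time(value)}")
```

Under this reading, units are case-sensitive and must be written as shown in the patterns, so values such as `8G`, `8.gb`, `2 hours` or `90min` are rejected.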

Less common options for the pipeline, typically set in a config file.

Display help text.

hidden
type: boolean

Method used to save pipeline results to output directory.

hidden
type: string

The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.

Email address for completion summary, only when pipeline fails.

hidden
type: string
pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.

Send plain-text email instead of HTML.

hidden
type: boolean

File size limit when attaching MultiQC reports to summary emails.

hidden
type: string
default: 25.MB
pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$

Do not use coloured log outputs.

hidden
type: boolean

Custom config file to supply to MultiQC.

hidden
type: string

Directory to keep pipeline Nextflow logs and reports.

hidden
type: string
default: ${params.outdir}/pipeline_info

Boolean whether to validate parameters against the schema at runtime

hidden
type: boolean
default: true

Show all params when using --help

hidden
type: boolean

By default, parameters set as hidden in the schema are not shown on the command line when a user runs with --help. Specifying this option will tell the pipeline to show all parameters.

Run this workflow with Conda. You can also use '-profile conda' instead of providing this parameter.

hidden
type: boolean