mag: Parameters

Define where the pipeline should find input data and save output data.

CSV samplesheet file containing information about the samples in the experiment.

required

type: string

pattern: ^\S+\.csv$
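The pre-computed assemblies samplesheet below uses the same filename pattern. The pattern checks only the path string (no whitespace, `.csv` extension). It does not check the file contents. A minimal Python sketch of the check, with an illustrative helper name:

```python
import re

# Pattern copied from the schema entry above: no whitespace, must end in ".csv".
SAMPLESHEET_PATTERN = re.compile(r"^\S+\.csv$")


def is_valid_samplesheet_path(path: str) -> bool:
    """Return True if `path` satisfies the samplesheet filename pattern."""
    return SAMPLESHEET_PATTERN.fullmatch(path) is not None


for candidate in ["samplesheet.csv", "s3://bucket/samplesheet.csv", "/data/run 1/samples.csv", "samples.tsv"]:
    print(candidate, is_valid_samplesheet_path(candidate))
```

Because of the space, a path such as `/data/run 1/samples.csv` is rejected even though it ends in `.csv`.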

Specifies that the input is single-end reads.

type: boolean

Additional input CSV samplesheet containing information about pre-computed assemblies. When set, assembly is skipped and the supplied assemblies are used for downstream analysis.

type: string

pattern: ^\S+\.csv$

The output directory where the results will be saved. You must use absolute paths when saving to cloud storage.

required

type: string

Email address for completion summary.

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
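This pattern is stricter than general e-mail syntax. As written, it rejects a `+` in the part before the `@` and top-level domains longer than five letters. The failure-notification address further below uses the same pattern. A quick Python check, with an illustrative helper name:

```python
import re

# Pattern copied verbatim from the schema entry above.
EMAIL_PATTERN = re.compile(r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$")


def is_accepted_email(address: str) -> bool:
    """Return True if the schema pattern would accept `address`."""
    return EMAIL_PATTERN.fullmatch(address) is not None


for address in ["first.last@lab.example.org", "user+tag@example.com", "user@example.science", "user@localhost"]:
    print(address, is_accepted_email(address))
```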

MultiQC report title. Printed as page header, used for filename if not otherwise specified.

type: string

Reference genome related files and options required for the workflow.

Do not load the iGenomes reference config.

hidden

type: boolean

The base path to the iGenomes reference files.

hidden

type: string

default: s3://ngi-igenomes/igenomes/

Parameters used to describe centralised config profiles. These should not be edited.

Git commit ID for institutional configs.

hidden

type: string

default: master

Base directory for Institutional configs.

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/configs/master

Institutional config name.

hidden

type: string

Institutional config description.

hidden

type: string

Institutional config contact information.

hidden

type: string

Institutional config URL link.

hidden

type: string

Less common options for the pipeline, typically set in a config file.

Display version and exit.

hidden

type: boolean

Method used to save pipeline results to output directory.

hidden

type: string

Email address for completion summary, only when pipeline fails.

hidden

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Send plain-text email instead of HTML.

hidden

type: boolean

File size limit when attaching MultiQC reports to summary emails.

hidden

type: string

default: 25.MB

pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
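The pattern accepts values such as the default `25.MB`, `512B` or `1.5 GB`. The sketch below converts an accepted value into bytes so that size limits can be compared. It assumes binary multiples (1 KB = 1024 bytes). The pattern itself does not specify this, and the helper name is illustrative.

```python
import re

# Pattern copied from the schema entry above; the default value is "25.MB".
SIZE_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")
NUMBER_PREFIX = re.compile(r"\d+(\.\d+)?")

# Assumption: binary multiples (1 KB = 1024 bytes).
MULTIPLIERS = {None: 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def size_to_bytes(value: str) -> int:
    """Convert a size string accepted by the pattern into a byte count."""
    match = SIZE_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"not a valid size string: {value!r}")
    number = float(NUMBER_PREFIX.match(value).group(0))
    return int(number * MULTIPLIERS[match.group(2)])


print(size_to_bytes("25.MB"))
```

The dot in `25.MB` is covered by the pattern's optional `\.?` and is not part of the number.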

Do not use coloured log outputs.

hidden

type: boolean

Custom config file to supply to MultiQC.

hidden

type: string

Custom logo file to supply to MultiQC. The file name must also be set in the MultiQC config file.

hidden

type: string

Custom MultiQC yaml file containing HTML including a methods description.

type: string

Whether to validate parameters against the schema at runtime.

hidden

type: boolean

default: true

Base URL or local path to location of pipeline test dataset files

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/test-datasets/

Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.

hidden

type: string

Display the help message.

type: boolean,string

Display the full detailed help message.

type: boolean

Display hidden parameters in the help message (only works when --help or --help_full are provided).

type: boolean

Use these parameters to also enable reproducible results from the individual assembly and binning tools.

Fix number of CPUs for MEGAHIT to 1. Not increased with retries.

type: boolean

Fix number of CPUs used by SPAdes. Not increased with retries.

type: integer

default: -1

Fix number of CPUs used by SPAdes hybrid. Not increased with retries.

type: integer

default: -1

RNG seed for MetaBAT2.

type: integer

default: 1

Specify which adapter clipping tool to use.

type: string

Specify to save the resulting clipped FASTQ files to --outdir.

type: boolean

The minimum length that reads must have to be retained for downstream analysis.

type: integer

default: 15

Minimum Phred quality value for a base to be considered qualified by fastp.

type: integer

default: 15

The mean quality requirement used for per-read sliding window cutting by fastp.

type: integer

default: 15
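Sliding-window cutting works like this: a window moves from the 3' end of a read toward the 5' end, and bases are dropped while the window's mean quality stays below the threshold. The sketch below is a simplified illustration. It assumes a 4-base window that moves one base at a time and is not fastp's exact implementation. This page also does not say whether the pipeline cuts from the 5' end, the 3' end, or both.

```python
def sliding_window_cut_3prime(qualities: list[int], window: int = 4, mean_quality: float = 15) -> int:
    """Return how many leading bases survive 3'-end sliding-window cutting.

    Simplified sketch: slide a window from the 3' end towards the 5' end one
    base at a time, dropping the last base while the window's mean quality is
    below `mean_quality`; stop at the first window that passes.
    """
    end = len(qualities)
    while end > 0:
        win = qualities[max(0, end - window):end]
        if sum(win) / len(win) >= mean_quality:
            break
        end -= 1
    return end


read_quals = [30] * 10 + [2] * 4
print(sliding_window_cut_3prime(read_quals))
```

Here 12 of 14 bases are kept. Two low-quality bases survive because they sit inside the first window whose mean passes the threshold, which is typical of window-based cutting.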

Save reads that fail fastp filtering in a separate file. Not used downstream.

type: boolean

Turn on detecting and trimming of poly-G tails

type: boolean

The minimum base quality for low-quality base trimming by AdapterRemoval.

type: integer

default: 2

Turn on quality trimming of consecutive stretches of low-quality bases, rather than by sliding window.

type: boolean

Forward read adapter to be trimmed by AdapterRemoval.

type: string

default: AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG

pattern: ^[ATGCRYKMSWBDHVN]*$

Reverse read adapter to be trimmed by AdapterRemoval for paired-end data.

type: string

default: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT

pattern: ^[ATGCRYKMSWBDHVN]*$
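Both adapter values must consist of upper-case IUPAC nucleotide codes. The forward default contains six `N` positions where the sample index sits. The sketch below shows how such degenerate codes are read when an adapter is compared with a sequence of the same length. It does not reproduce AdapterRemoval's alignment-based trimming, and the helper name is illustrative.

```python
import re

# Pattern copied from the schema entries above (IUPAC nucleotide codes, upper case only).
ADAPTER_PATTERN = re.compile(r"^[ATGCRYKMSWBDHVN]*$")

# Standard IUPAC ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def adapter_matches(adapter: str, observed: str) -> bool:
    """Return True if `observed` is compatible with `adapter`, position by position."""
    if ADAPTER_PATTERN.fullmatch(adapter) is None:
        raise ValueError(f"adapter contains characters outside the pattern: {adapter!r}")
    if len(observed) != len(adapter):
        return False
    return all(base in IUPAC[code] for code, base in zip(adapter, observed))


FORWARD = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG"
# Fill the six N positions with a concrete (made-up) index sequence.
observed = FORWARD.replace("NNNNNN", "ACGTAC")
print(adapter_matches(FORWARD, observed))
```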

Name of iGenomes reference for host contamination removal.

type: string

Fasta reference file for host contamination removal.

type: string

Bowtie2 index directory corresponding to --host_fasta reference file for host contamination removal.

type: string

Use the --very-sensitive instead of the --sensitive setting for Bowtie 2 to map reads against the host genome.

type: boolean

Save the read IDs of removed host reads.

type: boolean

Specify to save input FASTQ files with host reads removed to --outdir.

type: boolean

Keep reads similar to the Illumina internal standard PhiX genome.

type: boolean

Genome reference used to remove Illumina PhiX contaminant reads.

type: string

Skip read preprocessing using fastp or AdapterRemoval.

type: boolean

Skip all default QC steps for short reads (adapter trimming, phiX removal).

type: boolean

Skip running FastQC on short reads both before and after QC.

type: boolean

Specify to save input FASTQ files with phiX reads removed to --outdir.

type: boolean

Run BBnorm to normalize sequence depth.

type: boolean

Set BBnorm target maximum depth to this number.

type: integer

default: 100

Set BBnorm minimum depth to this number.

type: integer

default: 5
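The sketch below gives a rough picture of what the two depth settings control. Each read gets a keep probability based on its estimated depth:

- Reads below the minimum depth are treated as likely errors and dropped.
- Reads at or below the target depth are kept.
- Deeper reads are down-sampled by target/depth.

This is a simplified model under those assumptions, not BBnorm's implementation, which estimates read depth from k-mer counts.

```python
import random


def keep_probability(read_depth: float, target_depth: int = 100, min_depth: int = 5) -> float:
    """Simplified keep probability for a read with the given estimated depth."""
    if read_depth < min_depth:
        return 0.0  # treated as likely sequencing error and discarded
    if read_depth <= target_depth:
        return 1.0  # already at or below the target depth
    return target_depth / read_depth  # down-sample reads from over-represented regions


rng = random.Random(42)  # fixed seed so the example is repeatable
estimated_depths = [2, 40, 400, 400, 1000]
kept = [d for d in estimated_depths if rng.random() < keep_probability(d)]
print(kept)
```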

Save normalized read files to output directory.

type: boolean

Skip removing adapter sequences from long reads.

type: boolean

Skip filtering long reads.

type: boolean

Skip all default QC steps for long reads (adapter trimming, filtering, removal of lambda sequences).

type: boolean

Discard any read which is shorter than this value.

type: integer

default: 1000

Discard any read which has a mean quality score lower than this value.

type: integer

Keep this percent of bases. Only used by filtlong.

type: integer

default: 90

The higher the value, the more weight is given to read length when choosing the best reads. Only used by filtlong.

type: integer

default: 10

Keep reads similar to the ONT internal standard Escherichia virus Lambda genome.

type: boolean

Genome reference used to remove ONT Lambda contaminant reads.

type: string

Specify to save input FASTQ files with lambda reads removed to --outdir.

type: boolean

Specify to save the resulting clipped FASTQ files to --outdir.

type: boolean

Specify to save the resulting length filtered long read FASTQ files to --outdir.

type: boolean

Specify which long read adapter trimming tool to use.

type: string

Specify which long read filtering tool to use.

type: string

Filter long reads against short reads when using filtlong.

type: boolean

Taxonomic classification is disabled by default. You have to specify one of the options below to activate it.

Database for taxonomic classification of metagenome-assembled genomes. Can be either a zipped file or a directory containing its extracted contents.

type: string

Generate CAT database.

type: boolean

Save the CAT database generated when specified by --cat_db_generate.

type: boolean

Allow unofficial lineages in CAT classification.

type: boolean

Classify unbinned contigs with CAT (contig mode).

type: boolean

Specify to turn off CAT's marking of the most probable hits (when there are multiple) with an asterisk in output files.

type: boolean

Skip running GTDB-Tk, as well as the automatic download of its database.

type: boolean

Specify the location of a GTDB-Tk database. Can be either an uncompressed directory or a .tar.gz archive. If not specified, it will be downloaded automatically when GTDB-Tk or binning QC is not skipped.

type: string

default: https://data.gtdb.aau.ecogenomic.org/releases/release232/232.0/auxillary_files/gtdbtk_package/full_package/gtdbtk_r232_data.tar.gz

Min. bin completeness (in %) required to apply GTDB-tk classification.

type: number

default: 50

Max. bin contamination (in %) allowed to apply GTDB-tk classification.

type: number

default: 10
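The completeness and contamination values come from bin QC. The sketch below shows the resulting selection of bins for GTDB-Tk. It assumes inclusive comparisons, since the boundary handling is not stated above. The bin names and values are made up.

```python
from dataclasses import dataclass


@dataclass
class BinQC:
    name: str
    completeness: float  # percent
    contamination: float  # percent


def eligible_for_gtdbtk(bins: list[BinQC], min_completeness: float = 50, max_contamination: float = 10) -> list[str]:
    """Names of bins that pass both thresholds (inclusive comparisons assumed)."""
    return [
        b.name
        for b in bins
        if b.completeness >= min_completeness and b.contamination <= max_contamination
    ]


bins = [
    BinQC("bin.1", completeness=92.3, contamination=1.4),
    BinQC("bin.2", completeness=45.0, contamination=0.8),   # too incomplete
    BinQC("bin.3", completeness=88.1, contamination=17.2),  # too contaminated
]
print(eligible_for_gtdbtk(bins))
```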

Min. fraction of AA (in %) in the MSA for bins to be kept.

type: number

default: 10

Min. alignment fraction to consider closest genome.

type: number

default: 0.65

Number of CPUs used by pplacer when it is run by GTDB-Tk.

type: integer

default: 1

Speed up the pplacer step of GTDB-Tk by loading data into memory.

type: boolean

Specify to have GTDB-Tk use the full bacterial tree rather than the split tree (requires more memory!)

type: boolean

Specify to disable fast classification of genomes by ANI using skani in GTDB-Tk.

type: boolean

[DEPRECATED] Use --gtdbtk_place_species instead. Specify to disable fast classification of genomes by ANI using skani in GTDB-Tk.

hidden

type: boolean

Run GTDB-Tk classification for all bins in a single job, rather than one job per sample/assembler/binner group.

type: boolean

Co-assemble samples within one group, instead of assembling each sample separately.

type: boolean
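With co-assembly, samples are pooled by their group before assembly. The Python sketch below shows that grouping. It assumes the samplesheet carries `sample` and `group` columns. The other columns, the file names and the rows are illustrative only.

```python
import csv
import io
from collections import defaultdict

# Illustrative samplesheet content; only the `sample` and `group` columns are used here.
SAMPLESHEET = """sample,group,short_reads_1,short_reads_2
s1,0,s1_R1.fastq.gz,s1_R2.fastq.gz
s2,0,s2_R1.fastq.gz,s2_R2.fastq.gz
s3,1,s3_R1.fastq.gz,s3_R2.fastq.gz
"""


def assembly_units(text: str, coassemble: bool) -> dict[str, list[str]]:
    """Map each assembly unit to the samples whose reads it uses."""
    units: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        key = f"group-{row['group']}" if coassemble else row["sample"]
        units[key].append(row["sample"])
    return dict(units)


print(assembly_units(SAMPLESHEET, coassemble=True))
print(assembly_units(SAMPLESHEET, coassemble=False))
```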

Additional custom options for SPAdes and SPAdesHybrid. Do not specify --meta as this will be added for you!

type: string

Specify whether to use contigs or scaffolds assembled by SPAdes

type: string

Additional custom options for MEGAHIT.

type: string

Skip Illumina-only SPAdes assembly.

type: boolean

Skip SPAdes hybrid assembly.

type: boolean

Skip MEGAHIT assembly.

type: boolean

Skip ALE

type: boolean

Enable ALE per-base output. This output can be very large (tens of GB).

type: boolean

Skip metaQUAST.

type: boolean

Skip MetaDBG assembly.

type: boolean

Skip Flye assembly.

type: boolean

Run PyPOLCA polishing on long-read assemblies before binning.

type: boolean

Skip Prodigal gene prediction

type: boolean

Turn on Prokka compliance mode for truncating contig names for NCBI/ENA compatibility.

type: boolean

Specify sequencing centre name required for Prokka’s compliance mode.

type: string

Skip Prokka genome annotation.

type: boolean

Specify to skip CDS/product searching in Prokka runs

type: boolean

[DEPRECATED] This parameter no longer has any effect and will be removed in a future release. MetaEuk only runs when --metaeuk_db or --metaeuk_mmseqs_db is supplied.

type: boolean

A string containing the name of one of the databases listed in the mmseqs2 documentation. This database will be downloaded and formatted for eukaryotic genome annotation. Incompatible with --metaeuk_db.

type: string

Path to either a local fasta file of protein sequences, or to a directory containing an MMseqs2-formatted database, for annotation of eukaryotic genomes.

type: string

Save the downloaded mmseqs2 database specified in --metaeuk_mmseqs_db.

type: boolean

Run virus identification.

type: boolean

Database for virus classification with geNomad

type: string

Minimum geNomad score for a sequence to be considered viral

type: number

default: 0.7

Number of groups into which geNomad’s MMseqs2 database should be split (reduces memory requirements).

type: integer

default: 1

Defines mapping strategy to compute co-abundances for binning, i.e. which samples will be mapped against the assembly.

type: string
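The sketch below makes the idea of a mapping strategy concrete. It lists which samples would be mapped against one assembly under three strategies:

- only the samples used to build the assembly;
- all samples in the same group;
- all samples in the run.

The strategy names are assumptions for this example, so check the parameter's allowed values for the real option names.

```python
def samples_to_map(assembly_samples: list[str], sample_groups: dict[str, str], strategy: str) -> list[str]:
    """Return the samples whose reads are mapped back to one assembly."""
    if strategy == "own":  # hypothetical name: only the assembly's own samples
        return sorted(assembly_samples)
    if strategy == "group":  # hypothetical name: every sample sharing a group
        groups = {sample_groups[s] for s in assembly_samples}
        return sorted(s for s, g in sample_groups.items() if g in groups)
    if strategy == "all":  # hypothetical name: every sample in the run
        return sorted(sample_groups)
    raise ValueError(f"unknown strategy: {strategy!r}")


groups = {"s1": "0", "s2": "0", "s3": "1"}
for strategy in ("own", "group", "all"):
    print(strategy, samples_to_map(["s1"], groups, strategy))
```

Mapping more samples typically gives the binners more co-abundance signal, at the cost of more read-mapping jobs.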

Skip metagenome binning entirely

type: boolean

Skip MetaBAT2 Binning

type: boolean

Skip MaxBin2 Binning

type: boolean

Skip CONCOCT Binning

type: boolean

Skip COMEBin Binning

type: boolean

Skip MetaBinner Binning

type: boolean

Dataset scale for MetaBinner

type: string

Skip SemiBin2 Binning

type: boolean

RNG seed for SemiBin2.

type: integer

default: 1

Pre-trained model for SemiBin2 for single sample assemblies

type: string

Minimum contig size to be considered for binning and for bin quality check.

type: integer

default: 1500

Minimum length for contigs that are not part of any bin to be treated as individual genomes.

type: integer

default: 1000000

Maximum number of contigs that are not part of any bin to be treated as individual genomes.

type: integer

default: 100
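The three settings above can be read as a two-stage filter:

1. Contigs shorter than the minimum contig size are not considered for binning at all.
2. Among the contigs that end up unbinned, long ones may still be treated as individual genomes, up to a maximum count.

The sketch below assumes that the longest unbinned contigs are preferred when more qualify than the maximum. The selection order is not stated above, and the contig names and lengths are made up.

```python
def unbinned_as_genomes(unbinned_lengths: dict[str, int], min_length: int = 1_000_000, max_count: int = 100) -> list[str]:
    """Pick unbinned contigs long enough to be treated as individual genomes."""
    long_enough = [name for name, length in unbinned_lengths.items() if length >= min_length]
    # Assumption: prefer the longest contigs when more than `max_count` qualify.
    long_enough.sort(key=lambda name: unbinned_lengths[name], reverse=True)
    return long_enough[:max_count]


contigs = {"contig_7": 2_400_000, "contig_12": 850_000, "contig_3": 1_100_000}
print(unbinned_as_genomes(contigs))
print(unbinned_as_genomes(contigs, max_count=1))
```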

Specify the shortest length a bin should be to retain for downstream processing (in base pairs)

type: integer

Specify the longest length a bin should be to retain for downstream processing (in base pairs). By default no limit.

type: integer
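Both bounds are in base pairs. No default is listed for the minimum, and the maximum is unlimited by default. The sketch below shows the resulting filter. It assumes inclusive bounds and represents an unset bound as `None`, and the helper name is illustrative.

```python
from typing import Optional


def keep_bin(total_length: int, min_length: Optional[int] = None, max_length: Optional[int] = None) -> bool:
    """Return True if a bin of `total_length` bp passes the optional size bounds."""
    if min_length is not None and total_length < min_length:
        return False
    if max_length is not None and total_length > max_length:
        return False
    return True


print(keep_bin(2_000_000, min_length=200_000))
```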

Limit the number of concurrent SEQKIT_STATS jobs used for bin size calculation.

type: integer

Specify the length of the sub-contigs that contigs are cut into prior to CONCOCT binning.

type: integer

default: 10000

Specify the overlap between consecutive sub-contigs prior to CONCOCT binning.

type: integer

Specify to not append the final piece of a contig, when it is shorter than the sub-contig length, to the preceding full-length sub-contig.

type: boolean
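Cutting contigs into sub-contigs before CONCOCT can be sketched in terms of coordinates. The sketch makes three assumptions:

- pieces are half-open `(start, end)` intervals;
- consecutive pieces start `chunk_size - overlap` apart;
- unless the option above is set, a final piece shorter than the sub-contig length is appended to the preceding piece.

It illustrates the idea and is not CONCOCT's own cutting script.

```python
def cut_up(length: int, chunk_size: int = 10_000, overlap: int = 0, merge_last: bool = True) -> list[tuple[int, int]]:
    """Return half-open (start, end) coordinates of the sub-contigs of one contig."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < length:
        end = min(start + chunk_size, length)
        chunks.append((start, end))
        if end == length:
            break
        start += step
    # Append a short final piece to the previous sub-contig (the default behaviour assumed above).
    if merge_last and len(chunks) > 1 and chunks[-1][1] - chunks[-1][0] < chunk_size:
        _, last_end = chunks.pop()
        prev_start, _ = chunks.pop()
        chunks.append((prev_start, last_end))
    return chunks


print(cut_up(25_000))
print(cut_up(25_000, merge_last=False))
```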

Specify alternative Bowtie2 settings for aligning reads back against the assembly.

type: string

pattern: ^[-\w]*$
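The pattern admits only word characters and hyphens, so the value cannot contain spaces or `=`. In practice, this limits it to a single token such as a Bowtie 2 preset name. The check below uses `re.ASCII`, assuming `\w` has its ASCII meaning (`[A-Za-z0-9_]`) as in JSON Schema (ECMA-262) regular expressions. The helper name is illustrative.

```python
import re

# Pattern copied from the schema entry above; re.ASCII restricts \w to [A-Za-z0-9_].
BOWTIE2_OPTION_PATTERN = re.compile(r"^[-\w]*$", re.ASCII)


def is_accepted_bowtie2_setting(value: str) -> bool:
    """Return True if the schema pattern would accept `value`."""
    return BOWTIE2_OPTION_PATTERN.fullmatch(value) is not None


for value in ["very-sensitive", "--very-sensitive", "--very-sensitive -N 1", "--score-min=L,0,-0.6"]:
    print(repr(value), is_accepted_bowtie2_setting(value))
```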

Save the output of mapping raw reads back to assembled contigs

type: boolean

Enable domain-level (prokaryote or eukaryote) classification of bins using Tiara. Processes which are domain-specific will then only receive bins matching the domain requirement.

type: boolean

Specify which tool to use for domain classification of bins. Currently only ‘tiara’ is implemented.

hidden

type: string

default: tiara

Minimum contig length for Tiara to use for domain classification. For accurate classification, should be longer than 3000 bp.

type: integer

default: 3000

Exclude unbinned contigs in the post-binning steps (bin QC, taxonomic classification, and annotation steps).

type: boolean

Specify a minimum percent identity filter for long reads mapping back to assembled contigs.

type: number

Specify a minimum percent identity filter for short reads mapping back to assembled contigs.

type: number

Disable bin QC with BUSCO, CheckM or CheckM2.

type: boolean

Enable running BUSCO during bin QC.

type: boolean

default: true

Enable running CheckM during bin QC.

type: boolean

Enable running CheckM2 during bin QC.

type: boolean

Download URL, local tar.gz archive, or local uncompressed directory for an *_odb10 or *_odb12 BUSCO lineage dataset.

type: string

Name of the BUSCO *_odb10 or *_odb12 lineage to check against. Additionally supports ‘auto’, ‘auto_prok’ and ‘auto_euk’ for automatic lineage selection mode.

type: string

default: auto

pattern: (.*_odb(10|12))|auto(_prok|_euk)?$
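JSON Schema patterns are not implicitly anchored, and in this pattern only the `auto` alternative ends in `$`. A validator that searches for the pattern therefore also accepts strings that merely contain an `_odb10`/`_odb12` lineage name, such as an archive file name. The Python sketch below uses `re.search` to mirror that unanchored behaviour. This is an assumption about how the pipeline's validator applies the pattern.

```python
import re

# Pattern copied verbatim from the schema entry above.
BUSCO_LINEAGE_PATTERN = re.compile(r"(.*_odb(10|12))|auto(_prok|_euk)?$")


def lineage_accepted(value: str) -> bool:
    """re.search looks for the pattern anywhere in the string (unanchored matching)."""
    return BUSCO_LINEAGE_PATTERN.search(value) is not None


for value in ["auto", "auto_euk", "bacteria_odb12", "bacteria", "bacteria_odb10.tar.gz"]:
    print(value, lineage_accepted(value))
```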

Save the used BUSCO lineage datasets provided via --busco_db.

type: boolean

Enable clean-up of temporary files created during BUSCO runs.

type: boolean

URL pointing to the CheckM database for automatic download, if no local path is supplied.

hidden

type: string

default: https://zenodo.org/records/7401545/files/checkm_data_2015_01_16.tar.gz

Path to local folder containing already downloaded and uncompressed CheckM database.

type: string

Save the used CheckM reference files downloaded when not using --checkm_db parameter.

type: boolean

Path to local file of an already downloaded and uncompressed CheckM2 database file (.dmnd file).

type: string

CheckM2 database version to download, given as a Zenodo record ID. For reference, see the canonical record https://zenodo.org/records/5571251 and pick the Zenodo ID of the database version of your choice.

type: integer

default: 14897628

Save the used CheckM2 reference files downloaded when not using --checkm2_db parameter.

type: boolean

Turn on bin refinement using DAS Tool.

type: boolean

Specify single-copy gene score threshold for bin refinement.

type: number

default: 0.5

Specify to save contig to bin maps used for bin refinement

type: boolean

Specify which binning output is sent for downstream annotation, taxonomic classification, bin quality control etc.

type: string

Turn on GUNC genome chimerism checks

type: boolean

Specify a path to a pre-downloaded GUNC dmnd database file

type: string

Specify which database to auto-download if not supplying own

type: string

Save the used GUNC reference files downloaded when not using --gunc_db parameter.

type: boolean

Make a BIgMAG input file including GUNC results.

type: boolean

Performs ancient DNA assembly validation and contig consensus sequence recalling.

Turn on/off the ancient DNA subworkflow

type: boolean

PyDamage accuracy threshold

type: number

default: 0.5

Deactivate damage correction of ancient contigs using variant and consensus calling.

type: boolean

Ploidy for variant calling

type: integer

default: 1

Minimum base quality required for variant calling.

type: integer

default: 20

Minimum minor allele frequency for considering variants.

type: number

default: 0.33

Minimum genotype quality for considering a variant high quality.

type: integer

default: 30

Minimum genotype quality for considering a variant medium quality.

type: integer

default: 20

Minimum number of bases supporting the alternative allele.

type: integer

default: 3
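Taken together, the variant-calling thresholds above can be read as a filter on each candidate variant. The sketch below applies them to a hypothetical per-variant record. The field names, the inclusive comparisons and the four-way labelling are assumptions for illustration, not the pipeline's implementation. The minimum base quality is applied to individual supporting bases, because it concerns bases rather than the variant call. Ploidy is a caller setting and is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class CandidateVariant:
    """Hypothetical per-site summary; field names are illustrative."""
    depth: int                     # reads covering the site
    alt_base_qualities: list[int]  # base qualities of reads showing the alternative allele
    genotype_quality: int


def classify_variant(
    v: CandidateVariant,
    min_base_quality: int = 20,
    min_allele_fraction: float = 0.33,
    min_alt_count: int = 3,
    high_gq: int = 30,
    medium_gq: int = 20,
) -> str:
    """Return 'high', 'medium', 'low' or 'rejected' for one candidate variant."""
    # Only alternative-allele bases at or above the base-quality threshold count as support.
    alt_support = sum(1 for q in v.alt_base_qualities if q >= min_base_quality)
    if v.depth == 0 or alt_support < min_alt_count:
        return "rejected"
    if alt_support / v.depth < min_allele_fraction:
        return "rejected"
    if v.genotype_quality >= high_gq:
        return "high"
    if v.genotype_quality >= medium_gq:
        return "medium"
    return "low"


print(classify_variant(CandidateVariant(depth=10, alt_base_qualities=[30] * 6, genotype_quality=45)))
```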

nf-core/mag