mag: Parameters

Define where the pipeline should find input data and save output data.

CSV samplesheet file containing information about the samples in the experiment.

required

type: string

pattern: ^\S+\.csv$
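As a rough illustration of what the pattern above accepts, the following sketch checks a few paths with Python's `re` module. The example paths and the helper name `is_valid_samplesheet_path` are hypothetical, and the pipeline's own validator may differ from Python in edge cases.

```python
import re

# Pattern copied from the parameter definition above.
SAMPLESHEET_PATTERN = re.compile(r"^\S+\.csv$")


def is_valid_samplesheet_path(path: str) -> bool:
    """Return True if the path has no whitespace and ends in '.csv'."""
    return SAMPLESHEET_PATTERN.search(path) is not None


# Hypothetical example paths, for illustration only.
candidates = ["samplesheet.csv", "data/run 1/samples.csv", "samples.tsv"]
results = {path: is_valid_samplesheet_path(path) for path in candidates}

for path, ok in results.items():
    print(f"{path!r}: {'accepted' if ok else 'rejected'}")
```

Note that a path containing a space is rejected, so samplesheets are best kept in directories without whitespace in their names.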

Specifies that the input is single-end reads.

type: boolean

Additional input CSV samplesheet containing information about pre-computed assemblies. When set, both read pre-processing and assembly are skipped and the pipeline begins at the binning stage.

type: string

pattern: ^\S+\.csv$

The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.

required

type: string

Email address for completion summary.

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
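The email pattern above is stricter than many real-world address formats. The sketch below, using Python's `re` module and made-up addresses, suggests which kinds of address it would reject. The helper name `matches_email_pattern` is hypothetical.

```python
import re

# Pattern copied from the parameter definition above.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def matches_email_pattern(address: str) -> bool:
    """Return True if the address satisfies the documented pattern."""
    return EMAIL_PATTERN.search(address) is not None


# Made-up addresses, for illustration only.
print(matches_email_pattern("user@example.com"))         # plain address
print(matches_email_pattern("user+tag@example.com"))     # '+' is not in the allowed set
print(matches_email_pattern("user@example.technology"))  # final label longer than 5 letters
```

Addresses with a `+` tag or a top-level domain longer than five letters do not match, so a different address may be needed in those cases.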

MultiQC report title. Printed as page header, used for filename if not otherwise specified.

type: string

Reference genome related files and options required for the workflow.

Directory / URL base for iGenomes references.

hidden

type: string

default: s3://ngi-igenomes/igenomes/

Do not load the iGenomes reference config.

hidden

type: boolean

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

hidden

type: string

default: master

Base directory for Institutional configs.

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/configs/master

Institutional config name.

hidden

type: string

Institutional config description.

hidden

type: string

Institutional config contact information.

hidden

type: string

Institutional config URL link.

hidden

type: string

Set the top limit for requested resources for any single job.

Maximum number of CPUs that can be requested for any single job.

hidden

type: integer

default: 16

Maximum amount of memory that can be requested for any single job.

hidden

type: string

default: 128.GB

pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$

Maximum amount of time that can be requested for any single job.

hidden

type: string

default: 240.h

pattern: ^(\d+\.?\s*(s|m|h|d|day)\s*)+$
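The memory and time limits are strings that must match the patterns listed above. A minimal sketch of checking candidate values with Python's `re` module follows; the helper names and the test values other than the documented defaults are assumptions for illustration.

```python
import re

# Patterns copied from the parameter definitions above.
MEMORY_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")
TIME_PATTERN = re.compile(r"^(\d+\.?\s*(s|m|h|d|day)\s*)+$")


def is_valid_memory(value: str) -> bool:
    """Return True if the value matches the documented memory pattern."""
    return MEMORY_PATTERN.search(value) is not None


def is_valid_time(value: str) -> bool:
    """Return True if the value matches the documented time pattern."""
    return TIME_PATTERN.search(value) is not None


# The documented defaults pass their own patterns.
print(is_valid_memory("128.GB"), is_valid_time("240.h"))

# A missing unit suffix is rejected by both patterns.
print(is_valid_memory("128G"), is_valid_time("240"))
```

The memory pattern requires a trailing `B`, and the time pattern requires one of the unit tokens `s`, `m`, `h`, `d` or `day` after each number.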

Less common options for the pipeline, typically set in a config file.

Display help text.

hidden

type: boolean

Display version and exit.

hidden

type: boolean

Method used to save pipeline results to output directory.

hidden

type: string

Email address for completion summary, only when pipeline fails.

hidden

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Send plain-text email instead of HTML.

hidden

type: boolean

File size limit when attaching MultiQC reports to summary emails.

hidden

type: string

default: 25.MB

pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$

Do not use coloured log outputs.

hidden

type: boolean

Incoming hook URL for messaging service

hidden

type: string

Custom config file to supply to MultiQC.

hidden

type: string

Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file

hidden

type: string

Custom MultiQC yaml file containing HTML including a methods description.

type: string

Whether to validate parameters against the schema at runtime.

hidden

type: boolean

default: true

Show all params when using --help

hidden

type: boolean

Validation of parameters fails when an unrecognised parameter is found.

hidden

type: boolean

Validation of parameters in lenient mode.

hidden

type: boolean

Base URL or local path to location of pipeline test dataset files

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/test-datasets/

Use these parameters to also enable reproducible results from the individual assembly and binning tools.

Fix number of CPUs for MEGAHIT to 1. Not increased with retries.

type: boolean

Fix number of CPUs used by SPAdes. Not increased with retries.

type: integer

default: -1

Fix number of CPUs used by SPAdes hybrid. Not increased with retries.

type: integer

default: -1

RNG seed for MetaBAT2.

type: integer

default: 1

Specify which adapter clipping tool to use.

type: string

Specify to save the resulting clipped FASTQ files to --outdir.

type: boolean

The minimum length that reads must have to be retained for downstream analysis.

type: integer

default: 15

Minimum phred quality value of a base to be qualified in fastp.

type: integer

default: 15

The mean quality requirement used for per read sliding window cutting by fastp.

type: integer

default: 15

Save reads that fail fastp filtering in a separate file. Not used downstream.

type: boolean

The minimum base quality for low-quality base trimming by AdapterRemoval.

type: integer

default: 2

Turn on quality trimming by consecutive stretches of low-quality bases, rather than by window.

type: boolean

Forward read adapter to be trimmed by AdapterRemoval.

type: string

default: AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG

Reverse read adapter to be trimmed by AdapterRemoval for paired end data.

type: string

default: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT

Name of iGenomes reference for host contamination removal.

type: string

Fasta reference file for host contamination removal.

type: string

Use the --very-sensitive setting instead of the --sensitive setting for Bowtie 2 to map reads against the host genome.

type: boolean

Save the read IDs of removed host reads.

type: boolean

Specify to save input FASTQ files with host reads removed to --outdir.

type: boolean

Keep reads similar to the Illumina internal standard PhiX genome.

type: boolean

Genome reference used to remove Illumina PhiX contaminant reads.

hidden

type: string

default: ${baseDir}/assets/data/GCA_002596845.1_ASM259684v1_genomic.fna.gz

Skip read preprocessing using fastp or adapterremoval.

type: boolean

Specify to save input FASTQ files with PhiX reads removed to --outdir.

type: boolean

Run BBnorm to normalize sequence depth.

type: boolean

Set BBnorm target maximum depth to this number.

type: integer

default: 100

Set BBnorm minimum depth to this number.

type: integer

default: 5

Save normalized read files to output directory.

type: boolean

Skip removing adapter sequences from long reads.

type: boolean

Discard any read which is shorter than this value.

type: integer

default: 1000

Keep this percent of bases.

type: integer

default: 90

The higher this value, the more weight read length is given when choosing the best reads.

type: integer

default: 10

Keep reads similar to the ONT internal standard Escherichia virus Lambda genome.

type: boolean

Genome reference used to remove ONT Lambda contaminant reads.

hidden

type: string

default: ${baseDir}/assets/data/GCA_000840245.1_ViralProj14204_genomic.fna.gz

Specify to save input FASTQ files with Lambda reads removed to --outdir.

type: boolean

Specify to save the resulting clipped FASTQ files to --outdir.

type: boolean

Specify to save the resulting length-filtered FASTQ files to --outdir.

type: boolean

Taxonomic classification is disabled by default. You have to specify one of the options below to activate it.

Database for taxonomic binning with centrifuge.

type: string

Database for taxonomic binning with kraken2.

type: string

Database for taxonomic binning with krona

type: string

Skip creating a krona plot for taxonomic binning.

type: boolean

Database for taxonomic classification of metagenome-assembled genomes. Can be either a zipped file or a directory containing the extracted contents of such a file.

type: string

Generate CAT database.

type: boolean

Save the CAT database generated when specified by --cat_db_generate.

type: boolean

Only return official taxonomic ranks (Kingdom, Phylum, etc.) when running CAT.

type: boolean

Skip running GTDB-Tk, as well as the automatic download of its database.

type: boolean

Specify the location of a GTDB-Tk database. Can be either an uncompressed directory or a .tar.gz archive. If not specified, it will be downloaded for you when GTDB-Tk or binning QC is not skipped.

type: string

default: https://data.ace.uq.edu.au/public/gtdb/data/releases/release214/214.1/auxillary_files/gtdbtk_r214_data.tar.gz

Specify the location of a GTDB-Tk mash database. If missing, GTDB-Tk will skip the ani_screening step.

type: string

Min. bin completeness (in %) required to apply GTDB-Tk classification.

type: number

default: 50

Max. bin contamination (in %) allowed to apply GTDB-Tk classification.

type: number

default: 10
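The completeness and contamination thresholds above together decide which bins are passed on for classification. A minimal sketch of such a filter follows, assuming both bounds are inclusive (this page does not say) and using a hypothetical `BinQC` record and made-up bin statistics.

```python
from dataclasses import dataclass


@dataclass
class BinQC:
    """Hypothetical per-bin QC summary; field names are illustrative."""
    name: str
    completeness: float   # percent
    contamination: float  # percent


def select_bins_for_classification(bins, min_completeness=50.0, max_contamination=10.0):
    """Keep bins meeting both thresholds (inclusive bounds are an assumption)."""
    return [
        b.name
        for b in bins
        if b.completeness >= min_completeness and b.contamination <= max_contamination
    ]


# Made-up QC values, for illustration only.
bins = [
    BinQC("bin.1", completeness=92.3, contamination=1.8),
    BinQC("bin.2", completeness=48.0, contamination=0.5),
    BinQC("bin.3", completeness=75.0, contamination=14.2),
]
selected = select_bins_for_classification(bins)
print(selected)
```

With the documented defaults, only the first bin qualifies: the second is too incomplete and the third too contaminated.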

Min. fraction of AA (in %) in the MSA for bins to be kept.

type: number

default: 10

Min. alignment fraction to consider closest genome.

type: number

default: 0.65

Number of CPUs used by pplacer, the tool run by GTDB-Tk.

type: number

default: 1

Reduce GTDB-Tk memory consumption by running pplacer in a setting writing to disk.

type: boolean

default: true

Database for virus classification with geNomad

type: string

Co-assemble samples within one group, instead of assembling each sample separately.

type: boolean

Additional custom options for SPAdes.

type: string

Additional custom options for MEGAHIT.

type: string

Skip Illumina-only SPAdes assembly.

type: boolean

Skip SPAdes hybrid assembly.

type: boolean

Skip MEGAHIT assembly.

type: boolean

Skip metaQUAST.

type: boolean

Skip Prodigal gene prediction

type: boolean

Skip Prokka genome annotation.

type: boolean

Skip MetaEuk gene prediction and annotation

type: boolean

A string containing the name of one of the databases listed in the mmseqs2 documentation. This database will be downloaded and formatted for eukaryotic genome annotation. Incompatible with --metaeuk_db.

type: string

Path to either a local fasta file of protein sequences, or to a directory containing an mmseqs2-formatted database, for annotation of eukaryotic genomes.

type: string

Save the downloaded mmseqs2 database specified in --metaeuk_mmseqs_db.

type: boolean

Run virus identification.

type: boolean

Minimum geNomad score for a sequence to be considered viral

type: number

default: 0.7

Number of groups that geNomad’s MMSeqs2 database should be split into (reduces memory requirements).

type: integer

default: 1

Defines mapping strategy to compute co-abundances for binning, i.e. which samples will be mapped against the assembly.

type: string

Skip metagenome binning entirely

type: boolean

Skip MetaBAT2 Binning

type: boolean

Skip MaxBin2 Binning

type: boolean

Skip CONCOCT Binning

type: boolean

Minimum contig size to be considered for binning and for bin quality check.

type: integer

default: 1500

Minimal length of contigs that are not part of any bin but treated as individual genome.

type: integer

default: 1000000

Maximal number of contigs that are not part of any bin but treated as individual genome.

type: integer

default: 100
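The two unbinned-contig settings above act as a length floor and a count cap. The sketch below shows one way they could combine, assuming that the longest contigs are preferred when the cap is reached; that ordering, the function name and the contig lengths are assumptions, not taken from this page.

```python
def pick_individual_genomes(unbinned, min_length=1_000_000, max_count=100):
    """Return names of unbinned contigs to treat as individual genomes.

    `unbinned` maps contig name to length in bp. Contigs shorter than
    `min_length` are dropped; at most `max_count` are kept, longest first
    (the longest-first ordering is an assumption of this sketch).
    """
    long_enough = [(name, length) for name, length in unbinned.items() if length >= min_length]
    long_enough.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in long_enough[:max_count]]


# Made-up contig lengths, for illustration only.
contigs = {"c1": 2_500_000, "c2": 900_000, "c3": 1_000_000, "c4": 4_000_000}
print(pick_individual_genomes(contigs))
print(pick_individual_genomes(contigs, max_count=2))
```

Lowering the length floor or raising the cap would let more unbinned contigs through to downstream steps as stand-alone genomes.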

Bowtie2 alignment mode

type: string

Save the output of mapping raw reads back to assembled contigs

type: boolean

Enable domain-level (prokaryote or eukaryote) classification of bins using Tiara. Processes which are domain-specific will then only receive bins matching the domain requirement.

type: boolean

Specify which tool to use for domain classification of bins. Currently only ‘tiara’ is implemented.

hidden

type: string

default: tiara

Minimum contig length for Tiara to use for domain classification. For accurate classification, contigs should be longer than 3000 bp.

type: integer

default: 3000

Disable bin QC with BUSCO or CheckM.

type: boolean

Specify which tool to use for bin quality-control validation.

type: string

Download URL for BUSCO lineage dataset, or path to a tar.gz archive, or local directory containing already downloaded and unpacked lineage datasets.

type: string

Run BUSCO with automated lineage selection, but ignoring eukaryotes (saves runtime).

type: boolean

Save the used BUSCO lineage datasets provided via --busco_db.

type: boolean

Enable clean-up of temporary files created during BUSCO runs.

type: boolean

URL pointing to CheckM database for auto download, if local path not supplied.

hidden

type: string

default: https://data.ace.uq.edu.au/public/CheckM_databases/checkm_data_2015_01_16.tar.gz

Path to local folder containing already downloaded and uncompressed CheckM database.

type: string

Save the used CheckM reference files downloaded when not using --checkm_db parameter.

type: boolean

Turn on bin refinement using DAS Tool.

type: boolean

Specify single-copy gene score threshold for bin refinement.

type: number

default: 0.5

Specify which binning output is sent for downstream annotation, taxonomic classification, bin quality control etc.

type: string

Turn on GUNC genome chimerism checks

type: boolean

Specify a path to a pre-downloaded GUNC dmnd database file

type: string

Specify which database to auto-download if not supplying own

type: string

Save the used GUNC reference files downloaded when not using --gunc_db parameter.

type: boolean

Performs ancient DNA assembly validation and contig consensus sequence recalling.

Turn on/off the ancient DNA subworkflow.

type: boolean

PyDamage accuracy threshold

type: number

default: 0.5

Deactivate damage correction of ancient contigs using variant and consensus calling.

type: boolean

Ploidy for variant calling

type: integer

default: 1

Minimum base quality required for variant calling.

type: integer

default: 20

Minimum minor allele frequency for considering variants.

type: number

default: 0.33

Minimum genotype quality for considering a variant high quality.

type: integer

default: 30

Minimum genotype quality for considering a variant medium quality.

type: integer

default: 20

Minimum number of bases supporting the alternative allele.

type: integer

default: 3
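This page does not state how the high-quality, medium-quality and allele-support thresholds interact. The sketch below shows one plausible combination, purely as an assumption: a variant is kept if it reaches the high-quality threshold outright, or if it reaches the medium-quality threshold and is also supported by enough bases. The function name `keep_variant` and its arguments are hypothetical.

```python
def keep_variant(genotype_quality: float, alt_support: int,
                 high_quality: int = 30, medium_quality: int = 20,
                 min_alt_support: int = 3) -> bool:
    """Decide whether to keep a variant (assumed combination of thresholds).

    High-quality variants pass on quality alone; medium-quality variants
    must also have at least `min_alt_support` supporting bases.
    """
    if genotype_quality >= high_quality:
        return True
    return genotype_quality >= medium_quality and alt_support >= min_alt_support


# Made-up variants as (genotype quality, bases supporting the alternative allele).
for quality, support in [(35, 1), (25, 3), (25, 2), (10, 50)]:
    print(quality, support, keep_variant(quality, support))
```

Under this reading, raising the allele-support minimum only affects variants in the medium-quality band.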

nf-core/mag