viralmetagenome: Parameters

Define where the pipeline should find input data and save output data.

Path to comma-separated file containing information about the samples in the experiment.

required

type: string

pattern: ^\S+\.csv$
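As a rough illustration of what this pattern accepts, the following Python sketch applies it to a few paths. The helper name `is_valid_samplesheet_path` and the example paths are hypothetical; only the regular expression comes from the entry above.

```python
import re

# Pattern copied from the entry above; everything else is illustrative.
SAMPLESHEET_PATTERN = re.compile(r"^\S+\.csv$")


def is_valid_samplesheet_path(path: str) -> bool:
    """Return True if the path contains no whitespace and ends in .csv."""
    return SAMPLESHEET_PATTERN.fullmatch(path) is not None


print(is_valid_samplesheet_path("data/samplesheet.csv"))   # True
print(is_valid_samplesheet_path("data/sample sheet.csv"))  # False (contains a space)
print(is_valid_samplesheet_path("samplesheet.tsv"))        # False (wrong extension)
```

Note that the pattern rejects any whitespace in the path, so directories with spaces in their names will not validate.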

The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.

required

type: string

Sample metadata to include in the MultiQC report.

type: string

pattern: ^\S+\.[tc]sv$

Email address for completion summary.

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
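The email pattern is fairly strict, which a short Python check can make visible. This is only a sketch: the function name `is_valid_email` and the addresses are made up for illustration, and the regular expression is the one listed above.

```python
import re

# Pattern copied from the entry above.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def is_valid_email(address: str) -> bool:
    """Return True if the address matches the schema pattern."""
    return EMAIL_PATTERN.fullmatch(address) is not None


print(is_valid_email("user@example.org"))             # True
print(is_valid_email("first.last@sub.example.co.uk")) # True
print(is_valid_email("first+tag@example.org"))        # False ('+' is not allowed)
print(is_valid_email("user@example.technology"))      # False (suffix longer than 5 letters)
```

Addresses that use a plus tag or a long top-level domain are valid in general but would be rejected by this pattern.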

Options related to the trimming, low complexity and host removal steps of the reads

Skip read preprocessing and use input reads for downstream analysis

type: boolean

Skip read quality statistics summary tool ‘fastqc’

type: boolean

Save reads after the final preprocessing step

type: boolean

default: true

Save reads after every preprocessing step

type: boolean

Whether to run UMI detection

type: boolean

Whether to run UMI extraction

type: boolean

default: true

Specify at what level UMI deduplication should occur.

type: string

Specify the maximum number of mismatches between reads for them to still be considered neighbors.

type: integer

default: 1
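One common way to read "mismatches" here is as a Hamming distance between two equal-length sequences. The sketch below illustrates that reading only; whether the pipeline's deduplication tool computes neighbors exactly this way is an assumption, and the function names are hypothetical.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def are_neighbors(a: str, b: str, max_mismatches: int = 1) -> bool:
    """Treat two sequences as neighbors if they differ in at most max_mismatches positions."""
    return hamming_distance(a, b) <= max_mismatches


print(are_neighbors("ACGT", "ACGA"))  # True  (1 mismatch)
print(are_neighbors("ACGT", "TCGA"))  # False (2 mismatches)
```

With the default of 1, sequences that differ at a single position are still grouped together, while a second mismatch separates them.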

Specify the strategy for UMI deduplication: directional or cluster

type: string

Specify the method used by umi-tools for deduplication at the mapping level

type: string

Discard R1 or R2 if required. 0 means that neither read is discarded.

type: integer

The trimming tool to use

type: string

Skip read trimming

type: boolean

Use fastp’s deduplication option

type: boolean

Define the accuracy used for hashes while deduplicating with fastp

type: number

Fasta file of adapters

type: string

Specify true to save reads that failed to pass trimming thresholds to files ending in *.fail.fastq.gz

type: boolean

Specify true to save all merged reads to a file ending in *.merged.fastq.gz

type: boolean

Inputs with fewer reads than this will be filtered out of the “reads” output channel

type: integer

default: 1

Skip filtering of low complexity regions in reads

type: boolean

default: true

Reference files containing adapter and/or contaminant sequences for sequence kmer matching (used by bbduk)

type: string

Skip the removal of host read sequences

type: boolean

Kraken2 database used to remove host and contamination

type: string

default: s3://ngi-igenomes/test-data/viralrecon/kraken2_human.tar.gz

Kraken2 library(s) required to remove host and contamination

type: string

default: human

Skip the fastqc step after host & contaminants were removed

type: boolean

Parameters used to determine the metagenomic diversity of the sample

Skip determining the metagenomic diversity of the sample

type: boolean

Specify the taxonomic read classifiers as a comma-separated list. The pattern below accepts ‘kaiju’, ‘kraken2’ and ‘bracken’.

type: string

default: kraken2,kaiju

pattern: ^(kaiju|kraken2|bracken)(?:,(kaiju|kraken2|bracken)){0,2}$
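The following Python sketch shows which classifier strings this pattern accepts. The helper `is_valid_classifier_list` is hypothetical; the regular expression is copied from the entry above.

```python
import re

# Pattern copied from the entry above.
CLASSIFIER_PATTERN = re.compile(
    r"^(kaiju|kraken2|bracken)(?:,(kaiju|kraken2|bracken)){0,2}$"
)


def is_valid_classifier_list(value: str) -> bool:
    """Return True if value is one to three comma-separated classifier names."""
    return CLASSIFIER_PATTERN.fullmatch(value) is not None


print(is_valid_classifier_list("kraken2,kaiju"))    # True (the default)
print(is_valid_classifier_list("bracken"))          # True
print(is_valid_classifier_list("kraken2, kaiju"))   # False (space after the comma)
print(is_valid_classifier_list("kraken2,kraken2"))  # True (duplicates are not rejected)
```

The pattern only checks the shape of the list, so a repeated name still validates, and spaces around the commas do not.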

Save the used databases

type: boolean

Location of the Kraken2 database

type: string

default: https://genome-idx.s3.amazonaws.com/kraken/k2_viral_20230314.tar.gz

Save classified and unclassified reads as fastq files

type: boolean

Save summary overview of read classifications in a txt file

type: boolean

Save kraken2’s used minimizers

type: boolean

Location of bracken database

type: string

default: https://genome-idx.s3.amazonaws.com/kraken/k2_viral_20230314.tar.gz

Location of Kaiju database

type: string

default: https://kaiju-idx.s3.eu-central-1.amazonaws.com/2023/kaiju_db_rvdb_2023-05-26.tgz

Level of taxa rank that needs to be determined

type: string

Parameters relating to the used assembly methods

Skip de novo assembly of reads

type: boolean

The specified tools for denovo assembly, multiple options are possible

type: string

default: spades,trinity,megahit

pattern: ^(trinity|spades|megahit)(?:,(trinity|spades|megahit)){0,2}$

Specific SPAdes mode to run

type: string

File or directory with amino acid HMMs for SPAdes HMM-guided mode.

type: string

Path to yml file containing read information.

hidden

type: string

Regex pattern to identify contigs that have been made by the assemblers

type: string

Parameters relating to the refinement of denovo contigs

Skip the refinement/polishing of contigs through reference based scaffolding and read mapping

type: boolean

Save intermediate polishing files

type: boolean

Set of fasta sequences used as potential references for the contigs

type: string

default: https://rvdb.dbi.udel.edu/download/C-RVDBvCurrent.fasta.gz

Skip the preclustering of assemblies to facilitate downstream processing of assemblies

type: boolean

Keep the contigs that could not be classified with the taxonomic databases (kaiju_db & kraken2_db)

type: boolean

default: true

Specify the metagenomic classifiers to use for contig taxonomy classification: ‘kraken2,kaiju’

type: string

default: kraken2,kaiju

pattern: ^(kaiju|kraken2)(,(kaiju|kraken2))?$

Taxon conflict resolution mode, must be 1 (Kaiju), 2 (Kraken), lca, or lowest.

type: string

Level of taxonomic simplification

type: string

Hard constraint for taxa to exclude from the preclustering. If multiple ids are given, enclose the list in double quotes (") and separate the ids with a space.

type: string

pattern: ^[\d +]+$

Taxon ids to exclude, along with all their children, from the preclustering. If multiple ids are given, enclose the list in double quotes (") and separate the ids with a space.

type: string

pattern: ^[\d +]+$

Taxon ids to exclude, along with all their parents, from the preclustering. If multiple ids are given, enclose the list in double quotes (") and separate the ids with a space.

type: string

pattern: ^[\d +]+$

Taxon ids to include, along with all their children, in the preclustering. If multiple ids are given, enclose the list in double quotes (") and separate the ids with a space.

type: string

pattern: ^[\d +]+$

Taxon ids to include, along with all their parents, in the preclustering. If multiple ids are given, enclose the list in double quotes (") and separate the ids with a space.

type: string

pattern: ^[\d +]+$
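The five taxon filters above share the pattern `^[\d +]+$`. A minimal Python sketch of validating such a value and turning it into a list of integers could look as follows. The function name `parse_taxon_ids` and the example ids are illustrative assumptions, and how the pipeline itself splits the string is not stated here.

```python
import re

# Pattern shared by the taxon filter entries above.
TAXA_PATTERN = re.compile(r"^[\d +]+$")


def parse_taxon_ids(value: str) -> list[int]:
    """Validate a taxon id string against the pattern and return the ids as integers."""
    if TAXA_PATTERN.fullmatch(value) is None:
        raise ValueError(f"not a valid taxon id list: {value!r}")
    # Pull out every run of digits; spaces and '+' act as separators here.
    return [int(token) for token in re.findall(r"\d+", value)]


print(parse_taxon_ids("10239 2559587"))  # [10239, 2559587]
```

A comma-separated value such as `10239,2559587` does not match the pattern, which is why the descriptions ask for space-separated ids inside one quoted string.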

Cluster algorithm used for contigs

type: string

(only with mash) Algorithm to partition the network.

type: string

Skip creation of the hybrid consensus, instead keep the scaffold with ambiguous bases if the depth of scaffolds is not high enough.

type: boolean

Identity threshold value used in clustering algorithms

type: number

default: 0.6

Minimum allowed contig size

type: integer

default: 500

Maximum allowed contig size

type: integer

default: 10000000

Define the maximum percentage of ambiguous bases in a contig

type: integer

default: 50
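Taken together, the three contig filters above (minimum size, maximum size, maximum percentage of ambiguous bases) can be pictured with a small Python sketch. It is not the pipeline's implementation: the function name, the inclusive comparisons, and counting every non-ACGT character as ambiguous are all assumptions made for illustration. Only the default values come from the entries above.

```python
def passes_contig_filters(
    sequence: str,
    min_length: int = 500,
    max_length: int = 10_000_000,
    max_ambiguous_percent: int = 50,
) -> bool:
    """Return True if a contig satisfies the size and ambiguity limits."""
    length = len(sequence)
    if length == 0 or length < min_length or length > max_length:
        return False
    # Assumption: any base other than A, C, G or T counts as ambiguous.
    ambiguous = sum(1 for base in sequence.upper() if base not in "ACGT")
    return 100 * ambiguous / length <= max_ambiguous_percent


print(passes_contig_filters("ACGT" * 200))            # True  (800 bases, none ambiguous)
print(passes_contig_filters("ACGT" * 100))            # False (400 bases, below the minimum)
print(passes_contig_filters("N" * 600 + "A" * 400))   # False (60% ambiguous)
```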

Skip the filtering of contigs that did not cluster together with other contigs

type: boolean

Define parameters for iterations to update denovo consensus using reference based improvements

Skip the iterative refinement, in which reads are realigned to the consensus sequences and the consensus is redefined over (multiple) iterations

type: boolean

Number of iterations

type: integer

default: 2

Mapping tool used during iterations

type: string

Variant caller used during iterations

type: string

Call variants during the iterations

type: boolean

Consensus tool used for calling the new consensus during iterations

type: string

default: bcftools

Calculate summary statistics during iterations

type: boolean

default: true

Parameters relating to the analysis of variants associated to contigs and scaffolds

Skip the analysis of variants for the external reference or contigs

type: boolean

Define which mapping tool needs to be used when mapping reads to reference

type: string

Sequence to use as a mapping reference instead of the de novo contigs or scaffolds

type: string

Deduplicate the reads

type: boolean

default: true

Define the variant caller to use: ‘ivar’ or ‘bcftools’

type: string

Consensus tool used for calling the new consensus in the final iteration

type: string

UMI separator in fastq header.

type: string

default: :

Specify the sketch size, the number of (non-redundant) min-hashes that are kept.

type: integer

default: 4000

Specify the k-mer size that Mash uses to create its hashes

type: integer

default: 15
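To give a feel for how sketch size and k-mer size interact, here is a simplified bottom-s MinHash sketch in Python. It is a conceptual illustration only: Mash itself uses a different hash function (MurmurHash3) and canonical k-mers, whereas this sketch uses SHA-1 from the standard library and ignores reverse complements. The function names are hypothetical.

```python
import hashlib


def kmers(sequence: str, k: int = 15) -> set[str]:
    """Return the set of distinct k-mers in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def bottom_sketch(sequence: str, k: int = 15, sketch_size: int = 4000) -> list[int]:
    """Hash every distinct k-mer and keep the sketch_size smallest hash values."""
    hashes = {
        int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")
        for kmer in kmers(sequence, k)
    }
    return sorted(hashes)[:sketch_size]


# A 20-base repeat of ACGT contains only 4 distinct 15-mers,
# so the sketch holds 4 hashes even though up to 4000 are allowed.
print(len(bottom_sketch("ACGT" * 5)))
```

A larger sketch size keeps more hashes per sequence, which generally gives finer distance estimates at the cost of more memory and time. A larger k-mer size makes each hash more specific.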

Define the minimum number of mapped reads required to continue with variant and consensus calling

type: integer

default: 200

Calculate summary statistics in the final iteration

type: boolean

default: true

Directory containing the MultiQC headers for multiple tables like ‘clusters_summary_mqc.txt’, ‘blast_mqc.txt’, …

hidden

type: string

default: ${projectDir}/assets/mqc_comment

hidden

type: string

Apply different quality control techniques on the generated consensus genomes

Skip the quality measurements on consensus genomes

type: boolean

Skip the use of checkv for quality check

type: boolean

Reference database used by checkv for consensus quality control

type: string

Skip the annotation of the consensus constructs

type: boolean

Database used for annotation of the consensus constructs

type: string

default: https://viralzone.expasy.org/resources/Virosaurus/2020%5F4/virosaurus90%5Fvertebrate-20200330.fas.gz

Skip the use of QUAST for quality check

type: boolean

Skip the blast search of contigs to the provided reference DB

type: boolean

Skip creating an alignment of each of the collapsed clusters and each iterative step

type: boolean

Specify the search algorithm to use for mmseqs. 0: auto, 1: amino acid, 2: translated, 3: nucleotide, 4: translated nucleotide alignment

type: integer

default: 4

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

hidden

type: string

default: master

Base directory for Institutional configs.

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/configs/master

Institutional config name.

hidden

type: string

Institutional config description.

hidden

type: string

Institutional config contact information.

hidden

type: string

Institutional config URL link.

hidden

type: string

Set the top limit for requested resources for any single job.

Maximum number of CPUs that can be requested for any single job.

hidden

type: integer

default: 16

Maximum amount of memory that can be requested for any single job.

hidden

type: string

default: 128.GB

pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
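The memory pattern accepts several spellings of the same amount. The Python sketch below checks a few candidate values against it. The helper `is_valid_memory` is hypothetical; the regular expression is the one from the entry above.

```python
import re

# Pattern copied from the entry above.
MEMORY_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")


def is_valid_memory(value: str) -> bool:
    """Return True if value looks like a memory amount such as 128.GB."""
    return MEMORY_PATTERN.fullmatch(value) is not None


print(is_valid_memory("128.GB"))  # True (the default)
print(is_valid_memory("512 MB"))  # True
print(is_valid_memory("1.5GB"))   # True
print(is_valid_memory("128"))     # False (no unit)
print(is_valid_memory("128.gb"))  # False (unit must be upper case)
```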

Maximum amount of time that can be requested for any single job.

hidden

type: string

default: 240.h

pattern: ^(\d+\.?\s*(s|m|h|d|day)\s*)+$
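The time pattern allows one or more number and unit pairs. As a hedged illustration, the Python sketch below tests a few values. The helper `is_valid_time` is made up for this example, and only the regular expression comes from the entry above.

```python
import re

# Pattern copied from the entry above.
TIME_PATTERN = re.compile(r"^(\d+\.?\s*(s|m|h|d|day)\s*)+$")


def is_valid_time(value: str) -> bool:
    """Return True if value is a sequence of durations such as 240.h or 1d 12h."""
    return TIME_PATTERN.fullmatch(value) is not None


print(is_valid_time("240.h"))   # True (the default)
print(is_valid_time("1d 12h"))  # True
print(is_valid_time("2day"))    # True
print(is_valid_time("1.5h"))    # False (fractional amounts are not matched)
print(is_valid_time("240"))     # False (no unit)
```

Fractional values are rejected, so a duration like one and a half hours would need to be written as `90m` or `1h 30m`.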

Less common options for the pipeline, typically set in a config file.

Display help text.

hidden

type: boolean

Display version and exit.

hidden

type: boolean

Method used to save pipeline results to output directory.

hidden

type: string

Email address for completion summary, only when pipeline fails.

hidden

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Send plain-text email instead of HTML.

hidden

type: boolean

File size limit when attaching MultiQC reports to summary emails.

hidden

type: string

default: 25.MB

pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$

Do not use coloured log outputs.

hidden

type: boolean

Incoming hook URL for messaging service

hidden

type: string

MultiQC report title. Printed as page header, used for filename if not otherwise specified.

hidden

type: string

Custom config file to supply to MultiQC.

hidden

type: string

Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file

hidden

type: string

Custom MultiQC yaml file containing HTML including a methods description.

type: string

Delete the output directory if the pipeline fails

hidden

type: boolean

Custom YAML file containing the selection of table column names and their new names.

hidden

type: string

default: ${projectDir}/assets/custom_table_headers.yml

Boolean whether to validate parameters against the schema at runtime

hidden

type: boolean

default: true

Show all params when using --help

hidden

type: boolean

Validation of parameters fails when an unrecognised parameter is found.

hidden

type: boolean

type: string

default: global_prefix

Validation of parameters in lenient mode.

hidden

type: boolean

Prefix of all output files followed by [date][pipelineversion][runName]

type: string

Global prefix set if you don’t want metadata embedded in the prefix

type: string

nf-core/viralmetagenome