Folder containing paired-end demultiplexed FastQ files

type: string

Use this to specify the location of your input paired-end FastQ files.

For example:

--input 'path/to/data'  

Example for input data organization from one sequencing run with two samples:

data  
  |-sample1_1_L001_R1_001.fastq.gz  
  |-sample1_1_L001_R2_001.fastq.gz  
  |-sample2_1_L001_R1_001.fastq.gz  
  |-sample2_1_L001_R2_001.fastq.gz  

Please note the following requirements:

  1. The path must be enclosed in quotes
  2. The folder must contain gzip compressed paired-end demultiplexed fastq files. If the file names do not follow the default ("/*_R{1,2}_001.fastq.gz"), please check --extension.
  3. If your data is scattered, a directory with symlinks to your actual data might be a solution.
  4. All sequencing data should originate from one sequencing run, because processing relies on run-specific error models that are unreliable when data from several sequencing runs are mixed. Sequencing data originating from multiple sequencing runs requires additionally the parameter --multipleSequencingRuns and a specific folder structure.
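As a sketch of how such a folder layout is consumed, the following Python snippet pairs forward and reverse files according to the default naming. This is illustrative only, not part of the pipeline; the `pair_fastqs` helper is hypothetical:

```python
# Illustrative sketch (not pipeline code): pair forward/reverse FastQ files
# in an input folder following the default "/*_R{1,2}_001.fastq.gz" naming.
from pathlib import Path

def pair_fastqs(folder):
    """Return {sample_prefix: (R1_path, R2_path)} for complete pairs only."""
    pairs = {}
    for r1 in sorted(Path(folder).glob("*_R1_001.fastq.gz")):
        r2 = Path(str(r1).replace("_R1_001.fastq.gz", "_R2_001.fastq.gz"))
        if r2.exists():
            prefix = r1.name[: -len("_R1_001.fastq.gz")]
            pairs[prefix] = (r1, r2)
    return pairs
```

A sample missing either of its two files would not be paired, which mirrors the requirement that the folder contains complete paired-end data.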

Forward primer sequence

required
type: string

In amplicon sequencing methods, PCR with specific primers produces the amplicon of interest. These primer sequences need to be trimmed from the reads before further processing and are also required for producing an appropriate classifier.

For example:

--FW_primer GTGYCAGCMGCCGCGGTAA --RV_primer GGACTACNVGGGTWTCTAAT  
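The example primers contain degenerate IUPAC bases (Y, M, N, V, W). As an illustration of how such primers match reads — the pipeline itself uses Cutadapt for the actual trimming — here is a minimal sketch; `trim_primer` is a hypothetical helper:

```python
# Illustrative sketch: match a degenerate IUPAC primer at the 5' end of a
# read and trim it. The pipeline uses Cutadapt; this only shows the idea.
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
         "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
         "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}

def trim_primer(read, primer):
    """Remove `primer` (IUPAC codes allowed) from the read start, else None."""
    pattern = "".join(IUPAC[b] for b in primer)
    m = re.match(pattern, read)
    return read[m.end():] if m else None
```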

Reverse primer sequence

required
type: string

In amplicon sequencing methods, PCR with specific primers produces the amplicon of interest. These primer sequences need to be trimmed from the reads before further processing and are also required for producing an appropriate classifier.

For example:

--FW_primer GTGYCAGCMGCCGCGGTAA --RV_primer GGACTACNVGGGTWTCTAAT  

Path to metadata sheet; when missing, most downstream analyses (barplots, PCoA plots, ...) are skipped.

type: string

This is optional, but for performing downstream analysis such as barplots, diversity indices or differential abundance testing, a metadata file is essential.

For example:

--metadata "path/to/metadata.tsv"  

Please note the following requirements:

  1. The path must be enclosed in quotes
  2. The metadata file has to follow the QIIME2 specifications (https://docs.qiime2.org/2019.10/tutorials/metadata/)
  3. In case of multiple sequencing runs, specific naming of samples is required, see --multipleSequencingRuns

The first column in the metadata file is the identifier (ID) column and defines the sample or feature IDs associated with your study. Metadata files are not required to have additional metadata columns, so a file containing only an ID column is a valid QIIME 2 metadata file. Additional columns defining metadata associated with each sample or feature ID are optional.
NB: without additional columns there might be no groupings for the downstream analyses.

Identifiers should be 36 characters or less and contain only ASCII alphanumeric characters (i.e. in the range of [a-z], [A-Z], or [0-9]), the period (.) character, or the dash (-) character. By default all numeric columns, blanks or NA are removed, and only columns with multiple different values but not all unique are selected.
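The identifier rules above can be expressed as a single regular expression. A minimal sketch; the `valid_id` helper is hypothetical:

```python
# Illustrative sketch: validate QIIME 2-style sample identifiers as
# described above (<= 36 characters; ASCII alphanumerics, '.' and '-' only).
import re

def valid_id(sample_id):
    return bool(re.fullmatch(r"[A-Za-z0-9.\-]{1,36}", sample_id))
```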

The columns which are to be assessed can be specified by --metadata_category. If --metadata_category isn't specified, then all columns that fit the specification are automatically chosen.
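The automatic column selection described above can be sketched as follows. This is an illustration of the stated rules, not the pipeline's actual code; `select_columns` is a hypothetical helper:

```python
# Illustrative sketch: keep only metadata columns that are categorical
# (not numeric), contain no blanks or NA, and have multiple different
# values that are not all unique.
def select_columns(table):
    """table: {column_name: [values per sample]} -> list of usable columns."""
    chosen = []
    for name, values in table.items():
        if any(v in ("", "NA") for v in values):
            continue                       # blanks or NA: removed
        if all(_is_number(v) for v in values):
            continue                       # numeric column: removed
        if 1 < len(set(values)) < len(values):
            chosen.append(name)            # multiple values, not all unique
    return chosen

def _is_number(v):
    try:
        float(v)
        return True
    except ValueError:
        return False
```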

If samples were sequenced in multiple sequencing runs

type: boolean

If samples were sequenced in multiple sequencing runs. Expects one subfolder per sequencing run in the folder specified by --input containing sequencing data of the specific run. FastQC and MultiQC are skipped by default to prevent possible issues.

To prevent overlapping sample names from multiple sequencing runs, sample names obtained from the sequencing files will be renamed automatically by adding the folder name as a prefix, separated by a string specified by --split. Accordingly, the sample name column in the metadata file --metadata requires values following the pattern subfolder-samplename.

Example for input data organization:

data  
  |-run1  
  |  |-sample1_1_L001_R{1,2}_001.fastq.gz  
  |  |-sample2_1_L001_R{1,2}_001.fastq.gz  
  |  
  |-run2  
     |-sample3_1_L001_R{1,2}_001.fastq.gz  
     |-sample4_1_L001_R{1,2}_001.fastq.gz  

In this example the first column in the metadata file requires the values run1-sample1 ... run2-sample4 (instead of sample1, ..., sample4).
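The renaming scheme above can be sketched in a few lines. Illustrative only; `renamed_samples` is a hypothetical helper:

```python
# Illustrative sketch: prepend the run folder name to each sample name,
# joined by the --split string (default "-"), as described above.
def renamed_samples(runs, split="-"):
    """runs: {run_folder: [sample names]} -> flat list of prefixed names."""
    return [f"{run}{split}{s}" for run, samples in runs.items()
            for s in samples]
```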

Example command to analyze this data in one pipeline run:

nextflow run nf-core/ampliseq \  
    -profile singularity \  
    --input "data" \  
    --FW_primer GTGYCAGCMGCCGCGGTAA \  
    --RV_primer GGACTACNVGGGTWTCTAAT \  
    --metadata "data/Metadata.tsv" \  
    --multipleSequencingRuns  

Visually choosing sequencing read truncation cutoffs

While --untilQ2import with --multipleSequencingRuns is currently supported, --Q2imported is not. The pipeline can first be run with --untilQ2import, then --trunclenf and --trunclenr are chosen visually, and then the pipeline can be continued without --untilQ2import but with --trunclenf, --trunclenr, and -resume.

For example:

(1) To produce quality plots and choose truncation values:

nextflow run nf-core/ampliseq \  
    -profile singularity \  
    --input "data" \  
    --FW_primer GTGYCAGCMGCCGCGGTAA \  
    --RV_primer GGACTACNVGGGTWTCTAAT \  
    --metadata "data/Metadata.tsv" \  
    --multipleSequencingRuns \  
    --untilQ2import  

(2) To finish analysis:

nextflow run nf-core/ampliseq \  
    -profile singularity \  
    --input "data" \  
    --FW_primer GTGYCAGCMGCCGCGGTAA \  
    --RV_primer GGACTACNVGGGTWTCTAAT \  
    --metadata "data/Metadata.tsv" \  
    --multipleSequencingRuns \  
    --trunclenf 200 \  
    --trunclenr 180 \  
    -resume  

Path to tab-separated table with sample IDs, forward and reverse sequencing files

type: string

You can submit a manifest file as an alternative way to provide input reads; in that case, read files do not need to be provided with --input.

A manifest must be a tab-separated file that must have the following labels in this exact order: sampleID, forwardReads, reverseReads. The sample identifiers must be listed under sampleID. Paths to forward and reverse reads must be reported under forwardReads and reverseReads, respectively.

Multiple sequencing runs are not supported by --manifest at this stage.
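A minimal sketch of the header check implied by the manifest specification above. Illustrative only; `check_manifest` is a hypothetical helper:

```python
# Illustrative sketch: validate a manifest as described above - a
# tab-separated file with exactly sampleID, forwardReads, reverseReads
# as labels, in this exact order.
import csv
import io

def check_manifest(text):
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if rows[0] != ["sampleID", "forwardReads", "reverseReads"]:
        raise ValueError(f"unexpected header: {rows[0]}")
    return {r[0]: (r[1], r[2]) for r in rows[1:]}
```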

Cutadapt will retain untrimmed reads; choose this only if input reads are not expected to contain primer sequences.

type: boolean

When read sequences are trimmed, untrimmed read pairs are discarded routinely. Use this option to retain untrimmed read pairs. This is usually not recommended and is only advantageous for specific protocols that prevent sequencing of PCR primers.

For example:

--retain_untrimmed  

DADA2 read truncation value for forward strand

type: integer

Read denoising by DADA2 creates an error profile specific to a sequencing run and uses this to correct sequencing errors. This method requires all reads to have the same length and as high quality as possible while maintaining at least 20 bp overlap for merging. One cutoff for the forward read --trunclenf and one for the reverse read --trunclenr truncate all longer reads at that position and drop all shorter reads.
These cutoffs are usually chosen visually using --untilQ2import, inspecting the quality plots in "results/demux", and resuming analysis with --Q2imported. If not set, these cutoffs will be determined automatically for the position before the median quality score drops below --trunc_qmin.

For example:

--trunclenf 180 --trunclenr 120  

Please note:

  1. Overly aggressive truncation might lead to insufficient overlap for read merging
  2. Too little truncation might reduce denoised reads
  3. The code choosing these values automatically cannot take the points above into account, therefore setting --trunclenf and --trunclenr is recommended
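The automatic cutoff determination described above can be sketched as follows. This is a simplified illustration, not the pipeline's actual code; `auto_truncate` is a hypothetical helper:

```python
# Illustrative sketch: truncate at the position before the median quality
# score first drops below --trunc_qmin (default: 25).
def auto_truncate(median_quals, trunc_qmin=25):
    """median_quals: per-position median quality scores of a read set."""
    for pos, q in enumerate(median_quals):
        if q < trunc_qmin:
            return pos          # keep bases 0..pos-1
    return len(median_quals)    # quality never drops below the threshold
```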

DADA2 read truncation value for reverse strand

type: integer

Read denoising by DADA2 creates an error profile specific to a sequencing run and uses this to correct sequencing errors. This method requires all reads to have the same length and as high quality as possible while maintaining at least 20 bp overlap for merging. One cutoff for the forward read --trunclenf and one for the reverse read --trunclenr truncate all longer reads at that position and drop all shorter reads.
These cutoffs are usually chosen visually using --untilQ2import, inspecting the quality plots in "results/demux", and resuming analysis with --Q2imported. If not set, these cutoffs will be determined automatically for the position before the median quality score drops below --trunc_qmin.

For example:

--trunclenf 180 --trunclenr 120  

Please note:

  1. Overly aggressive truncation might lead to insufficient overlap for read merging
  2. Too little truncation might reduce denoised reads
  3. The code choosing these values automatically cannot take the points above into account, therefore setting --trunclenf and --trunclenr is recommended

If --trunclenf and --trunclenr are not set, these values will be automatically determined using this median quality score

type: integer
default: 25

Automatically determine --trunclenf and --trunclenr before the median quality score drops below --trunc_qmin (default: 25). The fraction of reads retained is defined by --trunc_rmin, which might override the quality cutoff.

For example:

--trunc_qmin 35  

Please note:

  1. The code choosing --trunclenf and --trunclenr using --trunc_qmin automatically cannot take amplicon length or overlap requirements for merging into account, therefore use with caution.
  2. The default value of 25 is recommended. However, high quality data with a large paired sequence overlap might justify a higher value (e.g. 35). Also, very low quality data might require a lower value.
  3. If the quality cutoff is too low to include a certain fraction of reads that is specified by --trunc_rmin (default: 0.75, meaning at least 75% of reads are retained), a lower cutoff according to --trunc_rmin supersedes the quality cutoff.

Ensures that the values chosen with --trunc_qmin retain at least this fraction of reads.

type: number
default: 0.75

Values can range from 0 to 1; 0 means no reads need to be retained and 1 means all reads need to be retained. The minimum of the lengths suggested by --trunc_qmin and --trunc_rmin is chosen as the DADA2 cutoff.
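One possible reading of the interplay between the quality-based cutoff and --trunc_rmin is sketched below. This is a simplified assumption about the behavior, not the pipeline's actual algorithm; `final_cutoff` is a hypothetical helper, and it assumes reads shorter than the cutoff are dropped as described for --trunclenf/--trunclenr:

```python
# Illustrative sketch (assumed behavior): relax the quality-based cutoff
# when it would retain less than the --trunc_rmin fraction of reads.
import math

def final_cutoff(read_lengths, quality_cutoff, trunc_rmin=0.75):
    n = len(read_lengths)
    if sum(l >= quality_cutoff for l in read_lengths) / n >= trunc_rmin:
        return quality_cutoff
    k = math.ceil(n * trunc_rmin)            # number of reads to retain
    return sorted(read_lengths, reverse=True)[k - 1]
```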

Path to the qiime compatible file Silva_132_release.zip

type: string
default: https://www.arb-silva.de/fileadmin/silva_databases/qiime/Silva_132_release.zip

By default, the workflow downloads SILVA (https://www.arb-silva.de/) v132 (https://www.arb-silva.de/documentation/release-132/) and extracts reference sequences and taxonomy clustered at 99% similarity and trains a Naive Bayes classifier to assign taxonomy to features.

Path to QIIME2 trained classifier file (typically *-classifier.qza)

type: string

Use this if you have already trained a compatible classifier, for example from sources such as SILVA (https://www.arb-silva.de/), Greengenes (http://greengenes.secondgenome.com/downloads) or RDP (https://rdp.cme.msu.edu/).

For example:

--classifier "FW_primer-RV_primer-classifier.qza"  

Please note the following requirements:

  1. The path must be enclosed in quotes
  2. The classifier is a Naive Bayes classifier produced by "qiime feature-classifier fit-classifier-naive-bayes" (e.g. by this pipeline or from (https://docs.qiime2.org/2019.10/data-resources/))
  3. The primer pair used for the amplicon PCR must be exactly the same as the one used for computing the classifier (or a full-length classifier, with potentially lower performance)
  4. The classifier has to be trained by the same version of scikit-learn as this version of the pipeline uses (0.21.2)

Remove all hash signs from taxonomy strings; resolves a rare ValueError during classification (process classifier)

type: boolean

Dereplication of the database

hidden
type: integer
default: 99

Dereplication level of the reference database; must match SILVA v132 and its subfolders. Decreasing this value reduces the database size, but also the resolution of taxonomic assignments.

Comma separated list of unwanted taxa, to skip taxa filtering use "none"

type: string
default: mitochondria,chloroplast

Depending on the primers used, PCR might amplify unwanted or off-target DNA. By default sequences originating from mitochondria or chloroplasts are removed. The taxa specified are excluded from further analysis.
For example to exclude any taxa that contain mitochondria, chloroplast, or archaea:

--exclude_taxa "mitochondria,chloroplast,archaea"  

If you prefer not filtering the data, specify:

--exclude_taxa "none"  

Please note the following requirements:

  1. Comma separated list enclosed in quotes
  2. May not contain whitespace characters
  3. Features that contain one or several of these terms in their taxonomical classification are excluded from further analysis
  4. The taxonomy level is not taken into consideration
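The filtering rules above can be sketched in a few lines. Illustrative only, not the pipeline's actual code; `filter_taxa` is a hypothetical helper:

```python
# Illustrative sketch: exclude features whose taxonomy string contains any
# of the listed terms, at any taxonomic level, as described above.
def filter_taxa(taxonomies, exclude_taxa="mitochondria,chloroplast"):
    """taxonomies: {feature_id: taxonomy string} -> feature ids kept."""
    if exclude_taxa == "none":
        return list(taxonomies)
    terms = exclude_taxa.lower().split(",")
    return [fid for fid, tax in taxonomies.items()
            if not any(t in tax.lower() for t in terms)]
```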

Abundance filtering

type: integer
default: 1

Remove entries from the feature table below an absolute abundance threshold (default: 1, meaning the filter is disabled). Singletons are often regarded as artifacts; choosing a value of 2 removes sequences with fewer than 2 total counts from the feature table.

For example to remove singletons choose:

--min_frequency 2  

Prevalence filtering

type: integer
default: 1

Filter low-prevalence features from the feature table; e.g. keeping only features that are present in at least two samples can be achieved by choosing a value of 2 (default: 1, meaning the filter is disabled). Typically only used when replicates are available for all samples.

For example to retain features that are present in at least two samples:

--min_samples 2  

Please note this is independent of abundance.
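Both filters, --min_frequency and --min_samples, can be sketched together. Illustrative only; `filter_features` is a hypothetical helper:

```python
# Illustrative sketch: drop features below an absolute total abundance
# (--min_frequency) or present in fewer samples than --min_samples.
def filter_features(table, min_frequency=1, min_samples=1):
    """table: {feature_id: [counts per sample]} -> surviving feature ids."""
    return [fid for fid, counts in table.items()
            if sum(counts) >= min_frequency
            and sum(c > 0 for c in counts) >= min_samples]
```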

Define where the pipeline should find input data and save output data.

Path to test sequencing read files

hidden
type: string

Comma separated list of metadata column headers for statistics.

type: string

Here columns in the metadata sheet can be chosen with groupings that are used for diversity indices and differential abundance analysis. By default, all suitable columns in the metadata sheet will be used if this option is not specified. Suitable columns are categorical (not numerical) and have multiple different values which are not all unique. For example:

--metadata_category "treatment1,treatment2"  

Please note the following requirements:

  1. Comma separated list enclosed in quotes
  2. May not contain whitespace characters
  3. Each comma separated term has to match exactly one column name in the metadata sheet

If the sequencing data has PHRED 64 encoded quality scores, otherwise PHRED 33 is assumed

type: boolean

A string that will be used between the prepended run/folder name and the sample name. Only used with "--multipleSequencingRuns".

type: string
default: -

A string that will be used between the prepended run/folder name and the sample name. Only used with --multipleSequencingRuns (default: "-").

For example using the string link:

--split "link"  

Please note:

  1. Run/folder names may not contain the string specified by --split
  2. No underscore(s) allowed
  3. Must be enclosed in quotes
  4. The metadata sheet has to be adjusted accordingly: instead of run-sample, the first column in this example requires runlinksample

Naming of sequencing files

type: string
default: /*_R{1,2}_001.fastq.gz

Indicates the naming of sequencing files (default: "/*_R{1,2}_001.fastq.gz").

Please note:

  1. The prepended slash (/) is required
  2. The star (*) is the required wildcard for sample names
  3. The curly brackets ({}) enclose the orientation for paired-end reads, separated by a comma (,).
  4. The pattern must be enclosed in quotes

For example for one sample (name: 1) with forward (file: 1_a.fastq.gz) and reverse (file: 1_b.fastq.gz) reads in folder data:

--input "data" --extension "/*_{a,b}.fastq.gz"  

The output directory where the results will be saved.

type: string
default: ./results

Email address for completion summary.

type: string
pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
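The pattern above can be checked directly with Python's re module, for example:

```python
# Illustrative sketch: the e-mail pattern from the parameter definition,
# applied with Python's re module.
import re

EMAIL = re.compile(r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$")

def valid_email(address):
    return EMAIL.match(address) is not None
```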

Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (~/.nextflow/config) then you don't need to specify this on the command line for every run.

Needs to be specified to resolve a timezone error

type: string
default: Europe/Berlin

If a timezone error occurs, this parameter needs to be specified (default: 'Europe/Berlin'). Find your appropriate timezone with e.g. tzselect.
Note, this affects the timezone of the entire software environment.

Keep additional intermediate files, such as trimmed reads or various QIIME2 archives

type: boolean

Skip all steps after importing into QIIME2, used for visually choosing DADA2 parameter --trunclenf and --trunclenr

type: boolean

Path to imported reads (e.g. "demux.qza")

type: string

Analysis starting with a QIIME2 artefact with trimmed reads, typically produced before with --untilQ2import. This is only supported for data from a single sequencing run.

For data from multiple sequencing runs with --multipleSequencingRuns the pipeline can be first run with --untilQ2import and next run without --untilQ2import but with -resume.

Skip all steps after denoising, produce only sequences and abundance tables on ASV level

type: boolean

Skip FastQC

type: boolean

Skip alpha rarefaction

type: boolean

Skip producing barplot

type: boolean

Skip taxonomic classification

type: boolean

Skip producing any relative abundance tables

type: boolean

Skip alpha and beta diversity analysis

type: boolean

Skip differential abundance testing

type: boolean

Skip MultiQC reporting

type: boolean

Less common options for the pipeline, typically set in a config file.

Display help text.

type: boolean

Method used to save pipeline results to output directory.

type: string

The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.

Workflow name.

type: string

A custom name for the pipeline run. Unlike the core Nextflow -name option with one hyphen, this parameter can be reused multiple times, for example when using -resume. Passed through to steps such as MultiQC and used for things like report filenames and titles.

Email address for completion summary, only when pipeline fails.

type: string
pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

This works exactly as with --email, except emails are only sent if the workflow is not successful.

Send plain-text email instead of HTML.

type: boolean

Set to receive plain-text e-mails instead of HTML formatted.

File size limit when attaching MultiQC reports to summary emails.

type: string
default: 25.MB

If a file generated by the pipeline exceeds the threshold, it will not be attached.
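As an illustration, the default "25.MB" threshold can be interpreted as follows. This sketch assumes Nextflow-style binary units (1 MB = 1024² bytes) and a hypothetical `attach_report` helper:

```python
# Illustrative sketch: parse a Nextflow-style size string such as "25.MB"
# and decide whether a report file would be attached (binary units assumed).
UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3}

def attach_report(file_size_bytes, max_multiqc_email_size="25.MB"):
    value, unit = max_multiqc_email_size.split(".")
    return file_size_bytes <= int(value) * UNITS[unit]
```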

Do not use coloured log outputs.

type: boolean

Set to disable colourful command line output and live life in monochrome.

Custom config file to supply to MultiQC.

type: string

Directory to keep pipeline Nextflow logs and reports.

type: string
default: ${params.outdir}/pipeline_info

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

type: string
default: master

Provide git commit id for custom Institutional configs hosted at nf-core/configs. This was implemented for reproducibility purposes. Default: master.

## Download and use config file with the following git commit id  
--custom_config_version d52db660777c4bf36546ddb188ec530c3ada1b96  

Base directory for Institutional configs.

type: string
default: https://raw.githubusercontent.com/nf-core/configs/master

If you're running offline, nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell nextflow where to find them with the custom_config_base option. For example:

## Download and unzip the config files  
cd /path/to/my/configs  
wget https://github.com/nf-core/configs/archive/master.zip  
unzip master.zip  
  
## Run the pipeline  
cd /path/to/my/data  
nextflow run /path/to/pipeline/ --custom_config_base /path/to/my/configs/configs-master/  

Note that the nf-core/tools helper package has a download command to download all required pipeline files + singularity containers + institutional configs in one go for you, to make this process easier.

Institutional configs hostname.

type: string

Institutional config description.

type: string

Institutional config contact information.

type: string

Institutional config URL link.

type: string