nf-core/detaxizer

A pipeline to identify (and remove) certain sequences from raw genomic data. Default taxon to identify (and remove) is Homo sapiens. Removal is optional.

de-identification, decontamination, edna, fastq, filter, long-reads, metabarcoding, metagenomics, microbiome, nanopore, short-reads, shotgun, taxonomic-classification, taxonomic-profiling

This is the development version of the pipeline.

https://github.com/nf-core/detaxizer

Define where the pipeline should find input data and save output data.

Path to comma-separated file containing information about the samples in the experiment.

required

type: string

pattern: ^\S+\.csv$
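
The pattern above can be checked before launching a run. The following is a minimal sketch in Python; the function name `is_valid_samplesheet_path` is hypothetical and not part of the pipeline, and `re.search` is used on the assumption that the schema validator treats the pattern as an ordinary (search-style) regular expression.

```python
import re

# Pattern copied verbatim from the parameter description above.
SAMPLESHEET_PATTERN = re.compile(r"^\S+\.csv$")


def is_valid_samplesheet_path(path: str) -> bool:
    """Return True if the path contains no whitespace and ends in '.csv'."""
    return SAMPLESHEET_PATTERN.search(path) is not None


print(is_valid_samplesheet_path("samplesheet.csv"))  # True
print(is_valid_samplesheet_path("my samples.csv"))   # False: contains a space
print(is_valid_samplesheet_path("samplesheet.tsv"))  # False: wrong extension
```

The check only covers the file name; it says nothing about the columns inside the file.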

The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.

required

type: string

Email address for completion summary.

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
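
The email pattern is stricter than many real-world addresses. The sketch below, in Python, shows which addresses it accepts; the helper name `is_accepted_email` is hypothetical, and the pattern is copied from the description above.

```python
import re

# Pattern copied verbatim from the parameter description above.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def is_accepted_email(address: str) -> bool:
    """Return True if the address matches the schema pattern."""
    return EMAIL_PATTERN.search(address) is not None


print(is_accepted_email("user@example.com"))      # True
print(is_accepted_email("user+tag@example.com"))  # False: '+' is not allowed
print(is_accepted_email("user@example.museum"))   # False: final label has 6 letters
```

Addresses that use `+` or a top-level domain longer than five letters are rejected by this pattern, even though they can be valid email addresses.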

MultiQC report title. Printed as page header, used for filename if not otherwise specified.

type: string

Parameters which enable/disable certain steps used in the workflow.

If preprocessing with fastp should be turned on.

type: boolean

Signifies that bbduk is used in the classification process. Can be combined with the ‘classification_kraken2’ parameter to run both.

type: boolean

Signifies that kraken2 is used in the classification process. Can be combined with the ‘classification_bbduk’ parameter to run both. For kraken2 alone no parameter is needed.

type: boolean

If a validation of the classified reads via blastn should be carried out.

type: boolean

If the filtered reads should be classified with kraken2.

type: boolean

Set this when validation via blastn is wanted but the filtering should still use the read IDs from the classification process.

type: boolean

If the filtering step should be skipped.

type: boolean

Select the read-filtering tool: seqkit or bbmap. seqkit normalizes FASTQ headers by temporarily renaming them; bbmap uses filterbyname.sh for exact header matching. Note that BBTools I/O forces the quality of any base that is N to Q=0.

type: string

If the removed reads should also be written to the output folder.

type: boolean

If the pre-processed reads should be used by the filter.

type: boolean

Save intermediates to the results folder.

type: boolean

Parameters to customize bbduk execution.

Location of the fasta which contains the contaminant sequences.

type: string

Length of k-mers for classification carried out by bbduk

type: integer

default: 27
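
To illustrate what the k-mer length means for a read, the sketch below enumerates the overlapping k-mers of a sequence in Python. It is an illustration of the concept only and makes no claim about how bbduk is implemented; the function name `kmers` is hypothetical.

```python
def kmers(seq: str, k: int = 27) -> list:
    """Return all overlapping substrings of length k, in order.

    A sequence of length L yields L - k + 1 k-mers, or none if L < k.
    """
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


print(kmers("ACGTA", k=3))     # ['ACG', 'CGT', 'GTA']
print(len(kmers("A" * 150)))   # 124 k-mers for a 150 bp read at k=27
print(kmers("ACGT"))           # [] because the read is shorter than k
```

Reads shorter than the k-mer length contain no k-mers at all, which is worth keeping in mind when choosing a value for short or heavily trimmed reads.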

Parameters used by kraken2 to classify all reads provided. Fine-tuning of the isolation step can be done via the cutoff_* parameters.

The database which is used in the classification step. Please be aware that this default database will require ~60GB download and ~80GB RAM.

type: string

default: https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20240605.tar.gz

Save unclassified reads and classified reads (those assigned to any taxon, not specifically assessed or filtered) to separate files.

hidden

type: boolean

Save unclassified reads and classified reads (those assigned to any taxon, not specifically assessed or filtered) to separate files. For the filtered reads.

type: boolean

Save unclassified reads and classified reads (those assigned to any taxon, not specifically assessed or filtered) to separate files. For the removed reads.

type: boolean

Confidence in the classification of a read as a certain taxon.

type: number

Confidence in the classification of a read as a certain taxon. For the filtered reads.

type: number

Confidence in the classification of a read as a certain taxon. For the removed reads.

type: number

If a read has fewer k-mers assigned to the taxon/taxa to be assessed or filtered than this value, the read is ignored by the pipeline.

type: integer

Per-read ratio of k-mers assigned to tax2filter to k-mers assigned to any other taxon (unclassified k-mers excluded).

type: number

Per-read ratio of k-mers assigned to tax2filter to unclassified k-mers.

type: number
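
The three cutoffs above can be pictured with a small Python sketch that works on the per-read k-mer field of kraken2's standard output (space-separated `taxid:count` pairs, with `0` for unclassified, `A` for ambiguous nucleotides, and `|:|` separating the mates of a pair). The exact formulas and comparison operators used by the pipeline are not stated here, so the code takes the descriptions literally; all function and argument names are hypothetical.

```python
from collections import Counter


def summarize_kmers(kmer_field: str, target_taxids: set) -> Counter:
    """Count k-mers per category from a kraken2 'taxid:count' field."""
    counts = Counter()
    for token in kmer_field.split():
        if token == "|:|":  # separator between the two mates of a pair
            continue
        taxid, _, n = token.rpartition(":")
        if taxid in target_taxids:
            counts["target"] += int(n)
        elif taxid == "0":
            counts["unclassified"] += int(n)
        elif taxid == "A":
            counts["ambiguous"] += int(n)
        else:
            counts["other"] += int(n)
    return counts


def ratio(numerator: int, denominator: int) -> float:
    """Plain ratio; an empty denominator is treated as infinitely favourable (assumption)."""
    return float("inf") if denominator == 0 else numerator / denominator


def passes_cutoffs(counts, min_target_kmers=0,
                   min_ratio_vs_other=0.0, min_ratio_vs_unclassified=0.0) -> bool:
    """Apply the three cutoffs; '>=' comparisons are an assumption."""
    target = counts["target"]
    return (target >= min_target_kmers
            and ratio(target, counts["other"]) >= min_ratio_vs_other
            and ratio(target, counts["unclassified"]) >= min_ratio_vs_unclassified)


# 9606 is the NCBI taxonomy ID of Homo sapiens.
counts = summarize_kmers("9606:10 0:5 562:3 |:| 9606:4 A:2 0:1", {"9606"})
print(counts["target"], counts["other"], counts["unclassified"])  # 14 3 6
print(passes_cutoffs(counts, 5, 2.0, 2.0))  # True
print(passes_cutoffs(counts, 5, 2.0, 3.0))  # False: 14/6 is below 3.0
```

A taxonomic group such as a genus would put several taxonomy IDs into `target_taxids`; how the pipeline resolves a group name to IDs is outside this sketch.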

The taxon or taxonomic group to be assessed or filtered by the pipeline.

type: string

default: Homo sapiens

Parameters to fine-tune the output of blastn.

Location of the fasta from which the blastn database will be constructed.

type: string

Coverage is the percentage of the query sequence which can be found in the alignments of the sequence match. It can be used to fine-tune the validation step.

type: number

default: 40

The expect value (E-value) is the number of hits with the same or a better score that would be expected by chance in a database of the size searched. The parameter can be used to fine-tune the validation step.

type: number

default: 0.01

Identity is the percentage of aligned positions at which the query and the sequence found in the database match exactly. The parameter can be used to fine-tune the validation step.

type: number

default: 40
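
Taken together, the coverage, E-value and identity parameters act as thresholds on each blastn hit. The Python sketch below shows one way such a filter could look, using the defaults listed above. The `BlastHit` class, the function `keep_hit`, and the direction of each comparison (minimum coverage, maximum E-value, minimum identity) are assumptions for illustration, not the pipeline's code.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """One blastn hit, reduced to the fields relevant for validation."""
    query_id: str
    coverage: float  # percent of the query covered by the alignment
    evalue: float    # expect value of the hit
    identity: float  # percent identical positions in the alignment


def keep_hit(hit: BlastHit, min_coverage: float = 40.0,
             max_evalue: float = 0.01, min_identity: float = 40.0) -> bool:
    """Return True if the hit satisfies all three thresholds."""
    return (hit.coverage >= min_coverage
            and hit.evalue <= max_evalue
            and hit.identity >= min_identity)


hits = [
    BlastHit("read1", coverage=95.0, evalue=1e-30, identity=99.0),
    BlastHit("read2", coverage=35.0, evalue=1e-5, identity=98.0),  # low coverage
    BlastHit("read3", coverage=80.0, evalue=0.5, identity=90.0),   # weak E-value
]
validated = [h.query_id for h in hits if keep_hit(h)]
print(validated)  # ['read1']
```

Raising the coverage or identity threshold, or lowering the E-value threshold, makes the validation stricter and reduces the number of reads confirmed as belonging to the target taxon.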

Options to control the behavior of fastp

fastp option defining the minimum read length

type: integer

fastp option defining if the reads which failed to be trimmed should be saved

type: boolean

fastp option to define the threshold of quality of an individual base

type: integer

fastp option to define the mean quality for trimming

type: integer

default: 1

fastp option if duplicates should be filtered or not before classification

type: boolean

fastp option to define if the clipped reads should be saved

type: boolean

Reference genome related files and options required for the workflow.

Name of iGenomes reference.

type: string

default: GRCh38

Do not load the iGenomes reference config.

hidden

type: boolean

default: true

The base path to the igenomes reference files

hidden

type: string

default: s3://ngi-igenomes/igenomes/

Options for generating input samplesheets for complementary downstream pipelines.

Turn on generation of samplesheets for downstream pipelines.

type: boolean

A comma-separated string, in quotes, naming the pipelines to generate a samplesheet for.

type: string

default: taxprofiler,mag

pattern: ^(taxprofiler|mag)(?:,(taxprofiler|mag)){0,1}
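
The pattern above has no end anchor (`$`), so under search-style matching it only constrains the start of the value. The Python sketch below demonstrates this and adds a stricter check; both helper names are hypothetical, and whether the pipeline rejects unexpected names elsewhere is not covered here.

```python
import re

# Pattern copied verbatim from the parameter description above.
PIPELINES_PATTERN = re.compile(r"^(taxprofiler|mag)(?:,(taxprofiler|mag)){0,1}")

ALLOWED = {"taxprofiler", "mag"}


def matches_schema_pattern(value: str) -> bool:
    """Search-style match against the pattern as written."""
    return PIPELINES_PATTERN.search(value) is not None


def is_strictly_valid(value: str) -> bool:
    """Stricter check: one or two comma-separated names, each in ALLOWED."""
    names = value.split(",")
    return 1 <= len(names) <= 2 and all(name in ALLOWED for name in names)


for value in ["taxprofiler,mag", "mag", "mag,taxprofiler,foo", "magpie", "foo"]:
    print(value, matches_schema_pattern(value), is_strictly_valid(value))
# taxprofiler,mag True True
# mag True True
# mag,taxprofiler,foo True False
# magpie True False
# foo False False
```

Sticking to the documented values `taxprofiler`, `mag`, or both avoids relying on how trailing text is handled.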

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

hidden

type: string

default: master

Base directory for Institutional configs.

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/configs/master

Institutional config name.

hidden

type: string

Institutional config description.

hidden

type: string

Institutional config contact information.

hidden

type: string

Institutional config URL link.

hidden

type: string

Less common options for the pipeline, typically set in a config file.

Display version and exit.

hidden

type: boolean

Method used to save pipeline results to output directory.

hidden

type: string

Email address for completion summary, only when pipeline fails.

hidden

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Send plain-text email instead of HTML.

hidden

type: boolean

File size limit when attaching MultiQC reports to summary emails.

hidden

type: string

default: 25.MB

pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
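
The size pattern accepts several spellings of the same limit. The Python sketch below lists a few accepted and rejected forms; the helper name `is_valid_size` is hypothetical and the pattern is copied from the description above.

```python
import re

# Pattern copied verbatim from the parameter description above.
SIZE_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")


def is_valid_size(value: str) -> bool:
    """Return True if the value matches the file-size pattern."""
    return SIZE_PATTERN.search(value) is not None


print(is_valid_size("25.MB"))  # True: the default value
print(is_valid_size("25 MB"))  # True: whitespace before the unit is allowed
print(is_valid_size("1.5GB"))  # True: decimal values are allowed
print(is_valid_size("25 mb"))  # False: units are case-sensitive
print(is_valid_size("MB"))     # False: a number is required
```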

Do not use coloured log outputs.

hidden

type: boolean

Incoming hook URL for messaging service

hidden

type: string

Custom config file to supply to MultiQC.

hidden

type: string

Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file

hidden

type: string

Custom MultiQC yaml file containing HTML including a methods description.

type: string

Boolean whether to validate parameters against the schema at runtime

hidden

type: boolean

default: true

Base URL or local path to location of pipeline test dataset files

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/test-datasets/

Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.

hidden

type: string

Display the help message.

type: boolean,string

Display the full detailed help message.

type: boolean

Display hidden parameters in the help message (only works when --help or --help_full are provided).

type: boolean
