Description

Quality check and preprocessing subworkflow for Illumina short reads. It can run quality checks on the input reads and generate statistics, preprocess and validate reads, remove barcodes, remove adapters and merge reads, filter by sequence complexity, deduplicate reads, remove host contamination, concatenate reads, and generate statistics for the processed reads. WARNING: to work as intended, at least the process configurations from the nextflow.config must be added to the modules.config in the pipeline.

Input

name
description
pattern

ch_reads

List of FASTQ files: one file for single-end data, two files for paired-end data.
Structure: [ val(meta), [ path(reads) ] ]

*.fastq.gz

skip_fastqc

Skip FastQC quality control step

skip_seqfu_check

Skip SeqFu check step

skip_seqfu_stats

Skip SeqFu statistics step

skip_seqkit_stats

Skip SeqKit statistics step

skip_seqtk_comp

Skip SeqTk composition analysis step

skip_seqkit_sana_pair

Skip SeqKit sanitize and pair step

skip_seqkit_seq

Skip SeqKit sequence processing step

skip_seqkit_replace

Skip SeqKit replace step

skip_seqkit_rmdup

Skip SeqKit remove duplicates step

skip_umitools_extract

Skip UMI-tools extract barcoding step

val_umi_discard_read

Discard R1 or R2 after UMI extraction (0 = keep both, 1 = discard R1, 2 = discard R2)

skip_adapterremoval

Skip the adapter removal and merge subworkflow completely

val_adapter_tool

Choose one of the available adapter removal and/or merging tools

ch_custom_adapters_file

Optional reference files containing adapter and/or contaminant sequences to remove.
In FASTA format for bbmap/bbduk and fastp, or in text format for AdapterRemoval (one adapter per line).

val_save_merged

Specify true to output merged reads instead of unmerged reads.
Used by fastp and adapterremoval.

val_fastp_discard_trimmed_pass

Used only by fastp.
Specify true to skip writing reads that pass the fastp trimming thresholds.
Useful when fastp is run only to produce its report.

val_fastp_save_trimmed_fail

Used only by fastp.
Specify true to save files that failed to pass fastp trimming thresholds

skip_complexity_filtering

Skip the complexity filtering step

val_complexity_filter_tool

Complexity filtering tool to use.
Must be one of: 'prinseqplusplus', 'bbduk', or 'fastp'.

skip_deduplication

Skip BBMap Clumpify deduplication step

skip_decontamination

Skip host decontamination step

ch_decontamination_fasta

Reference genome FASTA file for decontamination (optional)
Structure: [ val(meta), [ path(fasta) ] ]

*.{fasta,fa,fna}

ch_decontamination_reference

Pre-built reference index directory for decontamination (optional)
Structure: [ val(reference_name), path(reference_dir) ]

val_decontamination_index_name

Name for the decontamination index (optional)

val_decontamination_tool

Decontamination tool to use ('hostile' or 'deacon')

skip_final_concatenation

Skip final FASTQ concatenation step
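
Several of the inputs above accept only a fixed set of values, so it can help to check them before launching a run. The following is a minimal sketch of such a check in Python, not part of the subworkflow itself. The function name `validate_inputs`, its argument names, and the example file names are hypothetical; only the allowed values are taken from the descriptions above.

```python
# Hypothetical pre-flight check; none of these names exist in the subworkflow.
# The allowed values are copied from the input descriptions above.
ALLOWED_UMI_DISCARD = {0, 1, 2}  # 0 = keep both, 1 = discard R1, 2 = discard R2
ALLOWED_COMPLEXITY_TOOLS = {"prinseqplusplus", "bbduk", "fastp"}
ALLOWED_DECONTAMINATION_TOOLS = {"hostile", "deacon"}


def validate_inputs(reads, umi_discard_read, complexity_filter_tool, decontamination_tool):
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    # ch_reads carries one file for single-end data and two for paired-end data.
    if len(reads) not in (1, 2):
        problems.append(f"ch_reads: expected 1 or 2 FASTQ files, got {len(reads)}")
    if umi_discard_read not in ALLOWED_UMI_DISCARD:
        problems.append(f"val_umi_discard_read: {umi_discard_read!r} is not 0, 1 or 2")
    if complexity_filter_tool not in ALLOWED_COMPLEXITY_TOOLS:
        problems.append(f"val_complexity_filter_tool: unknown tool {complexity_filter_tool!r}")
    if decontamination_tool not in ALLOWED_DECONTAMINATION_TOOLS:
        problems.append(f"val_decontamination_tool: unknown tool {decontamination_tool!r}")
    return problems


# A valid paired-end configuration produces no problems.
ok = validate_inputs(["s1_R1.fastq.gz", "s1_R2.fastq.gz"], 0, "fastp", "hostile")
# Every value here is outside the documented set, so four problems are reported.
bad = validate_inputs(["a.fastq.gz", "b.fastq.gz", "c.fastq.gz"], 3, "prinseq", "bowtie2")
for line in bad:
    print(line)
```

The sketch checks every value unconditionally. A stricter version might skip the tool check when the corresponding skip_* flag is set, but how the subworkflow itself treats that case is not stated here.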

Output

name
description
pattern

reads

Channel containing processed short reads
Structure: [ val(meta), path(reads) ]

*.fastq.gz

pre_stats_fastqc_html

FastQC HTML reports for pre-processing reads
Structure: [ val(meta), path(html) ]

*.html

pre_stats_fastqc_zip

FastQC ZIP archives for pre-processing reads
Structure: [ val(meta), path(zip) ]

*.zip

pre_stats_seqfu_check

SeqFu check results for pre-processing reads
Structure: [ val(meta), path(check) ]

pre_stats_seqfu_stats

SeqFu statistics for pre-processing reads
Structure: [ val(meta), path(stats) ]

pre_stats_seqfu_multiqc

SeqFu MultiQC-compatible stats for pre-processing reads
Structure: [ val(meta), path(multiqc) ]

pre_stats_seqkit_stats

SeqKit statistics for pre-processing reads
Structure: [ val(meta), path(stats) ]

pre_stats_seqtk_stats

SeqTk composition statistics for pre-processing reads
Structure: [ val(meta), path(stats) ]

post_stats_fastqc_html

FastQC HTML reports for post-processing reads
Structure: [ val(meta), path(html) ]

*.html

post_stats_fastqc_zip

FastQC ZIP archives for post-processing reads
Structure: [ val(meta), path(zip) ]

*.zip

post_stats_seqfu_check

SeqFu check results for post-processing reads
Structure: [ val(meta), path(check) ]

post_stats_seqfu_stats

SeqFu statistics for post-processing reads
Structure: [ val(meta), path(stats) ]

post_stats_seqfu_multiqc

SeqFu MultiQC-compatible stats for post-processing reads
Structure: [ val(meta), path(multiqc) ]

post_stats_seqkit_stats

SeqKit statistics for post-processing reads
Structure: [ val(meta), path(stats) ]

post_stats_seqtk_stats

SeqTk composition statistics for post-processing reads
Structure: [ val(meta), path(stats) ]

umi_log

UMI-tools extract log file
Structure: [ val(meta), path(log) ]

adapterremoval_discarded_reads

Reads discarded during adapter removal or merging
Structure: [ val(meta), path(fastq) ]

*.fastq.gz

adapterremoval_logfile

Adapter removal execution log file
(trimmomatic {log}, trimgalore {txt}, fastp {log})
Structure: [ val(meta), path({log,txt}) ]

adapterremoval_report

Adapter removal report
(trimmomatic {summary}, trimgalore {html,zip}, fastp {html})
Structure: [ val(meta), path({summary,html,zip}) ]

complexity_filter_log

Log file from complexity filtering
Structure: [ val(meta), path(log) ]

complexity_filter_report

Report generated by complexity filtering
HTML report generated by fastp. Empty for other tools.
Structure: [ val(meta), path(html) ]

clumpify_log

BBMap Clumpify log file
Structure: [ val(meta), path(log) ]

hostile_reference

Hostile reference files used for decontamination
Structure: [ val(reference_name), path(reference_dir) ]

hostile_json

Hostile JSON report
Structure: [ val(meta), path(json) ]

deacon_index

Deacon index directory
Structure: [ val(meta), path(index) ]

deacon_summary

Deacon decontamination summary file
Structure: [ val(meta), path(log) ]

multiqc_files

MultiQC compatible files for aggregated reporting
Structure: [ path(files) ]

versions

File containing software versions
Structure: [ path(versions.yml) ]

versions.yml
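
The statistics outputs listed above follow a regular naming scheme: the same seven suffixes appear once with the pre_stats_ prefix and once with the post_stats_ prefix. As an illustration only, the short Python sketch below enumerates those names, which can be handy when wiring every statistics channel into a report. The constant and function names are hypothetical; the output names themselves are taken from the table above.

```python
# Hypothetical helper that lists the pre-/post-processing statistics outputs.
STAGES = ("pre", "post")
STAT_SUFFIXES = (
    "fastqc_html",
    "fastqc_zip",
    "seqfu_check",
    "seqfu_stats",
    "seqfu_multiqc",
    "seqkit_stats",
    "seqtk_stats",
)


def stats_output_names():
    """Build names such as 'pre_stats_fastqc_html' for every stage and suffix."""
    return [f"{stage}_stats_{suffix}" for stage in STAGES for suffix in STAT_SUFFIXES]


names = stats_output_names()
for name in names:
    print(name)
```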