fastq_shortreads_preprocess_qc
Description
Quality-check and preprocessing subworkflow for Illumina short reads. It can run quality checks on the input reads and generate statistics, preprocess and validate reads, remove barcodes, remove adapters and merge reads, filter by sequence complexity, deduplicate reads, remove host contamination, concatenate reads, and generate statistics for the post-processing reads.
WARNING: to work as intended, at least the process configurations from the nextflow.config must be added to the modules.config of the pipeline.
Input
List of FastQ files of size 1 and 2 for single-end and paired-end data, respectively.
Structure: [ val(meta), [ path(reads) ] ]
*.fastq.gz
Discard R1 or R2 after UMI extraction (0 = keep both, 1 = discard R1, 2 = discard R2).
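To make the two inputs above concrete, here is a minimal Python sketch of the list-size rule for reads and of the 0/1/2 discard code. The function name `reads_after_umi_discard` is hypothetical and not part of the subworkflow, and leaving single-end reads untouched is an assumption made for illustration only.

```python
def reads_after_umi_discard(reads, discard_read=0):
    """Return the read files that remain for a given discard code.

    Hypothetical helper for illustration; it is not the subworkflow's code.
    """
    reads = list(reads)
    # One file for single-end data, two files for paired-end data.
    if len(reads) not in (1, 2):
        raise ValueError("expected 1 (single-end) or 2 (paired-end) FastQ files")
    # 0 = keep both, 1 = discard R1, 2 = discard R2.
    if discard_read not in (0, 1, 2):
        raise ValueError("discard_read must be 0, 1 or 2")
    # Assumption: single-end input has no mate to discard.
    if discard_read == 0 or len(reads) == 1:
        return reads
    # Mates are numbered from 1, so R1 is the first file and R2 the second.
    return [path for number, path in enumerate(reads, start=1)
            if number != discard_read]
```

For example, calling it with a paired-end list and `discard_read=2` returns only the R1 file.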
Optional reference files containing adapter and/or contaminant sequences for removal.
In FASTA format for bbmap/bbduk and fastp, or in text format for AdapterRemoval (one adapter per line).
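The two adapter formats mentioned above differ only in their record headers. As a rough sketch, the Python function below turns a one-adapter-per-line text list into FASTA records. The function name and the `adapter_N` sequence names are assumptions chosen for illustration; the subworkflow itself does not perform this conversion.

```python
def adapter_lines_to_fasta(text, prefix="adapter"):
    """Convert one-adapter-per-line text into FASTA (illustrative only)."""
    records = []
    for line in text.splitlines():
        sequence = line.strip()
        if not sequence:
            continue  # skip blank lines
        # Hypothetical naming scheme: adapter_1, adapter_2, ...
        records.append(f">{prefix}_{len(records) + 1}\n{sequence}")
    # End the output with a newline when there is at least one record.
    return "\n".join(records) + ("\n" if records else "")
```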
Used only by fastp.
Set to true to skip writing the reads that pass the fastp trimming thresholds.
Use this to run fastp for its output report only.
Used only by fastp.
Set to true to save the files that failed to pass the fastp trimming thresholds.
Complexity filtering tool to use.
Must be one of: ‘prinseqplusplus’, ‘bbduk’, or ‘fastp’.
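A pipeline may want to reject an unsupported tool name before launching any process. The check below is a minimal Python sketch of that idea; the constant and function names are hypothetical, and only the three tool names come from the text above.

```python
# The three values listed in the documentation above.
ALLOWED_COMPLEXITY_TOOLS = ("prinseqplusplus", "bbduk", "fastp")


def check_complexity_tool(name):
    """Return the tool name unchanged, or raise if it is not supported."""
    if name not in ALLOWED_COMPLEXITY_TOOLS:
        allowed = ", ".join(ALLOWED_COMPLEXITY_TOOLS)
        raise ValueError(
            f"unknown complexity filtering tool {name!r}; expected one of: {allowed}"
        )
    return name
```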
Reference genome FASTA file for decontamination (optional)
Structure: [ val(meta), [ path(fasta) ] ]
*.{fasta,fa,fna}
Pre-built reference index directory for decontamination (optional)
Structure: [ val(reference_name), path(reference_dir) ]
Output
FastQC HTML reports for pre-processing reads
Structure: [ val(meta), path(html) ]
*.html
FastQC ZIP archives for pre-processing reads
Structure: [ val(meta), path(zip) ]
*.zip
SeqFu check results for pre-processing reads
Structure: [ val(meta), path(check) ]
SeqFu statistics for pre-processing reads
Structure: [ val(meta), path(stats) ]
SeqFu MultiQC-compatible stats for pre-processing reads
Structure: [ val(meta), path(multiqc) ]
SeqKit statistics for pre-processing reads
Structure: [ val(meta), path(stats) ]
SeqTk composition statistics for pre-processing reads
Structure: [ val(meta), path(stats) ]
FastQC HTML reports for post-processing reads
Structure: [ val(meta), path(html) ]
*.html
FastQC ZIP archives for post-processing reads
Structure: [ val(meta), path(zip) ]
*.zip
SeqFu check results for post-processing reads
Structure: [ val(meta), path(check) ]
SeqFu statistics for post-processing reads
Structure: [ val(meta), path(stats) ]
SeqFu MultiQC-compatible stats for post-processing reads
Structure: [ val(meta), path(multiqc) ]
SeqKit statistics for post-processing reads
Structure: [ val(meta), path(stats) ]
SeqTk composition statistics for post-processing reads
Structure: [ val(meta), path(stats) ]
Reads discarded during adapter removal or merging
Structure: [ val(meta), path(fastq) ]
*.fastq.gz
Adapter removal execution log file
(trimmomatic {log}, trimgalore {txt}, fastp {log})
Structure: [ val(meta), path({log,txt}) ]
Adapter removal report
(trimmomatic {summary}, trimgalore {html,zip}, fastp {html})
Structure: [ val(meta), path({summary,html,zip}) ]
Report generated by complexity filtering
HTML report generated by fastp. Empty for other tools.
Structure: [ val(meta), path(html) ]
Hostile reference files used for decontamination
Structure: [ val(reference_name), path(reference_dir) ]