fastq_shortreads_preprocess_qc

Quality check and preprocessing subworkflow for Illumina short reads. It can run quality checks on the input reads and generate statistics, preprocess and validate reads, remove barcodes, remove adapters and merge reads, filter by sequence complexity, deduplicate reads, remove host contamination, concatenate reads, and generate statistics for the post-processing reads. WARNING: to work as intended, the subworkflow requires at least the process configurations from the nextflow.config to be added to the pipeline's modules.config.

fastq illumina short reads qc stats preprocessing barcoding adapters merge complexity deduplication host decontamination

https://github.com/nf-core/modules/[...]/subworkflows/nf-core/fastq_shortreads_preprocess_qc

Description

Input

| name | description | pattern |
| --- | --- | --- |
| `ch_reads` | List of one FastQ file for single-end data or two FastQ files for paired-end data. Structure: `[ val(meta), [ path(reads) ] ]` | `*.fastq.gz` |
| `skip_fastqc` | Skip FastQC quality control step | |
| `skip_seqfu_check` | Skip SeqFu check step | |
| `skip_seqfu_stats` | Skip SeqFu statistics step | |
| `skip_seqkit_stats` | Skip SeqKit statistics step | |
| `skip_seqtk_comp` | Skip SeqTk composition analysis step | |
| `skip_seqkit_sana_pair` | Skip SeqKit sanitize and pair step | |
| `skip_seqkit_seq` | Skip SeqKit sequence processing step | |
| `skip_seqkit_replace` | Skip SeqKit replace step | |
| `skip_seqkit_rmdup` | Skip SeqKit remove duplicates step | |
| `skip_umitools_extract` | Skip UMI-tools extract barcoding step | |
| `val_umi_discard_read` | Discard R1 or R2 after UMI extraction (0 = keep both, 1 = discard R1, 2 = discard R2) | |
| `skip_adapterremoval` | Skip the adapter removal and merge subworkflow completely | |
| `val_adapter_tool` | Choose one of the available adapter removal and/or merging tools | |
| `ch_custom_adapters_file` | Optional reference files containing adapter and/or contaminant sequences for removal. In FASTA format for bbmap/bbduk and fastp, or in text format for AdapterRemoval (one adapter per line). | |
| `val_save_merged` | Specify true to output merged reads instead. Used by fastp and AdapterRemoval. | |
| `val_fastp_discard_trimmed_pass` | Used only by fastp. Specify true to not write any reads that pass the fastp trimming thresholds. Useful when fastp is run for its output report only. | |
| `val_fastp_save_trimmed_fail` | Used only by fastp. Specify true to save files that failed to pass the fastp trimming thresholds. | |
| `skip_complexity_filtering` | Skip complexity filtering step | |
| `val_complexity_filter_tool` | Complexity filtering tool to use. Must be one of: `prinseqplusplus`, `bbduk`, or `fastp`. | |
| `skip_deduplication` | Skip BBMap Clumpify deduplication step | |
| `skip_decontamination` | Skip host decontamination step | |
| `ch_decontamination_fasta` | Reference genome FASTA file for decontamination (optional). Structure: `[ val(meta), [ path(fasta) ] ]` | `*.{fasta,fa,fna}` |
| `ch_decontamination_reference` | Pre-built reference index directory for decontamination (optional). Structure: `[ val(reference_name), path(reference_dir) ]` | |
| `val_decontamination_index_name` | Name for the decontamination index (optional) | |
| `val_decontamination_tool` | Decontamination tool to use (`hostile` or `deacon`) | |
| `skip_final_concatenation` | Skip final FASTQ concatenation step | |
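As a rough illustration of how the inputs listed above might be supplied, the following sketch calls the subworkflow from a pipeline. Several things here are assumptions not confirmed by this page: the include path and the workflow name `FASTQ_SHORTREADS_PREPROCESS_QC` follow the usual nf-core convention, the positional argument order is taken to match the order of the input list, the `meta` keys and file names are placeholders, `'fastp'` is assumed to be an accepted value for `val_adapter_tool`, and the way optional channels are left empty should be checked against the subworkflow's own `main.nf`.

```nextflow
// Sketch only: include path, workflow name, and argument order are assumptions.
include { FASTQ_SHORTREADS_PREPROCESS_QC } from '../subworkflows/nf-core/fastq_shortreads_preprocess_qc/main'

workflow {
    // Hypothetical paired-end sample; meta keys and file names are placeholders.
    ch_reads = Channel.of([
        [ id: 'sample1', single_end: false ],
        [ file('sample1_R1.fastq.gz'), file('sample1_R2.fastq.gz') ]
    ])

    FASTQ_SHORTREADS_PREPROCESS_QC(
        ch_reads,           // ch_reads
        false,              // skip_fastqc
        false,              // skip_seqfu_check
        false,              // skip_seqfu_stats
        false,              // skip_seqkit_stats
        false,              // skip_seqtk_comp
        false,              // skip_seqkit_sana_pair
        false,              // skip_seqkit_seq
        false,              // skip_seqkit_replace
        false,              // skip_seqkit_rmdup
        true,               // skip_umitools_extract (no UMIs in this example)
        0,                  // val_umi_discard_read (0 = keep both reads)
        false,              // skip_adapterremoval
        'fastp',            // val_adapter_tool (assumed accepted value)
        Channel.empty(),    // ch_custom_adapters_file (optional; empty form is an assumption)
        false,              // val_save_merged
        false,              // val_fastp_discard_trimmed_pass
        false,              // val_fastp_save_trimmed_fail
        false,              // skip_complexity_filtering
        'prinseqplusplus',  // val_complexity_filter_tool
        false,              // skip_deduplication
        true,               // skip_decontamination (skipped, so the next four are unused here)
        Channel.empty(),    // ch_decontamination_fasta (optional; empty form is an assumption)
        Channel.empty(),    // ch_decontamination_reference (optional; empty form is an assumption)
        '',                 // val_decontamination_index_name (optional; placeholder)
        'hostile',          // val_decontamination_tool
        false               // skip_final_concatenation
    )
}
```

Passing every flag explicitly keeps the call readable next to the input list, at the cost of a long argument list. In a real pipeline these values would more likely come from `params`.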

Output

| name | description | pattern |
| --- | --- | --- |
| `reads` | Channel containing processed short reads. Structure: `[ val(meta), path(reads) ]` | `*.fastq.gz` |
| `pre_stats_fastqc_html` | FastQC HTML reports for pre-processing reads. Structure: `[ val(meta), path(html) ]` | `*.html` |
| `pre_stats_fastqc_zip` | FastQC ZIP archives for pre-processing reads. Structure: `[ val(meta), path(zip) ]` | `*.zip` |
| `pre_stats_seqfu_check` | SeqFu check results for pre-processing reads. Structure: `[ val(meta), path(check) ]` | |
| `pre_stats_seqfu_stats` | SeqFu statistics for pre-processing reads. Structure: `[ val(meta), path(stats) ]` | |
| `pre_stats_seqfu_multiqc` | SeqFu MultiQC-compatible stats for pre-processing reads. Structure: `[ val(meta), path(multiqc) ]` | |
| `pre_stats_seqkit_stats` | SeqKit statistics for pre-processing reads. Structure: `[ val(meta), path(stats) ]` | |
| `pre_stats_seqtk_stats` | SeqTk composition statistics for pre-processing reads. Structure: `[ val(meta), path(stats) ]` | |
| `post_stats_fastqc_html` | FastQC HTML reports for post-processing reads. Structure: `[ val(meta), path(html) ]` | `*.html` |
| `post_stats_fastqc_zip` | FastQC ZIP archives for post-processing reads. Structure: `[ val(meta), path(zip) ]` | `*.zip` |
| `post_stats_seqfu_check` | SeqFu check results for post-processing reads. Structure: `[ val(meta), path(check) ]` | |
| `post_stats_seqfu_stats` | SeqFu statistics for post-processing reads. Structure: `[ val(meta), path(stats) ]` | |
| `post_stats_seqfu_multiqc` | SeqFu MultiQC-compatible stats for post-processing reads. Structure: `[ val(meta), path(multiqc) ]` | |
| `post_stats_seqkit_stats` | SeqKit statistics for post-processing reads. Structure: `[ val(meta), path(stats) ]` | |
| `post_stats_seqtk_stats` | SeqTk composition statistics for post-processing reads. Structure: `[ val(meta), path(stats) ]` | |
| `umi_log` | UMI-tools extract log file. Structure: `[ val(meta), path(log) ]` | |
| `adapterremoval_discarded_reads` | Reads discarded during adapter removal or merging. Structure: `[ val(meta), path(fastq) ]` | `*.fastq.gz` |
| `adapterremoval_logfile` | Adapter removal execution log file (trimmomatic {log}, trimgalore {txt}, fastp {log}). Structure: `[ val(meta), path({log,txt}) ]` | |
| `adapterremoval_report` | Adapter removal report (trimmomatic {summary}, trimgalore {html,zip}, fastp {html}). Structure: `[ val(meta), path({summary,html,zip}) ]` | |
| `complexity_filter_log` | Log file from complexity filtering. Structure: `[ val(meta), path(log) ]` | |
| `complexity_filter_report` | Report generated by complexity filtering: an HTML report when fastp is used, empty for other tools. Structure: `[ val(meta), path(html) ]` | |
| `clumpify_log` | BBMap Clumpify log file. Structure: `[ val(meta), path(log) ]` | |
| `hostile_reference` | Hostile reference files used for decontamination. Structure: `[ val(reference_name), path(reference_dir) ]` | |
| `hostile_json` | Hostile JSON report. Structure: `[ val(meta), path(json) ]` | |
| `deacon_index` | Deacon index directory. Structure: `[ val(meta), path(index) ]` | |
| `deacon_summary` | Deacon decontamination summary file. Structure: `[ val(meta), path(log) ]` | |
| `multiqc_files` | MultiQC-compatible files for aggregated reporting. Structure: `[ path(files) ]` | |
| `versions` | File containing software versions. Structure: `[ path(versions.yml) ]` | `versions.yml` |
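The aggregate outputs `multiqc_files` and `versions` are typically gathered before being handed to a reporting step. The sketch below shows one way to do that with standard Nextflow channel operators. The wrapper workflow `GATHER_QC_REPORTS` and its channel names are hypothetical, and it assumes both outputs have the structures listed above.

```nextflow
// Hypothetical helper workflow; the name and its inputs are illustrative only.
workflow GATHER_QC_REPORTS {
    take:
    ch_multiqc_files   // e.g. FASTQ_SHORTREADS_PREPROCESS_QC.out.multiqc_files, [ path(files) ]
    ch_versions        // e.g. FASTQ_SHORTREADS_PREPROCESS_QC.out.versions, [ path(versions.yml) ]

    main:
    // Flatten per-sample emissions and gather them into one list for a reporting step.
    ch_multiqc_input = ch_multiqc_files.flatten().collect()

    // Drop repeated versions.yml entries emitted by the same tool for several samples.
    ch_versions_unique = ch_versions.flatten().unique()

    emit:
    multiqc_input = ch_multiqc_input
    versions      = ch_versions_unique
}
```

The per-sample outputs such as `reads` keep their `meta` map, so they can be passed directly to downstream processes that expect `[ val(meta), path(reads) ]`.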
