Define where the pipeline should find input data and save output data.

Path to tab- or comma-separated file containing information about the samples in the experiment.

required
type: string
pattern: ^\S+\.(c|t)sv$

You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a tab- or comma-separated file with 11 columns, and a header row. See usage docs.
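
For reference, a minimal launch command using such a design file might look like the following sketch (file names, the container profile, and the reference are placeholders):

    nextflow run nf-core/eager -profile docker \
        --input samplesheet.tsv \
        --outdir ./results \
        --fasta reference.fasta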

Specify to convert input BAM files back to FASTQ for remapping

type: boolean

This parameter tells the pipeline to convert the BAM files listed in the --input TSV or CSV sheet back to FASTQ format to allow re-preprocessing and mapping.

This can be useful when you want to ensure consistent mapping parameters across all libraries when incorporating public data. However, be careful of biases that may come from re-processing: the BAM files may already have been clipped, or may only contain mapped reads produced with different settings, so you may not have all reads from the original publication.

The output directory where the results will be saved. You must use absolute paths when writing to storage on cloud infrastructure.

required
type: string

Email address for completion summary.

type: string
pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (~/.nextflow/config) then you don't need to specify this on the command line for every run.
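
As a sketch, the address can also be given per run on the command line; the flag name --email follows the standard nf-core convention and is an assumption here, as this section does not name it explicitly:

    # '--email' is assumed to follow the standard nf-core parameter naming.
    nextflow run nf-core/eager -profile docker \
        --input samplesheet.tsv --outdir ./results --fasta reference.fasta \
        --email 'name@example.org'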

MultiQC report title. Printed as page header, used for filename if not otherwise specified.

type: string

Reference genome related files and options required for the workflow.

Path to FASTA file of the reference genome.

type: string
pattern: ^\S+\.fn?a(sta)?(\.gz)?$

This parameter is mandatory if neither --genome nor --fasta_sheet is specified. If you don't supply a mapper index (e.g. for BWA), this will be generated for you automatically. Combine with --save_reference to save the mapper index for future runs.

Specify path to samtools FASTA index.

type: string

If you want to use a pre-existing samtools faidx index, use this to specify the required FASTA index file for the selected reference genome. This should be generated by samtools faidx and has a file suffix of .fai.

Specify path to Picard sequence dictionary file.

type: string

If you want to use a pre-existing picard CreateSequenceDictionary dictionary file, use this to specify the required .dict file for the selected reference genome.

Specify path to directory containing index files of the FASTA for a given mapper.

type: string

For most people this will likely be the same directory that contains the file you provided to --fasta.

If you want to use pre-existing bwa index indices, the directory should contain files ending in '.amb', '.ann', '.bwt'. If you want to use pre-existing bowtie2 build indices, the directory should contain files ending in '.1.bt2', '.2.bt2', '.rev.1.bt2'.

In either case, do not include the index files themselves in the path, only the directory. nf-core/eager will automagically detect the index files by searching for the FASTA filename with the corresponding bwa index/bowtie2 build file suffixes. If not supplied, the indices will be generated for you.
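
As an illustrative sketch (directory and file names are hypothetical), a pre-built bwa index directory passed to this parameter could look like the following, with all index files sharing the FASTA's filename:

    # Hypothetical layout of a pre-built bwa index directory.
    ls /data/reference/
    # genome.fasta      genome.fasta.amb  genome.fasta.ann
    # genome.fasta.bwt  genome.fasta.pac  genome.fasta.sa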

Specify to generate '.csi' BAM indices instead of '.bai' for larger reference genomes.

type: boolean

This parameter is required to be set for large reference genomes. If your reference genome is larger than 3.5GB, the samtools index calls in the pipeline need to generate .csi indices instead of .bai indices to compensate for the size of the reference genome (with samtools: -c). This parameter is not required for smaller references (including the human reference genomes hg19/GRCh37 or GRCh38).

Modifies SAMtools index command: -c

Specify to save any pipeline-generated reference genome indices in the results directory.

type: boolean

Use this if you do not have pre-made reference FASTA indices for bwa, samtools and picard. If you turn this on, any indices nf-core/eager generates for you will be saved in <your_output_dir>/results/reference_genomes. If not supplied, nf-core/eager-generated reference indices will be deleted.

Path to a tab-/comma-separated file containing reference-specific files.

type: string
pattern: ^\S+\.(c|t)sv$

This parameter is mandatory if neither --genome nor --fasta is specified. If you don't supply a mapper index (e.g. for BWA), this will be generated for you automatically.

Name of iGenomes reference.

hidden
type: string

If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files e.g. --genome GRCh38.

See the nf-core website docs for more details.

Directory / URL base for iGenomes references.

hidden
type: string
default: s3://ngi-igenomes/igenomes/

Do not load the iGenomes reference config.

hidden
type: boolean

Do not load igenomes.config when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in igenomes.config.

Specify the FASTA header of the extended chromosome when using circularmapper.

type: string

The entry (chromosome, contig, etc.) in your FASTA reference that you'd like to be treated as circular.

Applies only when providing a single FASTA file via --fasta (NOT multi-reference input - see reference TSV/CSV input).

Modifies tool parameter(s):

  • circulargenerator -s

Specify the number of bases to extend reference by (circularmapper only).

type: integer
default: 500

The number of bases to extend the beginning and end of each reference genome with.

Specify an elongated reference FASTA to be used for circularmapper.

type: string

Specify an already elongated FASTA file for circularmapper to avoid regeneration.

Specify a samtools index for the elongated FASTA file.

type: string

Specify the index for an already elongated FASTA file to avoid regeneration.

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

hidden
type: string
default: master

Base directory for Institutional configs.

hidden
type: string
default: https://raw.githubusercontent.com/nf-core/configs/master

If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.
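
A possible offline set-up is sketched below; the flag name --custom_config_base follows the standard nf-core convention and should be treated as an assumption here:

    # Download the institutional configs once while online...
    git clone https://github.com/nf-core/configs.git /data/nf-core-configs
    # ...then point the pipeline at the local copy when running offline.
    nextflow run nf-core/eager -profile docker \
        --input samplesheet.tsv --outdir ./results --fasta reference.fasta \
        --custom_config_base /data/nf-core-configs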

Institutional config name.

hidden
type: string

Institutional config description.

hidden
type: string

Institutional config contact information.

hidden
type: string

Institutional config URL link.

hidden
type: string

Less common options for the pipeline, typically set in a config file.

Display version and exit.

hidden
type: boolean

Method used to save pipeline results to output directory.

hidden
type: string

The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.

Email address for completion summary, only when pipeline fails.

hidden
type: string
pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.

Send plain-text email instead of HTML.

hidden
type: boolean

File size limit when attaching MultiQC reports to summary emails.

hidden
type: string
default: 25.MB
pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$

Do not use coloured log outputs.

hidden
type: boolean

Incoming hook URL for messaging service

hidden
type: string

Incoming hook URL for messaging service. Currently, MS Teams and Slack are supported.

Custom config file to supply to MultiQC.

hidden
type: string

Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file

hidden
type: string

Custom MultiQC yaml file containing HTML including a methods description.

type: string

Boolean whether to validate parameters against the schema at runtime

hidden
type: boolean
default: true

Base URL or local path to location of pipeline test dataset files

hidden
type: string
default: https://raw.githubusercontent.com/nf-core/test-datasets/

Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.

hidden
type: string

Removal of adapters, paired-end merging, poly-G removal, etc.

Specify which tool to use for sequencing quality control.

type: string

Specify which tool to use for sequencing quality control.

Falco is designed as a drop-in replacement for FastQC, but written in C++ for faster computation. We recommend using Falco for very large datasets, due to its lower memory requirements.

Specify to skip all preprocessing steps (adapter removal, paired-end merging, poly-G trimming, etc).

type: boolean

Specify to skip all preprocessing steps (adapter removal, paired-end merging, poly-G trimming etc).

This will also mean you will only get one set of FastQC results (of the input reads).

Specify which preprocessing tool to use.

type: string

Specify which preprocessing tool to use.

AdapterRemoval is commonly used in palaeogenomics, however fastp has similar performance and much additional functionality (including inbuilt complexity trimming) that can often be useful.

Specify to skip read-pair merging.

type: boolean

Turns off paired-end read merging, which will result in paired-end mapping modes being used during alignment of reads against the reference.

This can be useful in cases where you have long ancient DNA reads, modern DNA or when you want to utilise mate-pair 'spatial' information.

⚠️ If you run this with --preprocessing_minlength set to a value (as it is by default!), you may end up removing single reads from either the pair1 or pair2 file. These reads will NOT be mapped when aligning with either BWA or Bowtie 2, as both can only accept one (forward) or two (forward and reverse) FASTQs as input in paired-end mode.

⚠️ If you run metagenomic screening as well as skipping merging, all reads will be screened as independent reads - not as pairs! - as all FASTQ files from BAM filtering are merged into one. This merged file is not saved in the results directory.

Modifies AdapterRemoval parameter: --collapse
Modifies fastp parameter: --merge

Specify to exclude read-pairs that did not overlap sufficiently for merging (i.e., keep merged reads only).

type: boolean

Specify to exclude read-pairs that did not overlap sufficiently for merging (i.e., keep merged reads only). Singletons (i.e. reads missing a pair) or un-merged reads (where there wasn't sufficient overlap) are discarded.

Most ancient DNA molecules are very short, and the majority are expected to merge. Specifying this parameter can sometimes be useful when dealing with ultra-short aDNA reads, to reduce the number of longer reads in your library that are derived from modern contamination. It can also speed up the run time of mapping steps.

You may want to use this if you want to ensure only the best quality reads for your analysis, at the penalty of potentially losing still-valid data (even if some reads have slightly lower quality and/or are longer). It is highly recommended when using the 'dedup' deduplication tool.

Specify to skip removal of adapters.

type: boolean

Specify to turn off trimming of adapters from reads.

You may wish to do this if you are using publicly available data that should already have had all library artefacts removed from the reads.

This will override any other adapter parameters provided (i.e., --preprocessing_adapterlist and --preprocessing_adapter{1,2} will be ignored)!

Modifies AdapterRemoval parameter: --adapter1 and --adapter2 (sets both to an empty string)
Applies fastp parameter: --disable_adapter_trimming

Specify the nucleotide sequence for the forward read/R1.

type: string

Specify a nucleotide sequence for the forward read/R1.

If not modified by the user, the default for the particular preprocessing tool will be used. Therefore, to turn off adapter trimming use --preprocessing_skipadaptertrim.

Modifies AdapterRemoval parameter: --adapter1
Modifies fastp parameter: --adapter_sequence

Specify the nucleotide sequence for the reverse read/R2.

type: string

Specify a nucleotide sequence for the reverse read/R2.

If not modified by the user, the default for the particular preprocessing tool will be used. To turn off adapter trimming use --preprocessing_skipadaptertrim.

Modifies AdapterRemoval parameter: --adapter2
Modifies fastp parameter: --adapter_sequence_r2

Specify a list of all possible adapters to trim.

type: string

Specify a file with a list of adapter (combinations) to remove from all files.

Overrides the --preprocessing_adapter1/--preprocessing_adapter2 parameters.

Note that the two tools have slightly different behaviours.

For AdapterRemoval this consists of a two column table with a .txt extension: first column represents forward strand, second column for reverse strand. You must supply all possible combinations, one per line, and this list is applied to all files. Only Adapters in this list will be screened for and removed. See AdapterRemoval documentation for more information.

For fastp this consists of a standard FASTA format with a .fasta/.fa/.fna/.fas extension. The adapter sequences in this file should be at least 6 bp long, otherwise they will be skipped. fastp will first perform auto-detection and removal of adapters, and then additionally remove the adapters present in the FASTA file one by one.

Modifies AdapterRemoval parameter: --adapter-list
Modifies fastp parameter: --adapter_fasta
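
A minimal sketch of an AdapterRemoval-style adapter list and how it might be supplied; the file name and adapter sequences are illustrative only:

    # Tab-separated adapter list: forward adapter, <TAB>, reverse adapter (example sequences only).
    cat adapters.txt
    # AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC<TAB>AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
    nextflow run nf-core/eager -profile docker \
        --input samplesheet.tsv --outdir ./results --fasta reference.fasta \
        --preprocessing_adapterlist adapters.txt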

Specify the minimum length reads must have to be retained.

type: integer
default: 25

Specify the minimum length reads must have to be retained.

Reads smaller than this length after trimming are discarded and not included in downstream analyses. Typically in ancient DNA, users will set this to 30, or around 25 bp for very old samples - reads any shorter than this are often not specific enough to provide useful information.

Modifies AdapterRemoval parameter: --minlength
Modifies fastp parameter: --length_required

Specify number of bases to hard-trim from 5 prime or front of reads.

type: integer

Specify number of bases to hard-trim from 5 prime or front of reads. Exact behaviour varies per tool, see documentation. By default set to 0 to not perform any hard trimming.

This parameter allows users to 'hard' remove a number of bases from the beginning or end of reads, regardless of quality.

⚠️ When this trimming occurs depends on the tool, i.e., the exact behaviour is not the same between AdapterRemoval and fastp.

For fastp: 5p/3p trimming occurs prior to any other trimming (quality, poly-G, adapter). Please see the fastp documentation for more information. If you wish to use this to remove damage prior to mapping (to allow more specific mapping), ensure you have manually removed adapters/quality trimmed prior to giving the reads to nf-core/eager. Alternatively, you can use Bowtie 2's inbuilt pre-mapping read-end trimming functionality. Note that nf-core/eager only allows this hard trimming equally for both forward and reverse reads (i.e., you cannot provide different values for the 5p end for R1 and R2).

For AdapterRemoval, this trimming happens after the removal of adapters, however prior to quality trimming. Therefore, this is more suitable for hard-removal of damage before mapping (however the Bowtie 2 system will be more reliable).

Modifies AdapterRemoval parameters: --trim5p
Modifies fastp parameters: --trim_front1 and/or --trim_front2

Specify number of bases to hard-trim from 3 prime or tail of reads.

type: integer

Specify number of bases to hard-trim from 3 prime or tail of reads. Exact behaviour varies per tool, see documentation. By default set to 0 to not perform any hard trimming.

This parameter allows users to 'hard' remove a number of bases from the beginning or end of reads, regardless of quality.

⚠️ When this trimming occurs depends on the tool, i.e., the exact behaviour is not the same between AdapterRemoval and fastp.

For fastp: 5p/3p trimming occurs prior to any other trimming (quality, poly-G, adapter). Please see the fastp documentation for more information. If you wish to use this to remove damage prior to mapping (to allow more specific mapping), ensure you have manually removed adapters/quality trimmed prior to giving the reads to nf-core/eager. Alternatively, you can use Bowtie 2's inbuilt pre-mapping read-end trimming functionality. Note that nf-core/eager only allows this hard trimming equally for both forward and reverse reads (i.e., you cannot provide different values for the 3p end for R1 and R2).

For AdapterRemoval, this trimming happens after the removal of adapters, however prior to quality trimming. Therefore this is more suitable for hard-removal of damage before mapping (however the Bowtie 2 system will be more reliable).

Modifies AdapterRemoval parameters: --trim3p
Modifies fastp parameters: --trim_tail1 and/or --trim_tail2

Specify to save the preprocessed reads in the results directory.

type: boolean

Specify to save the preprocessed reads in FASTQ format in the results directory.

This can be useful for re-analysing FASTQ files manually, or uploading to public data repositories such as ENA/SRA (provided you don't filter by length or merge paired reads).

Specify to turn on sequence complexity filtering of reads.

type: boolean

Performs a poly-G tail removal step in the beginning of the pipeline using fastp.

This can be useful for trimming poly-G tails from short fragments sequenced on two-colour Illumina chemistry such as NextSeqs or NovaSeqs (where no fluorescence is read as a G), which can inflate reported GC content values.

Modifies fastp parameter: --trim_poly_g

Specify the complexity threshold that must be reached or exceeded to retain reads.

type: integer
default: 10

This option can be used to define the minimum length of a poly-G tail to begin low complexity trimming.

Modifies fastp parameter: --poly_g_min_len

Skip AdapterRemoval quality and N base trimming at 5 prime end.

type: boolean

Turns off quality-based trimming at the 5p end of reads when any of the AdapterRemoval quality or N trimming options are used. Only the 3p end of reads will then be trimmed.

This also entirely disables quality based trimming of collapsed reads, since both ends of these are informative for PCR duplicate filtering. For more information see the AdapterRemoval documentation.

Modifies AdapterRemoval parameters: --preserve5p

Specify to skip AdapterRemoval quality and N trimming at the ends of reads.

type: boolean

Turns off AdapterRemoval quality trimming from ends of reads.

This can be useful to reduce runtime when running public data that has already been processed.

Modifies AdapterRemoval parameters: --trimqualities

Specify AdapterRemoval minimum base quality for trimming off bases.

type: integer
default: 20

Defines the minimum read quality per base that is required for a base to be kept by AdapterRemoval. Individual bases at the ends of reads falling below this threshold will be clipped off.

Modifies AdapterRemoval parameter: --minquality

Specify to skip AdapterRemoval N trimming (quality trimming only).

type: boolean

Turns off AdapterRemoval N trimming from ends of reads.

This can be useful to reduce runtime when running publicly available data that has already been processed.

Modifies AdapterRemoval parameters: --trimns

Specify the AdapterRemoval minimum adapter overlap required for trimming.

type: integer
default: 1

Specifies the minimum number of bases that must overlap with the adapter sequence before AdapterRemoval trims adapter sequences from reads.

Modifies AdapterRemoval parameter: --minadapteroverlap

Specify the AdapterRemoval maximum Phred score used in input FASTQ files.

type: integer
default: 41

Specify maximum Phred score of the quality field of FASTQ files.

The quality-score range can vary depending on the machine and chemistry version (e.g. see the diagram here); this parameter allows you to increase the value from the AdapterRemoval default of 41.

Note that while this can theoretically provide you with more confident and precise base call information, many downstream tools only accept FASTQ files with Phred scores limited to a max of 41, and therefore increasing the default for this parameter may make the resulting preprocessed files incompatible with some downstream tools.

Modifies AdapterRemoval parameters: --qualitymax

Options for aligning reads against reference genome(s)

Specify to turn on FASTQ sharding.

type: boolean

Sharding will split the FASTQs into smaller chunks before mapping. These chunks are then mapped in parallel. This approach can speed up the mapping process for larger FASTQ files.

Specify the number of reads in each shard when splitting.

type: integer
default: 1000000

Make sure to choose a value that makes sense for your dataset. Small values can create many files, which can end up negatively affecting the overall speed of the mapping process.

Specify which mapper to use.

type: string

Specify which mapping tool to use. Options are BWA aln ('bwaaln'), BWA mem ('bwamem'), circularmapper ('circularmapper'), or Bowtie 2 ('bowtie2'). BWA aln is the default and highly suited for short-read ancient DNA. BWA mem can be quite useful for modern DNA, but is rarely used in projects for ancient DNA. CircularMapper enhances the mapping procedure to circular references, using the BWA algorithm but utilizing an extend-remap procedure (see Peltzer et al 2016 for details). Bowtie 2 is similar to BWA aln, and has recently been suggested to provide slightly better results under certain conditions (Poullet and Orlando 2020), as well as providing extra functionality (such as FASTQ trimming).

More documentation can be found in each tool's own manual.

Specify the amount of allowed mismatches in the alignment for mapping with BWA aln.

type: number
default: 0.01

Specify how many mismatches are allowed in a read during alignment with BWA aln. Default is set following recommendations from Oliva et al. 2021 who compared alignment to human reference genomes.

If you're uncertain what value to use, check out this Shiny App for more information.

Modifies BWA aln parameter: -n

Specify the maximum edit distance allowed in a seed for mapping with BWA aln.

type: integer
default: 2

Specify the maximum edit distance during the seeding phase of the BWA aln mapping algorithm.

Modifies BWA aln parameter: -k

Specify the length of seeds to be used for BWA aln.

type: integer
default: 1024

Specify the length of the seed used in BWA aln. Default is set to be 'turned off' at the recommendation of Oliva et al. 2021 who tested when aligning to human reference genomes. Seeding is 'turned off' by specifying an arbitrarily long number to force the entire read to act as the seed.

Note: Despite being recommended, turning off seeding can result in long runtimes!

Modifies BWA aln parameter: -l

Specify the number of gaps allowed for alignment with BWA aln.

type: integer
default: 2

Specify the number of gaps allowed for mapping with BWA aln. Default is set to BWA default.

Modifies BWA aln parameter: -o

Specify the minimum seed length for alignment with BWA mem.

type: integer
default: 19

Configures the minimum seed length used in BWA mem. Default is set to BWA default.

Modifies BWA mem parameter: -k

Specify the re-seeding threshold for alignment with BWA mem.

type: number
default: 1.5

Configures the re-seeding threshold used in BWA mem. Default is set to BWA default.

Modifies BWA mem parameter: -r

Specify the Bowtie 2 alignment mode.

type: string

Specify the type of read alignment to use with Bowtie 2. 'Local' allows partial alignment of reads, with the ends of reads possibly 'soft-clipped' (i.e. remaining unaligned/ignored), if the soft-clipped alignment provides the best alignment score. 'End-to-end' requires all nucleotides of a read to be aligned.
The default is set following Cahill et al. (2018) and Poullet and Orlando (2020).

Modifies Bowtie 2 presets: --local, --end-to-end

Specify the level of sensitivity for the Bowtie 2 alignment mode.

type: string

Specify the Bowtie 2 'preset' to use. These strings apply to both --mapping_bowtie2_alignmode options. See the Bowtie 2 manual for actual settings.
The default is set following Poullet and Orlando (2020), when running damaged data without UDG treatment.

Modifies the Bowtie 2 parameters: --fast, --very-fast, --sensitive, --very-sensitive, --fast-local, --very-fast-local, --sensitive-local, --very-sensitive-local
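
For example, damaged non-UDG-treated data could be aligned with a local, sensitive preset as sketched below; the --mapping_tool flag name and the lower-case value strings are assumptions, as this section does not spell them out:

    # '--mapping_tool' is assumed here as the name of the mapper-selection parameter described earlier.
    nextflow run nf-core/eager -profile docker \
        --input samplesheet.tsv --outdir ./results --fasta reference.fasta \
        --mapping_tool bowtie2 \
        --mapping_bowtie2_alignmode local \
        --mapping_bowtie2_sensitivity sensitive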

Specify the number of mismatches in seed for alignment with Bowtie 2.

type: integer

Specify the number of mismatches allowed in the seed during seed-and-extend procedure of Bowtie 2. This will override any values set with --mapping_bowtie2_sensitivity. Can either be 0 or 1.

Modifies Bowtie 2 parameter: -N

Specify the length of seed substrings for Bowtie 2.

type: integer
default: 20

Specify the length of the seed sub-string to use during seeding of Bowtie 2. This will override any values set with --mapping_bowtie2_sensitivity.

Modifies Bowtie 2 parameter: -L

Specify the number of bases to trim off from 5 prime end of read before alignment with Bowtie 2.

type: integer

Specify the number of bases to trim at the 5' (left) end of reads before alignment with Bowtie 2. This may be useful when left-over sequencing artefacts such as in-line barcodes are present.

Modifies Bowtie 2 parameter: --trim5

Specify the number of bases to trim off from 3 prime end of read before alignment with Bowtie 2.

type: integer

Specify the number of bases to trim at the 3' (right) end of reads before alignment with Bowtie 2. This may be useful when left-over sequencing artefacts such as in-line barcodes are present.

Modifies Bowtie 2 parameter: --trim3

Specify the maximum fragment length for Bowtie2 paired-end mapping mode only.

type: integer
default: 500

The maximum fragment length for valid paired-end alignments. This only applies to paired-end mapping (i.e. unmerged reads), and is therefore typically only useful for modern data.

Modifies Bowtie2 parameter: --maxins

Turn on to remove reads that did not map to the circularised genome.

type: boolean

If you want to filter out reads that don't map to the elongated/circularised chromosome (and also non-circular chromosome headers) from the resulting BAM file, turn this on.

Modifies -f and -x parameters of CircularMapper's RealignSAMFile

Options related to length, quality, and map status filtering of reads.

Specify to turn on filtering of reads in BAM files after mapping. By default, only mapped reads are retained.

type: boolean

Turns on the filtering subworkflow for mapped BAM files after the read alignment step. Filtering includes removal of unmapped reads, length filtering, and mapping quality filtering.

When turning on BAM filtering, by default only the mapped/unmapped filter is activated, thus only mapped reads are retained for downstream analyses. See --bamfiltering_retainunmappedgenomicbam to retain unmapped reads, if filtering only for length and/or quality is preferred.

Note this subworkflow can also be activated if --run_metagenomics is supplied.

Specify the minimum read length mapped reads should have for downstream genomic analysis.

type: integer

Specify to remove mapped reads that fall below a certain length threshold after mapping.

This can be useful to get more realistic 'endogenous DNA' or 'on target read' percentages.

If used instead of minimum length read filtering at AdapterRemoval, you can get more realistic endogenous DNA estimates when most of your reads are very short (e.g. in single-stranded libraries or samples with highly degraded DNA). In these cases, the default minimum length filter at the earlier adapter clipping/read merging step would remove a very large proportion of the reads in your library (including valid reads), thus making an artificially small denominator for a typical endogenous DNA calculation.

Therefore by retaining all of your reads until after mapping (i.e., turning off the adapter clipping/read merging filter), you can generate more 'real' endogenous DNA estimates immediately after mapping (with a better denominator). Then after estimating this, filter using this parameter to retain only 'useful' reads (i.e., those long enough to provide higher confidence of their mapped position) for downstream analyses.

By specifying 0, no length filtering is performed.

Note that by default the output BAM files of this step are not stored in the results directory (as it is assumed that deduplicated BAM files are preferred). See --bamfiltering_savefilteredbams if you wish to save these.

Modifies filter_bam_fragment_length.py parameter: -l
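
A sketch of the set-up described above, retaining all reads through mapping and length-filtering only at the BAM level; --run_bamfiltering and --bamfiltering_minreadlength are placeholder names for the filtering-enable and minimum-length parameters described in this section:

    # Flag names below are placeholders for the BAM-filtering parameters described in this section.
    nextflow run nf-core/eager -profile docker \
        --input samplesheet.tsv --outdir ./results --fasta reference.fasta \
        --preprocessing_minlength 0 \
        --run_bamfiltering \
        --bamfiltering_minreadlength 30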

Specify the minimum mapping quality reads should have for downstream genomic analysis.

type: integer

Specify a mapping quality threshold for mapped reads to be kept for downstream analysis.

By default all reads are retained and this option is therefore set to 0 to ensure no quality filtering is performed.

Note that by default the output BAM files of this step are not stored in the results directory (as it is assumed that deduplicated BAM files are preferred). See --bamfiltering_savefilteredbams if you wish to save these.

Modifies samtools view parameter: -q

Specify the SAM format flag of reads to remove during BAM filtering for downstream genomic steps.

type: integer
default: 4

Specify to customise the exact SAM format flag of reads you wish to remove from your BAM file for downstream genomic analyses.

You can explore flag values further using a tool from the Broad Institute here.

⚠️ Modify at your own risk, alternative flags are not necessarily supported in downstream steps!

Modifies samtools parameter: -F

Specify to retain unmapped reads in the BAM file used for downstream genomic analyses.

type: boolean

Specify to retain unmapped reads (optionally also length filtered) in the genomic BAM for downstream analysis. By default, the pipeline only keeps mapped reads for downstream analysis.

This is also turned on if --metagenomics_input is set to all.

⚠️ This will likely slow down run time of downstream pipeline steps!

Modifies tool parameter(s):

  • samtools view: -f 4 / -F 4

Specify to generate FASTQ files containing only unmapped reads from the aligner generated BAM files.

type: boolean

Specify to turn on the generation and saving of FASTQs of only the unmapped reads from the mapping step in the results directory.

This can be useful if you wish to do other analysis of the unmapped reads independently of the pipeline.

Note: the reads in these FASTQ files have not undergone length or quality filtering.

Modifies samtools fastq parameter: -f 4

Specify to generate FASTQ files containing only mapped reads from the aligner generated BAM files.

type: boolean

Specify to turn on the generation and saving of FASTQs of only the mapped reads from the mapping step in the results directory.

This can be useful if you wish to do other analysis of the mapped reads independently of the pipeline, such as remapping with different parameters (whereby only including mapped reads will speed up computation time during the re-mapping due to reduced input data).

Note: the reads in these FASTQ files have not undergone length or quality filtering.

Modifies samtools fastq parameter: -F 4

Specify to save the intermediate filtered genomic BAM files in the results directory.

type: boolean

Specify to save intermediate length- and/or quality-filtered genomic BAM files in the results directory.

Options related to metagenomic screening.

Specify to turn on metagenomic screening of mapped, unmapped or all reads.

type: boolean

Specify to turn on the metagenomic screening subworkflow of the pipeline, where reads are screened against large databases. Typically used for pathogen screening or microbial community analysis.

If supplied, this will also turn on the BAM filtering subworkflow of the pipeline.

Requires subsequent specification of --metagenomics_profiling_tool and --metagenomics_profiling_database.
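
As a sketch, a pathogen-screening run on unmapped reads with KrakenUniq might look like the following; the database path is a placeholder and the value strings are assumed to mirror the option names described in this section:

    # Database path is a placeholder; value strings are assumptions based on this section.
    nextflow run nf-core/eager -profile docker \
        --input samplesheet.tsv --outdir ./results --fasta reference.fasta \
        --run_metagenomics \
        --metagenomics_input unmapped \
        --metagenomics_profiling_tool krakenuniq \
        --metagenomics_profiling_database /data/databases/krakenuniq_db.tar.gz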

Specify which type of reads to use for metagenomic screening.

type: string

Specify to select which mapped reads will be sent for metagenomic analysis.

This influences which reads are sent to this step: unmapped reads (used in most cases, as 'host' reads can often be contaminants in microbial genomes), mapped reads (e.g., when performing competitive mapping against a genomic reference of multiple genomes and you wish to apply LCA correction), or all reads.

⚠️ If you skip paired-end merging, all reads will be screened as independent reads - not as pairs! - as all FASTQ files from BAM filtering are merged into one. This merged file is not saved in the results directory.

Modifies samtools fastq parameters: -f 4 / -F 4

Specify to run a complexity filter on the metagenomics input files before classification.

type: boolean

Specify to turn on a subworkflow of the pipeline that filters the FASTQ files for complexity before the metagenomics profiling.
Use the --metagenomics_complexity_tool parameter to select a method.

Specify to save FASTQ files containing the complexity-filtered reads before metagenomic classification.

type: boolean

Specify to save the complexity-filtered FASTQ files to the results directory.

Specify which tool to use for trimming, filtering or reformatting of FASTQ reads that go into metagenomics screening.

type: string

Specify to select which tool is used to generate a final set of reads for the metagenomic classifier after any necessary trimming, filtering or reformatting of the reads.

This intermediate file is not saved in the results directory unless marked with --metagenomics_complexity_savefastq.

Specify the entropy threshold under which a sequencing read will be complexity-filtered out.

type: number
default: 0.3

Specify the minimum 'entropy' value for complexity filtering for the BBDuk or PRINSEQ++ tools.

This value will only be used for PRINSEQ++ if --metagenomics_prinseq_mode is set to entropy.

Entropy here corresponds to the amount of sequence variation existing within the read. Higher values correspond to more variety and thus will likely result in more specific matching to a taxon's reference genome. The trade-off here is fewer reads (or abundance information) available for having a confident identification.

Modifies parameters:

  • BBDuk: entropy=
  • PRINSEQ++: -lc_entropy

Specify the complexity filter mode for PRINSEQ++.

type: string

Specify the complexity filter mode for PRINSEQ++.

Use the selected mode together with the correct flag:
'dust' requires the --metagenomics_prinseq_dustscore parameter set
'entropy' requires the --metagenomics_complexity_entropy parameter set

Modifies parameters:

  • PRINSEQ++: -lc_entropy
  • PRINSEQ++: -lc_dust
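
For instance, entropy-based PRINSEQ++ filtering could be requested as sketched below; --metagenomics_run_complexityfilter and the 'prinseq' value are placeholders for the complexity-filter enable flag and tool option described earlier in this section:

    # '--metagenomics_run_complexityfilter' and the 'prinseq' value are placeholders
    # for the enable flag and tool option described earlier in this section.
    nextflow run nf-core/eager -profile docker \
        --input samplesheet.tsv --outdir ./results --fasta reference.fasta \
        --run_metagenomics \
        --metagenomics_profiling_tool kraken2 \
        --metagenomics_profiling_database /data/databases/kraken2_db \
        --metagenomics_run_complexityfilter \
        --metagenomics_complexity_tool prinseq \
        --metagenomics_prinseq_mode entropy \
        --metagenomics_complexity_entropy 0.3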

Specify the minimum dust score for PRINSEQ++ complexity filtering.

type: number
default: 0.5

Specify the minimum dust score below which low-complexity reads will be removed. A DUST score is based on how often different tri-nucleotides occur along a read.

Modifies tool parameter(s):

  • PRINSEQ++: --lc_dust

Specify which tool to use for metagenomic profiling and screening. Required if --run_metagenomics flagged.

type: string

Select which tool to run metagenomics profiling with on the designated metagenomics_input. These tools behave very differently, as they perform read profiling using different methods, and they yield very different results.

MALT and MetaPhlAn are alignment based, whereas Kraken2 and KrakenUniq are k-mer based.

MALT has additional post-processing available (via --metagenomics_run_postprocessing) which can help authenticate alignments to a provided list of taxonomic nodes using established ancient DNA characteristics.

MetaPhlAn performs profiling of the metagenomics input data. This may be used to characterize the metagenomic community of a sample, but care must be taken that you are not just looking at the modern metagenome of an ancient sample (for instance, soil microbes on a bone).

Kraken2 and KrakenUniq are metagenomics classifiers that rely on fast k-mer-matching rather than whole-read alignments and are very memory efficient.

Specify a database directory or .tar.gz file of a database directory to run metagenomics profiling on. Required if --run_metagenomics flagged.

type: string

Specify a metagenomics profiling database to use with the designated metagenomics_profiling_tool on the selected metagenomics_input. Databases can be provided either as a directory, or as a tar.gz of a directory. Metagenomic databases are NOT compatible across different tools (i.e. a MALT database is different from a Kraken2 database).

All databases need to be pre-built/downloaded for use in nf-core/eager. Database construction is often a balancing act between breadth of sequence diversity and size.

Modifies tool parameter(s):

  • krakenuniq: --db
  • kraken2: --db
  • MetaPhlAn: --bowtie2db and --index
  • MALT: '-index'

Turn on saving reads assigned by KrakenUniq or Kraken2

type: boolean

Save reads that do and do not have a taxonomic classification in your output results directory in FASTQ format.

Modifies tool parameter(s):

  • krakenuniq: --classified-out and --unclassified-out

Turn on saving of KrakenUniq or Kraken2 per-read taxonomic assignment file

type: boolean

Save a text file that contains a list of each read that had a taxonomic assignment, with information on the specific taxonomic assignment that the read received.

Modifies tool parameter(s):

  • krakenuniq: --output

Specify how large each chunk of the database should be when loading it into memory for KrakenUniq

type: string
default: 16G

nf-core/eager utilises a 'low memory' option for KrakenUniq that can reduce the amount of RAM the process requires, using the --preload option.

As a further extension of this option, you can specify how large each chunk of the database loaded into memory at any one time should be. You can specify the amount of RAM to chunk the database to with this parameter, which is particularly useful for people with limited computational resources.

More information about this parameter can be seen here.

Modifies KrakenUniq parameter: --preload-size

Turn on saving minimizer information in the kraken2 report thus increasing to an eight column layout.

type: boolean

Turn on saving minimizer information in the kraken2 report, thus increasing the report to an eight-column layout.

Modifies kraken2 parameter: --report-minimizer-data.

Specify which alignment mode to use for MALT.

type: string

Use this to run the program in 'BlastN', 'BlastP', or 'BlastX' mode, to align DNA against DNA, protein against protein, or DNA reads against protein references, respectively. Ensure your database matches the mode. Check the MALT manual for more details.

Only when --metagenomics_profiling_tool malt is also supplied.

Modifies tool parameter(s):

  • MALT: -m

Specify alignment method for MALT.

type: string

Specify what alignment algorithm to use. Options are 'Local' or 'SemiGlobal'. Local is a BLAST-like alignment, but is much slower. Semi-global alignment aligns reads end-to-end. Default: 'SemiGlobal'.

Only when --metagenomics_profiling_tool malt is also supplied.

Modifies tool parameter(s):

  • MALT: -at

Percent identity value threshold for MALT.

type: integer
default: 85

Specify the minimum percent identity (or similarity) a sequence must have to the reference for it to be retained.

Only used when --metagenomics_profiling_tool malt is also supplied.

Modifies tool parameter(s):

  • MALT:-id

Specify the percent for LCA algorithm for MALT (see MEGAN6 CE manual).

type: integer
default: 1

Specify the top percent value of the LCA algorithm. From the MALT manual: "For each read, only those matches are used for taxonomic placement whose bit score is within 10% of the best score for that read."

Only when --metagenomics_profiling_tool malt is also supplied.

Modifies tool parameter(s):

  • MALT: -top

Specify whether to use percent or raw number of reads for minimum support required for taxon to be retained for MALT.

type: string

Specify whether to use a percentage, or raw number of reads as the value used to decide the minimum support a taxon requires to be retained.

Only when --metagenomics_profiling_tool malt is also supplied.

Modifies tool parameter(s):

  • MALT: -sup and -supp

Specify the minimum percentage of reads (of the sample total) a taxon is required to have to be retained for MALT.

type: number
default: 0.01

Specify the minimum number of reads (as a percentage of all assigned reads) a given taxon is required to have to be retained as a positive 'hit' in the RMA6 file. This only applies when --malt_min_support_mode is set to 'percent'.

Only when --metagenomics_profiling_tool malt is also supplied.

Modifies tool parameter(s):

  • MALT: -supp

Specify the minimum number of reads (of the sample total) a taxon is required to have to be retained in MALT or Kraken. Not compatible with --malt_min_support_mode 'percent'.

type: integer
default: 1

For usage in MALT: specify the minimum number of reads a given taxon is required to have to be retained as a positive 'hit'.
For MALT, this only applies when --malt_min_support_mode is set to 'reads'.

Modifies tool parameter(s):

  • MALT: -sup
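
As an example, requiring a taxon to account for at least 0.05% of assigned reads in MALT could be sketched as follows; --malt_min_support_percent is a placeholder name for the percent-value parameter described above:

    # '--malt_min_support_percent' is a placeholder name for the percent-value parameter described above.
    nextflow run nf-core/eager -profile docker \
        --input samplesheet.tsv --outdir ./results --fasta reference.fasta \
        --run_metagenomics \
        --metagenomics_profiling_tool malt \
        --metagenomics_profiling_database /data/databases/malt_db \
        --malt_min_support_mode percent \
        --malt_min_support_percent 0.05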

Specify the maximum number of queries a read can have for MALT.

type: integer
default: 100

Specify the maximum number of alignments a read can have. All further alignments are discarded.

Only when --metagenomics_profiling_tool malt is also supplied.

Modifies tool parameter(s):

  • MALT: -mq

Specify the memory load method. Do not use 'map' with GPFS file systems for MALT, as it can be very slow.

type: string

How to load the database into memory. Options are 'load', 'page' or 'map'. 'load' directly loads the entire database into memory prior to seed look-up; this is slow but compatible with all servers/file systems. 'page' and 'map' perform a sort of 'chunked' database loading, allowing seed look-up prior to loading the entire database. Note that the 'page' and 'map' modes do not work properly with many remote file systems such as GPFS.

Only when --metagenomics_profiling_tool malt is also supplied.

Modifies tool parameter(s):

  • MALT: --memoryMode

Specify to also produce SAM alignment files. Note this includes both aligned and unaligned reads, and the files are gzipped. Note this will result in very large file sizes.

type: boolean

Specify to also produce gzipped SAM files of all alignments and un-aligned reads in addition to RMA6 files. These are not soft-clipped or in 'sparse' format. Can be useful for downstream analyses due to more common file format.

⚠️ can result in very large run output directories as this is essentially duplication of the RMA6 files.

Sets tool parameter(s):

  • MALT: --alignments

Define how many FASTQ files should be submitted in the same MALT run. The default value of 0 runs all files at once.

type: integer

Running very many (large) FASTQ files through MALT at the same time can lead to excessively long runtimes. This parameter allows for parallelisation of MALT runs. Please note, MALT is resource-heavy, and setting metagenomics_malt_group_size to a value N above the default (0) will spawn multiple MALT jobs with N samples per group. Please only use this if it is necessary to avoid runtime limits on your HPC cluster, since the overhead of loading a database is high.

Activate post-processing of metagenomics profiling tool selected.

type: boolean

Activate the corresponding post-processing tool for your metagenomics profiling software.

malt --> maltextract
krakenuniq/kraken2/metaphlan --> taxpasta

Note: Postprocessing is automatically carried out when using kraken2 and krakenuniq

Path to a text file with taxa of interest (one taxon per row, NCBI taxonomy name format)

type: string

Path to a .txt file with taxa of interest you wish to assess for aDNA characteristics. The .txt file should contain one taxon per row, and each taxon should be in a valid NCBI taxonomy name format corresponding to a taxonomic node in your MALT database. An example can be found on the HOPS github.

Necessary when --metagenomics_profiling_tool malt is specified and --metagenomics_run_postprocessing is flagged.

Modifies tool parameter(s):

  • MaltExtract: -t

Path to directory containing NCBI resource files (ncbi.tre and ncbi.map; available: https://github.com/rhuebler/HOPS/)

type: string

Path to a directory containing the NCBI resource tree and taxonomy table files (ncbi.tre and ncbi.map; available at the HOPS repository).

Necessary when --metagenomics_profiling_tool malt and --metagenomics_run_postprocessing are specified.

Modifies tool parameter(s):

  • MaltExtract: -r
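
A sketch of MALT profiling followed by MaltExtract post-processing; the taxon-list and NCBI-directory flag names are placeholders for the two parameters described above, and the list contents are illustrative:

    # Flag names for the taxon list and NCBI directory are placeholders for the two parameters above.
    cat taxa_of_interest.txt
    # Yersinia pestis
    # Mycobacterium leprae
    nextflow run nf-core/eager -profile docker \
        --input samplesheet.tsv --outdir ./results --fasta reference.fasta \
        --run_metagenomics \
        --metagenomics_profiling_tool malt \
        --metagenomics_profiling_database /data/databases/malt_db \
        --metagenomics_run_postprocessing \
        --metagenomics_maltextract_taxonlist taxa_of_interest.txt \
        --metagenomics_maltextract_ncbidir /data/hops_resources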

Specify which MaltExtract filter to use.

type: string

Specify which MaltExtract filter to use. This is used to specify what types of characteristics to scan for. The default will output statistics on all alignments, and then a second set with just reads with one C to T mismatch in the first 5 bases. Further details on other parameters can be seen in the HOPS documentation.

Only when --metagenomics_profiling_tool malt is also supplied.

Modifies tool parameter(s):

  • MaltExtract: -f

Specify percent of top alignments to use.

type: number
default: 0.01

Specify the frequency of top alignments for each read to be considered for each node. Note that the value should be given as a proportion (where 1 would correspond to 100%, and 0.1 would correspond to 10%).

⚠️ This parameter follows the same concept as --malt_top_percent but uses a different notation, i.e. integer (MALT) versus float (MaltExtract).

Only when --metagenomics_profiling_tool malt is also supplied.

Modifies tool parameter(s):

  • MaltExtract: -a

Turn off destacking.

type: boolean

Turn off destacking. If left on, a read that overlaps with another read will be removed (leaving a depth coverage of 1).

Only when --metagenomics_profiling_tool malt is also supplied.

Sets tool parameter(s):

  • MaltExtract: --destackingOff

Turn off downsampling.

type: boolean

Turn off downsampling. By default, downsampling is on and will randomly select 10,000 reads if the number of reads on a node exceeds this number. This is to speed up processing, under the assumption that at 10,000 reads the species is a 'true positive'.

Only when --metagenomics_profiling_tool malt is also supplied.

Sets tool parameter(s):

  • MaltExtract: --downSampOff

Turn off duplicate removal.

type: boolean

Turn off duplicate removal. By default, reads that are an exact copy (i.e. same start, stop coordinate and exact sequence match) will be removed as they are considered PCR duplicates.

Only when --metagenomics_profiling_tool malt is also supplied.

Sets tool parameter(s):

  • MaltExtract: --dupRemOff

Turn on exporting alignments of hits in BLAST format.

type: boolean

Export alignments of hits for each node in BLAST format.

Only when --metagenomics_profiling_tool malt is also supplied.

Modifies tool parameter(s):

  • MaltExtract: --matches

Turn on export of MEGAN summary files.

type: boolean

Export 'minimal' summary files (i.e. without alignments) that can be loaded into MEGAN6.

Only when --metagenomics_profiling_tool malt is also supplied.

Sets tool parameter(s):

  • MaltExtract: --meganSummary

Minimum percent identity alignments are required to have to be reported as candidate reads. Recommended to set same as MALT parameter.

type: number
default: 85

Minimum percent identity alignments are required to have to be reported. Higher values allow fewer mismatches between the read and reference sequence, and therefore provide greater confidence in the hit. Lower values allow more mismatches, which can account for damage and divergence of a related strain/species from the reference. Recommended to set the same as the MALT parameter or higher.

Only when --metagenomics_profiling_tool malt is also supplied.

Modifies tool parameter(s):

  • MaltExtract: --minPI

Turn on using top alignments per read after filtering.

type: boolean

Use the best alignment of each read for every statistic, except for those concerning read distribution and coverage.

Only when --metagenomics_profiling_tool malt is also supplied.

Sets tool parameter(s):

  • MaltExtract: --useTopAlignment

Options for removal of PCR duplicates

Specify to skip the removal of PCR duplicates.

type: boolean

Specify which tool to use for deduplication.

type: string

Specify which duplicate read removal tool to use. While markduplicates is set by default, an ancient-DNA-specific read deduplication tool, dedup, is offered (see Peltzer et al. 2016 for details). The latter utilises both ends of paired-end data to remove duplicates (i.e. true exact duplicates), whereas markduplicates will over-zealously deduplicate anything with the same starting position even if the ends are different.

⚠️ DeDup can only be used on collapsed (i.e. merged) reads from paired-end sequencing.

Options for filtering for, trimming or rescaling characteristic ancient DNA damage patterns

Specify to turn on damage rescaling of BAM files using mapDamage2 to probabilistically remove damage.

type: boolean

Specify to turn on mapDamage2's BAM rescaling functionality. This probabilistically replaces Ts back to Cs depending on the likelihood this reference-mismatch was originally caused by damage. If the library is specified to be single-stranded, this will automatically use the --single-stranded mode.
This process will ameliorate the effects of aDNA damage, but also increase reference-bias.

This functionality does not have any MultiQC output.
⚠️ Rescaled libraries will not be merged with non-scaled libraries of the same sample for downstream genotyping, as the model may be different for each library. If you wish to merge these, please do this manually and re-run nf-core/eager using the merged BAMs as input.

Modifies mapDamage2 parameter: --rescale

Specify the length of read sequence to use from each side for rescaling.

type: integer
default: 12

Specify the length in bp from the end of the read that mapDamage should rescale at both ends. This can be overridden by the 5 prime- and 3 prime-specific rescale length parameters below.

Modifies mapDamage2 parameter: --seq-length

Specify the length of read for mapDamage2 to rescale from 5 prime end.

type: integer

Specify the length in bp from the end of the read that mapDamage should rescale. This overrides --rescale_seqlength.

Modifies mapDamage2 parameter: --rescale-length-5p

Specify the length of read for mapDamage2 to rescale from 3 prime end.

type: integer

Specify the length in bp from the end of the read that mapDamage should rescale. This overrides --rescale_seqlength.

Modifies mapDamage2 parameter --rescale-length-3p

Specify to turn on PMDtools filtering.

type: boolean

Specify to run PMDtools for damage-based read filtering in sequencing libraries.

Specify PMD score threshold for PMDtools.

type: integer
default: 3

Specify the PMDScore threshold to use when filtering BAM files for DNA damage. Only reads which surpass this damage score are considered for downstream analysis.

Modifies PMDtools parameter: --threshold

Specify a masked FASTA file with positions to be used with PMDtools.

type: string
pattern: ^\S+\.fa(sta)?$

Specify a FASTA file to use as reference for samtools calmd prior to PMD filtering.
Setting the SNPs that are part of the used capture set as N can alleviate reference bias when running PMD filtering on capture data, where you might not want the allele of a SNP to be counted as damage when it is a transition.

Specify a BED file to be used to mask the reference FASTA prior to running PMDtools.

type: string
pattern: ^\S+\.bed(\.gz)?$

Specify a BED file to activate masking of the reference FASTA at the contained sites prior to running PMDtools. Positions that are in the provided BED file will be replaced by Ns in the reference genome.
This can alleviate reference bias when running PMD filtering on capture data, where you might not want the allele of a transition SNP to be counted as damage. Masking of the reference is done using bedtools maskfasta.

Specify to turn on BAM trimming for non-UDG or half-UDG libraries.

type: boolean

Specify to turn on the BAM trimming of [n] bases from reads in the deduplicated BAM file. Damage assessment in PMDtools or DamageProfiler remains untouched, as data is routed through this independently. BAM trimming is typically performed to reduce errors during genotyping that can be caused by aDNA damage.

BAM trimming will only affect libraries with 'damage_treatment' of 'none' or 'half'. Complete UDG treatment ('full') should have removed all damage during library construction, so trimming of 0 bp is performed. The amount of bases that will be trimmed off from each side of the molecule should be set separately for libraries depending on their 'strandedness' and 'damage_treatment'.

Note: additional artefacts such as barcodes or adapters should be removed prior to mapping and not in this step.

Specify the number of bases to clip off reads from 'left' (5 prime) end of reads for double-stranded non-UDG libraries.

type: integer

Specify the number of bases to clip off reads from 'left' (5 prime) end of reads for double-stranded non-UDG libraries. By default, this is set to 0, and therefore clips off no bases on the left side of reads from double-stranded libraries whose UDG treatment is set to 'none'. Note that reverse reads will automatically be clipped off at the reverse side with this (automatically reverses left and right for the reverse read).

Modifies bamUtil's trimBam parameter: -L

Specify the number of bases to clip off reads from 'right' (3 prime) end of reads for double-stranded non-UDG libraries.

type: integer

Specify the number of bases to clip off reads from 'right' (3 prime) end of reads for double-stranded non-UDG libraries. By default, this is set to 0, and therefore clips off no bases on the right side of reads from double-stranded libraries whose UDG treatment is set to 'none'. Note that reverse reads will automatically be clipped off at the reverse side with this (automatically reverses left and right for the reverse read).

Modifies bamUtil's trimBam parameter: -R

Specify the number of bases to clip off reads from 'left' (5 prime) end of read for double-stranded half-UDG libraries.

type: integer

Specify the number of bases to clip off reads from 'left' (5 prime) end of read for double-stranded half-UDG libraries. By default, this is set to 0, and therefore clips off no bases on the left side of reads from double-stranded libraries whose UDG treatment is set to 'half'. Note that reverse reads will automatically be clipped off at the reverse side with this (automatically reverses left and right for the reverse read).

Modifies bamUtil's trimBam parameter: -L

Specify the number of bases to clip off reads from 'right' (3 prime) end of read for double-stranded half-UDG libraries.

type: integer

Specify the number of bases to clip off reads from 'right' (3 prime) end of read for double-stranded half-UDG libraries. By default, this is set to 0, and therefore clips off no bases on the right side of reads from double-stranded libraries whose UDG treatment is set to 'half'. Note that reverse reads will automatically be clipped off at the reverse side with this (automatically reverses left and right for the reverse read).

Modifies bamUtil's trimBam parameter: -R

Specify the number of bases to clip off reads from 'left' (5 prime) end of read for single-stranded non-UDG libraries.

type: integer

Specify the number of bases to clip off reads from 'left' (5 prime) end of read for single-stranded non-UDG libraries. By default, this is set to 0, and therefore clips off no bases on the left side of reads from single-stranded libraries whose UDG treatment is set to 'none'. Note that reverse reads will automatically be clipped off at the reverse side with this (automatically reverses left and right for the reverse read).

Modifies bamUtil's trimBam parameter: -L

Specify the number of bases to clip off reads from 'right' (3 prime) end of read for single-stranded non-UDG libraries.

type: integer

Specify the number of bases to clip off reads from 'right' (3 prime) end of read for single-stranded non-UDG libraries. By default, this is set to 0, and therefore clips off no bases on the right side of reads from single-stranded libraries whose UDG treatment is set to 'none'. Note that reverse reads will automatically be clipped off at the reverse side with this (automatically reverses left and right for the reverse read).

Modifies bamUtil's trimBam parameter: -R

Specify the number of bases to clip off reads from 'left' (5 prime) end of read for single-stranded half-UDG libraries.

type: integer

Specify the number of bases to clip off the 'left' (5 prime) end of reads from single-stranded half-UDG libraries. By default, this is set to 0, and therefore no bases are clipped from the left side of reads from single-stranded libraries whose UDG treatment is set to 'half'. Note that for reads mapped to the reverse strand, 'left' and 'right' are swapped automatically, so clipping is always applied to the intended end of the read.

Modifies bamUtil's trimBam parameter: -L

Specify the number of bases to clip off reads from 'right' (3 prime) end of read for single-stranded half-UDG libraries.

type: integer

Specify the number of bases to clip off the 'right' (3 prime) end of reads from single-stranded half-UDG libraries. By default, this is set to 0, and therefore no bases are clipped from the right side of reads from single-stranded libraries whose UDG treatment is set to 'half'. Note that for reads mapped to the reverse strand, 'left' and 'right' are swapped automatically, so clipping is always applied to the intended end of the read.

Modifies bamUtil's trimBam parameter: -R

Specify to turn on soft-trimming instead of hard masking.

type: boolean

Specify to turn on soft-trimming instead of hard masking of bases. By default, nf-core/eager uses hard trimming, which sets trimmed bases to 'N' with quality '!' in the BAM output. Turn this on to use soft-trimming instead, which masks the bases at the read ends via the CIGAR string (soft-clipping) rather than overwriting them.

Modifies bamUtil's trimBam parameter: -c
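
For orientation, the sketch below shows roughly how these clipping options translate into a bamUtil trimBam call. File names are placeholders and the exact command assembled by the pipeline may differ; the flag combinations simply follow the -L/-R/-c descriptions above.

  # hard-mask 2 bases at each end of every read (bases set to 'N', quality '!')
  bam trimBam input.bam trimmed.bam 2

  # clip only the left end and soft-clip via the CIGAR string instead of masking
  bam trimBam input.bam trimmed_soft.bam -L 2 -R 0 -c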

Options for variant calling

Specify to turn on genotyping of BAM files.

type: boolean

Specify to turn on genotyping. --genotyping_source and --genotyping_tool must also be provided together with this option.

Specify which input BAM to use for genotyping.

type: string

Specify which BAM file to use for genotyping, depending on which BAM processing modules you have turned on. Options are: 'raw' (to use the reads used as input for damage manipulation); 'pmd' (for pmdtools output); 'trimmed' (for base-clipped BAMs, or base-clipped PMD-filtered BAMs if both filtering and trimming are requested); 'rescaled' (for mapDamage2 rescaling output).
Warning: Depending on the parameters you provided, 'raw' can refer to all mapped reads, filtered reads (if BAM filtering has been performed), or the deduplicated reads (if deduplication was performed).

Specify which genotyper to use.

type: string

Specify which genotyper to use. Current options are: pileupCaller, ANGSD, GATK UnifiedGenotyper (v3.5), GATK HaplotypeCaller (v4) or FreeBayes.

Note that while UnifiedGenotyper is more suitable for low-coverage ancient DNA (HaplotypeCaller performs de novo assembly around each variant site), be aware that GATK v3.5 is officially deprecated by the Broad Institute; it is used here for compatibility with MultiVCFAnalyzer.
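
As a hedged illustration of how these options fit together on the command line (only --genotyping_source and --genotyping_tool are named above; the spelling of the flag that switches genotyping on and the exact tool value are assumptions):

  # '--run_genotyping' and the value 'pileupcaller' are assumed spellings
  nextflow run nf-core/eager -profile docker \
    --input samplesheet.tsv \
    --fasta reference.fa \
    --run_genotyping \
    --genotyping_source 'raw' \
    --genotyping_tool 'pileupcaller'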

Specify to skip generation of VCF-based variant calling statistics with bcftools.

type: boolean

Specify to disable running of bcftools stats against VCF files from GATK and FreeBayes genotypers.

This will automatically include the FASTA reference for INDEL-related statistics.

Specify the ploidy of the reference organism.

type: integer
default: 2

Specify the desired ploidy value of your reference organism for genotyping with GATK or FreeBayes. E.g. if you want to allow heterozygous calls this value should be >= 2.

Modifies GATK UnifiedGenotyper parameter: --sample_ploidy
Modifies GATK HaplotypeCaller parameter: --sample-ploidy
Modifies FreeBayes parameter: -p

Specify the minimum base quality to be used for genotyping with pileupCaller.

type: integer
default: 30

Specify the minimum base quality to be used when generating the samtools mpileup used as input for genotyping with pileupCaller.

Modifies samtools mpileup parameter: -Q.

Specify the minimum mapping quality to be used for genotyping with pileupCaller.

type: integer
default: 30

Specify the minimum mapping quality to be used when generating the samtools mpileup used as input for genotyping with pileupCaller.

Modifies samtools mpileup parameter: -q.

Specify the path to SNP panel in BED format for pileupCaller.

type: string

Specify a SNP panel in the form of a BED file of sites at which to generate a pileup for pileupCaller.

Specify the path to SNP panel in EIGENSTRAT format for pileupCaller.

type: string

Specify a SNP panel in EIGENSTRAT format of sites to be called with pileupCaller.

Specify the SNP calling method to use for genotyping with pileupCaller.

type: string

Specify the SNP calling method to use for genotyping. 'randomHaploid' will randomly sample a read overlapping the SNP and produce a homozygous genotype with the allele supported by that read (often called 'pseudohaploid' or 'pseudodiploid'). 'randomDiploid' will randomly sample two reads overlapping the SNP and produce a genotype comprised of the two alleles supported by the two reads. 'majorityCall' will produce a genotype that is homozygous for the allele that appears in the majority of reads overlapping the SNP.

Modifies pileupCaller parameters: --randomHaploid --randomDiploid --majorityCall

Specify the calling mode for transitions with pileupCaller.

type: string

Specify whether genotypes at transition SNPs should be called normally, set to missing, or excluded from the output genotypes entirely.

Modifies pileupCaller parameter: --skipTransitions --transitionsMissing
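
For context, the pileupCaller options above wrap a two-step procedure along the following lines (file and sample names are placeholders; the pipeline's exact invocation may differ):

  # pileup restricted to the SNP panel, applying the base/mapping quality cut-offs above
  samtools mpileup -B -q 30 -Q 30 -l panel.bed -f reference.fa sample.bam | \
    pileupCaller --randomHaploid \
      --sampleNames sample1 \
      -f panel.snp \
      -e output_prefix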

Specify GATK phred-scaled confidence threshold.

type: integer
default: 30

Specify a GATK genotyper phred-scaled confidence threshold of a given SNP/INDEL call.

Modifies GATK UnifiedGenotyper or HaplotypeCaller parameter: -stand_call_conf

Specify VCF file for SNP annotation of output VCF files for GATK.

type: string
pattern: ^\S+\.vcf$

Specify VCF file for output VCF SNP annotation, e.g. if you want to annotate your VCF file with 'rs' SNP IDs. Check GATK documentation for more information. Gzip not accepted.

Specify the maximum depth coverage allowed for genotyping with GATK before down-sampling is turned on.

type: integer
default: 250

Specify the maximum depth coverage allowed for genotyping before down-sampling is turned on. Any position with a coverage higher than this value will be randomly down-sampled to this many reads.

Modifies GATK UnifiedGenotyper parameter: -dcov

Specify GATK UnifiedGenotyper output mode.

type: string

Specify GATK UnifiedGenotyper output mode to use when producing the output VCF (i.e. produce calls for every site or just confident sites).

Modifies GATK UnifiedGenotyper parameter: --output_mode

Specify UnifiedGenotyper likelihood model.

type: string

Specify the GATK UnifiedGenotyper likelihood model, i.e. whether to call only SNPs, only INDELs, or both.

Modifies GATK UnifiedGenotyper parameter: --genotype_likelihoods_model

Specify to keep the BAM output of re-alignment around variants from GATK UnifiedGenotyper.

type: boolean

Specify to output the BAMs that have realigned reads (with GATK (v3) IndelRealigner) around possible variants for improved genotyping with GATK UnifiedGenotyper in addition to the standard VCF output.

These BAMs will be stored in the same folder as the corresponding VCF files.

Specify to supply a default base quality if a read is missing a base quality score.

type: integer
default: -1

Specify a value to set base quality scores for genotyping with GATK UnifiedGenotyper, if reads are missing this information. Might be useful if you have 'synthetically' generated reads (e.g. from chopping up a reference genome). The default of -1 does not set any default quality, i.e. the feature is turned off.

Modifies GATK UnifiedGenotyper parameter: --defaultBaseQualities
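
To illustrate how the UnifiedGenotyper options above map onto a GATK 3.5 command (an approximate sketch with placeholder file names, not the pipeline's literal call):

  # --dbsnp and --defaultBaseQualities would only be added when the corresponding
  # options above are set
  java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper \
    -R reference.fa -I input.bam -o sample.unifiedgenotyper.vcf \
    --sample_ploidy 2 \
    -stand_call_conf 30 \
    -dcov 250 \
    --output_mode EMIT_ALL_SITES \
    --genotype_likelihoods_model SNP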

Specify GATK HaplotypeCaller output mode.

type: string

Specify the type of sites that should be included in the output VCF after genotyping with GATK HaplotypeCaller (i.e. produce calls for every site or just confident sites).

Modifies GATK HaplotypeCaller parameter: --output-mode

Specify HaplotypeCaller mode for emitting reference confidence calls.

type: string

Specify GATK HaplotypeCaller mode for emitting reference confidence calls.

Modifies GATK HaplotypeCaller parameter: --emit-ref-confidence
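
The equivalent GATK 4 HaplotypeCaller sketch, again with placeholder file names and only for orientation:

  gatk HaplotypeCaller \
    -R reference.fa -I input.bam -O sample.haplotypecaller.vcf.gz \
    --sample-ploidy 2 \
    --output-mode EMIT_ALL_CONFIDENT_SITES \
    --emit-ref-confidence NONE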

Specify minimum required supporting observations of an alternate allele to consider a variant in FreeBayes.

type: integer
default: 1

Specify the minimum count of observations supporting an alternate allele within a single individual in order to evaluate the position during genotyping with FreeBayes.

Modifies FreeBayes parameter: -C

Specify to skip over regions of high depth by discarding alignments overlapping positions where total read depth is greater than specified in FreeBayes.

type: integer

Specify to skip over regions of high depth by discarding alignments overlapping positions where total read depth is greater than the specified value during genotyping with FreeBayes. This is set to 0 by default, which deactivates this behaviour.

Modifies FreeBayes parameter: -g
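
A minimal FreeBayes sketch using the options described above (placeholder file names; -g is only relevant when the skip-coverage value is set greater than 0):

  freebayes -f reference.fa -p 2 -C 1 input.bam > sample.freebayes.vcf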

Specify which ANGSD genotyping likelihood model to use.

type: string

Specify which genotype likelihood model to use in ANGSD.

Modifies ANGSD parameter: -GL

Specify the formatting of the output VCF for ANGSD genotype likelihood results.

type: string

Specify which genotype likelihood file format should be output by ANGSD.

The options refer to the following descriptions respectively:

  • binary: binary output of all 10 log genotype likelihoods
  • beagle_binary: beagle likelihood file
  • binary_three: binary 3 times likelihood
  • text: text output of all 10 log genotype likelihoods

See the ANGSD documentation for more information on which to select for your downstream applications.

Modifies ANGSD parameter: -doGlf
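
A minimal ANGSD sketch combining the two options above (placeholder file names; -GL 1 selects the SAMtools likelihood model, -doGlf 4 the text output of all 10 log genotype likelihoods):

  angsd -i input.bam -ref reference.fa -GL 1 -doGlf 4 -out sample_angsd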

Options for the calculation of ratio of reads to one chromosome/FASTA entry against all others.

Specify to turn on mitochondrial to nuclear ratio calculation.

type: boolean

Specify to turn on estimation of the ratio of mitochondrial to nuclear reads.

Specify the name of the reference FASTA entry corresponding to the mitochondrial genome.

type: string
default: MT

Specify the FASTA entry in the reference file specified with --fasta which acts as the mitochondrial 'chromosome' on which to base the ratio calculation. The tool only accepts the first section of the header before the first space. The default chromosome name is based on the hs37d5/GRCh37 human reference genome.

Options for the calculation of mapping statistics

Specify to turn off the computation of library complexity estimation with preseq.

type: boolean

Specify to turn off the computation of library complexity estimation.

Specify which mode of preseq to run.

type: string

Specify which mode of preseq to run.

From the preseq documentation:

c_curve is used to compute the expected complexity curve of a mapped read file with a hypergeometric formula.

lc_extrap is used to generate the expected yield for theoretical larger experiments, as well as bounds on the number of distinct reads in the library and the associated confidence intervals, which are computed by bootstrapping the observed duplicate counts histogram.

Specify the step size (i.e., sampling regularity) of preseq.

type: integer
default: 1000

Specify the step size of preseq's c_curve and lc_extrap methods. This can be useful when few reads are present, and allows preseq to be used for extrapolation of shallow sequencing results.

Modifies preseq parameter: -s

Specify the maximum number of terms that preseq's lc_extrap mode will use.

type: integer
default: 100

Specify the maximum number of terms that preseq's lc_extrap mode will use.

Modifies preseq lc_extrap parameter: -x

Specify the maximum extrapolation to use for preseq's lc_extrap mode.

type: integer
default: 10000000000

Specify the maximum extrapolation that preseq's lc_extrap mode will perform.

Modifies preseq lc_extrap parameter: -e

Specify number of bootstraps to perform in preseq's lc_extrap mode.

type: integer
default: 100

Specify the number of bootstraps preseq's lc_extrap mode will perform to calculate confidence intervals.

Modifies preseq lc_extrap parameter: -n

Specify confidence interval level for preseq's lc_extrap mode.

type: number
default: 0.95

Specify the allowed level of confidence intervals used for preseq's lc_extrap mode.

Modifies preseq lc_extrap parameter: -c

Specify to turn on preseq defects mode to extrapolate without testing for defects in lc_extrap mode.

type: boolean

Specify to activate defects mode of preseq lc_extrap, which runs the extrapolation without testing for defects.

Modifies preseq lc_extrap parameter: -D
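
Putting the preseq options above together, the underlying calls look roughly like the following (placeholder file names; -D would only be appended when defects mode is switched on):

  preseq c_curve -B -s 1000 -o sample.c_curve.txt input.bam
  preseq lc_extrap -B -s 1000 -x 100 -e 10000000000 -n 100 -c 0.95 \
    -o sample.lc_extrap.txt input.bam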

Specify to turn off coverage calculation with Qualimap.

type: boolean

Specify path to SNP capture positions in BED format for coverage calculations with Qualimap.

type: string

Options for calculating and filtering for characteristic ancient DNA damage patterns.

Specify to turn off ancient DNA damage calculation.

type: boolean

Specify to turn off computation of DNA damage profiles.

Specify the tool to use for damage calculation.

type: string

Specify the tool to be used for damage calculation. DamageProfiler is generally faster than mapDamage2, but the latter has an option to limit the number of reads used. This can significantly speed up the processing of very large files, where the damage estimates are already accurate after processing only a fraction of the input.

Specify the maximum misincorporation frequency that should be displayed on damage plot.

type: number
default: 0.3

Specify the maximum misincorporation frequency that should be displayed in the damage plot.

Modifies DamageProfiler parameter: -yaxis_dp_max or mapDamage2 parameter: --ymax

Specify number of bases of each read to be considered for plotting damage estimation.

type: integer
default: 25

Specify the number of bases to be considered for plotting nucleotide misincorporations.

Modifies DamageProfiler parameter: -t or mapDamage2 parameter: -m

Specify the length filter for DamageProfiler.

type: integer
default: 100

Specify the number of bases which are considered for frequency computations.

Modifies DamageProfiler parameter: -l

Specify the maximum number of reads to consider for damage calculation with mapDamage.

type: integer

Specify the maximum number of reads used for damage calculation in mapDamage2. This can be used to significantly reduce the amount of time required for damage assessment. Note that setting this value too low can produce inaccurate results.

Modifies mapDamage2 parameter: -n
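
For orientation, the damage-calculation options above correspond roughly to the following calls (placeholder file names; only one of the two tools is run, depending on the tool selected above):

  # DamageProfiler with the plotting and length options above
  damageprofiler -i input.bam -r reference.fa -o damageprofiler/ -t 25 -l 100 -yaxis_dp_max 0.3

  # mapDamage2, down-sampled to e.g. 100,000 reads
  mapDamage -i input.bam -r reference.fa -n 100000 -m 25 --ymax 0.3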

Options for calculating reference annotation statistics (e.g. gene coverages)

Specify to turn on calculation of number of reads, depth and breadth coverage of features in reference with bedtools.

type: boolean

Specify to turn on the bedtools module, producing statistics for breadth (or 'percent coverage') and depth (or X-fold) coverage of the supplied features.

Modifies bedtools coverage parameter: -mean

Specify path to GFF or BED file containing positions of features in reference file for bedtools.

type: string

Specify the path to a GFF/BED containing the feature coordinates (or any acceptable input for bedtools coverage). Must be in quotes.
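
A rough sketch of the underlying bedtools call (placeholder file names; run without -mean to obtain breadth/percent-coverage statistics instead of mean depth):

  bedtools coverage -a features.bed -b input.bam -mean > sample.feature_depth.tsv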

Options for removing host-mapped reads

Specify to turn on creation of pre-adapter-removal and/or read-pair-merging FASTQ files without reads that mapped to the reference (e.g. for public upload of privacy-sensitive non-host data).

type: boolean

Specify to recreate pre-adapter-removal and/or read-pair-merging FASTQ files, but without the reads that mapped to the reference (e.g. for public upload of privacy-sensitive non-host data).

Specify the host-mapped read removal mode.

type: string

Specify the host-mapped read removal mode.

Modifies extract_map_reads.py parameter: -m

Options for the estimation of contamination in human data

Specify to turn on nuclear contamination estimation for genomes with ANGSD.

type: boolean

Specify to run nuclear DNA contamination estimation with ANGSD.

Specify the name of the chromosome to be used for contamination estimation with ANGSD.

type: string
default: X

Specify the name of the chromosome to be used for contamination estimation with ANGSD, as specified in your FASTA/BAM header, e.g. 'X' for hs37d5 or 'chrX' for hg19.

Specify the first position on the chromosome to be used for contamination estimation with ANGSD.

type: integer
default: 5000000

Specify the beginning of the genetic range that should be utilised for nuclear contamination estimation with ANGSD.

Specify the last position on the chromosome to be used for contamination estimation with ANGSD.

type: integer
default: 154900000

Specify the end of the genetic range that should be utilised for nuclear contamination estimation with ANGSD.

Specify the minimum mapping quality reads should have for contamination estimation with ANGSD.

type: integer
default: 30

Specify the minimum mapping quality reads should have for contamination estimation with ANGSD.

Modifies ANGSD parameter: -minMapQ

Specify the minimum base quality reads should have for contamination estimation with ANGSD.

type: integer
default: 30

Specify the minimum base quality reads should have for contamination estimation with ANGSD.

Modifies ANGSD parameter: -minQ

Specify path to HapMap file of chromosome for contamination estimation with ANGSD.

type: string
default: ${projectDir}/assets/angsd_resources/HapMapChrX.gz

Specify a path to HapMap file of chromosome for contamination estimation with ANGSD. The haplotype map, or "HapMap", records the location of haplotype blocks and their tag SNPs.
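
For orientation, the ANGSD-based contamination estimation wrapped by these options follows a two-step pattern roughly like the one below (placeholder file names; the 'contamination' helper is distributed in ANGSD's misc/ directory, and the exact invocation used by the pipeline may differ):

  # step 1: count alleles at HapMap positions on the X chromosome
  angsd -i input.bam -r X:5000000-154900000 \
    -doCounts 1 -iCounts 1 -minMapQ 30 -minQ 30 -out sample_angsd

  # step 2: estimate contamination from the counts using the HapMap file above
  contamination -a sample_angsd.icnts.gz -h HapMapChrX.gz 2> sample.X_contamination.txt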

Options for the calculation of genetic sex of human individuals.

Specify to turn on sex determination for genomes mapped to human reference genomes with Sex.DetERRmine.

type: boolean

Specify to run genetic sex determination.

Specify path to SNP panel in BED format for error bar calculation.

type: string

Specify a BED file with SNPs to be used for X-/Y-rate calculation. Running without this parameter will considerably increase runtime, and render the resulting error bars untrustworthy. Theoretically, any set of SNPs that are distant enough that two SNPs are unlikely to be covered by the same read can be used here. The programme was coded with the 1240k panel in mind.
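
As a hedged sketch of how Sex.DetERRmine is typically fed (the samtools depth flags are standard; the helper script's -f option for the BAM list is an assumption, and the pipeline's exact invocation may differ):

  # depth at the panel SNPs for each BAM listed in bamlist.txt, piped into Sex.DetERRmine
  samtools depth -a -q 30 -Q 30 -b snp_panel.bed -f bamlist.txt | \
    python3 sexdeterrmine.py -f bamlist.txt > sexdet_output.json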