Define where the pipeline should find input data and save output data.

Path to comma-separated file containing information about the samples in the experiment.

required
type: string
pattern: ^\S+\.(c|t)sv$

You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row. See usage docs.
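
For example, a minimal launch command might look like the following sketch (the samplesheet name and output directory are placeholders, and --outdir is assumed to be the standard nf-core parameter for the output directory):

  nextflow run nf-core/eager -profile docker \
    --input samplesheet.csv \
    --outdir ./results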

Specify to convert input BAM files back to FASTQ for remapping

type: boolean

This parameter tells the pipeline to convert the BAM files listed in the --input TSV or CSV sheet back to FASTQ format to allow re-preprocessing and mapping.

Can be useful when you want to ensure consistent mapping parameters across all libraries when incorporating public data. However, be careful of biases that may be introduced by re-processing (e.g. the BAM files may already have been clipped, or may include only reads mapped with different settings, so you may not have all reads from the original publication).

The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.

required
type: string

Email address for completion summary.

type: string
pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (~/.nextflow/config) then you don't need to specify this on the command line for every run.
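
As a sketch, the equivalent setting in your user config file could look like this (the address is a placeholder):

  // ~/.nextflow/config
  params {
      email = 'you@example.com'
  }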

MultiQC report title. Printed as page header, used for filename if not otherwise specified.

type: string

Reference genome related files and options required for the workflow.

Path to FASTA genome file.

type: string
pattern: ^\S+\.fn?a(sta)?(\.gz)?$|^\S+\.(c|t)sv$

This parameter is mandatory if --genome is not specified. If you don't supply a mapper index (e.g. for BWA), this will be generated for you automatically. Combine with --save_reference to save mapper index for future runs.
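
For example, to map against a local FASTA and keep the automatically generated indices for later runs (paths are illustrative):

  nextflow run nf-core/eager -profile docker \
    --input samplesheet.csv \
    --outdir ./results \
    --fasta /path/to/genome.fasta \
    --save_reference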

Path to samtools FASTA index (typically ending in '.fai'). If not supplied will be made for you.

type: string

If you want to use a pre-existing samtools faidx index, use this to specify the required FASTA index file for the selected reference genome. This should be generated by samtools faidx and has a file suffix of .fai.

Path to picard sequence dictionary file (typically ending in '.dict'). If not supplied will be made for you.

type: string

If you want to use a pre-existing picard CreateSequenceDictionary dictionary file, use this to specify the required .dict file for the selected reference genome.

Path to directory containing index files of the FASTA for a given mapper.

type: string

For most people this will likely be the same directory that contains the file you provided to --fasta.

If you want to use pre-existing bwa index indices, the directory should contain files ending in '.amb', '.ann', '.bwt'. If you want to use pre-existing bowtie2 build indices, the directory should contain files ending in '.1.bt2', '.2.bt2', '.rev.1.bt2'.

In any case do not include the files themselves in the path. nf-core/eager will automagically detect the index files by searching for the FASTA filename with the corresponding bwa index/bowtie2 build file suffixes. If not supplied, the indices will be generated for you.

Specify to save any pipeline-generated reference genome indices in the results directory.

type: boolean

Use this if you do not have pre-made reference FASTA indices for bwa, samtools and picard. If you turn this on, the indices nf-core/eager generates for you will be saved in the <your_output_dir>/results/reference_genomes directory. If not supplied, nf-core/eager-generated reference indices will be deleted.

Modifies SAMtools index command: -c

Name of iGenomes reference.

hidden
type: string

If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files e.g. --genome GRCh38.

See the nf-core website docs for more details.

Directory / URL base for iGenomes references.

hidden
type: string
default: s3://ngi-igenomes/igenomes/

Do not load the iGenomes reference config.

hidden
type: boolean

Do not load igenomes.config when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in igenomes.config.

Specify the FASTA header of the target chromosome to extend. Only applies when using circularmapper.

type: string

The entry (chromosome, contig, etc.) in your FASTA reference that you'd like to be treated as circular.

Applies only when providing a single FASTA file via --fasta (NOT multi-reference input - see reference TSV/CSV input).

Modifies tool parameter(s):

  • circulargenerator -s

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

hidden
type: string
default: master

Base directory for Institutional configs.

hidden
type: string
default: https://raw.githubusercontent.com/nf-core/configs/master

If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.
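
As a rough sketch, assuming the standard nf-core parameter name --custom_config_base for this option, an offline setup could look like this (paths are illustrative):

  # download the configs repository once, while online
  git clone https://github.com/nf-core/configs.git /path/to/nf-core-configs
  # then point the pipeline at the local copy
  nextflow run nf-core/eager -profile docker --input samplesheet.csv --outdir ./results \
    --custom_config_base /path/to/nf-core-configs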

Institutional config name.

hidden
type: string

Institutional config description.

hidden
type: string

Institutional config contact information.

hidden
type: string

Institutional config URL link.

hidden
type: string

Set the top limit for requested resources for any single job.

Maximum number of CPUs that can be requested for any single job.

hidden
type: integer
default: 16

Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. --max_cpus 1

Maximum amount of memory that can be requested for any single job.

hidden
type: string
default: 128.GB
pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$

Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. --max_memory '8.GB'

Maximum amount of time that can be requested for any single job.

hidden
type: string
default: 240.h
pattern: ^(\d+\.?\s*(s|m|h|d|day)\s*)+$

Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. --max_time '2.h'
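
For example, to cap every process at the resources of a small server (values purely illustrative):

  nextflow run nf-core/eager -profile docker --input samplesheet.csv --outdir ./results \
    --max_cpus 8 \
    --max_memory '32.GB' \
    --max_time '24.h'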

Less common options for the pipeline, typically set in a config file.

Display help text.

hidden
type: boolean

Display version and exit.

hidden
type: boolean

Method used to save pipeline results to output directory.

hidden
type: string

The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.

Email address for completion summary, only when pipeline fails.

hidden
type: string
pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.

Send plain-text email instead of HTML.

hidden
type: boolean

File size limit when attaching MultiQC reports to summary emails.

hidden
type: string
default: 25.MB
pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$

Do not use coloured log outputs.

hidden
type: boolean

Incoming hook URL for messaging service

hidden
type: string

Incoming hook URL for messaging service. Currently, MS Teams and Slack are supported.

Custom config file to supply to MultiQC.

hidden
type: string

Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file

hidden
type: string

Custom MultiQC yaml file containing HTML including a methods description.

type: string

Boolean whether to validate parameters against the schema at runtime

hidden
type: boolean
default: true

Show all params when using --help

hidden
type: boolean

By default, parameters set as hidden in the schema are not shown on the command line when a user runs with --help. Specifying this option will tell the pipeline to show all parameters.

Validation of parameters fails when an unrecognised parameter is found.

hidden
type: boolean

By default, when an unrecognised parameter is found, it returns a warning.

Validation of parameters in lenient mode.

hidden
type: boolean

Allows string values that are parseable as numbers or booleans. For further information see JSONSchema docs.

Base URL or local path to location of pipeline test dataset files

hidden
type: string
default: https://raw.githubusercontent.com/nf-core/test-datasets/

Removal of adapters, paired-end merging, poly-G removal etc.

Specify which tool to use for sequencing quality control.

type: string

Specify which tool to use for sequencing quality control.

Falco is designed as a drop-in replacement for FastQC but written in C++ for faster computation. We recommend using Falco for very large datasets, due to its reduced memory requirements.

Specify to skip all preprocessing steps (adapter removal, paired-end merging, poly-g trimming etc).

type: boolean

Specify to skip all preprocessing steps (adapter removal, paired-end merging, poly-g trimming etc).

This will also mean you will only get one set of FastQC results (of the input reads).

Specify which preprocessing tool to use.

type: string

Specify which preprocessing tool to use.

AdapterRemoval is commonly used in palaeogenomics, however fastp has similar performance and much additional functionality (including built-in complexity trimming) that can often be useful.

Specify to skip read-pair merging.

type: boolean

Turns off paired-end read merging, which results in paired-end mapping modes being used when aligning reads against the reference.

This can be useful in cases where you have long ancient DNA reads, modern DNA, or when you want to utilise mate-pair 'spatial' information.

⚠️ If you run this and also have --preprocessing_minlength set to a value (as it is by default!), you may end up removing single reads from either the pair1 or pair2 file. These reads will NOT be mapped when aligning with either bwa or bowtie2, as both can only accept one (forward) or two (forward and reverse) FASTQs as input in paired-end mode.

⚠️ If you run metagenomic screening as well as skipping merging, all reads will be screened as independent reads - not as pairs! - as all FASTQ files from BAM filtering are merged into one. This merged file is not saved in results directory.

Modifies AdapterRemoval parameter: --collapse
Modifies fastp parameter: --merge

Specify to exclude pairs that did not overlap sufficiently for merging (i.e., keep merged reads only).

type: boolean

Specify to exclude pairs that did not overlap sufficiently for merging (i.e., keep merged reads only). In other words, singletons (reads missing a pair) and un-merged reads (where there wasn't sufficient overlap) are discarded.

Most ancient DNA molecules are very short, and the majority are expected to merge. Specifying this parameter can sometimes be useful when dealing with ultra-short aDNA reads to reduce the number of longer-reads you may have in your library that are derived from modern contamination. It can also speed up run time of mapping steps.

You may want to use this if you want to ensure only the best quality reads for your analysis, but with the penalty of potentially losing still-valid data (even if some reads have slightly lower quality and/or are longer). It is highly recommended when using the 'dedup' deduplication tool.

Specify to skip removal of adapters.

type: boolean

Specify to turn off trimming of adapters from reads.

You may wish to do this if you are using public data (e.g. from the ENA or SRA) that should already have had all library artefacts removed from the reads.

This will override any other adapter parameters provided (i.e., --preprocessing_adapterlist and/or --preprocessing_adapter{1,2} will be ignored)!

Modifies AdapterRemoval parameter: --adapter1 and --adapter2 (sets both to an empty string)
Applies fastp parameter: --disable_adapter_trimming

Specify the nucleotide sequence for the forward read/R1.

type: string

Specify a nucleotide sequence for the forward read/R1.

If not modified by the user, the default for the particular preprocessing tool will be used. Therefore, to turn off adapter trimming use --preprocessing_skipadaptertrim.

Modifies AdapterRemoval parameter: --adapter1
Modifies fastp parameter: --adapter_sequence

Specify the nucleotide sequence for the reverse read/R2.

type: string

Specify a nucleotide sequence for the reverse read/R2.

If not modified by the user, the default for the particular preprocessing tool will be used. To turn off adapter trimming use --preprocessing_skipadaptertrim.

Modifies AdapterRemoval parameter: --adapter2
Modifies fastp parameter: --adapter_sequence_r2

Specify a list of all possible adapters to trim. Overrides --preprocessing_adapter1/2. Formats: .txt (AdapterRemoval) or .fasta (fastp).

type: string

Allows you to supply a file with a list of adapters (or adapter combinations) to remove from all files.

Overrides the --preprocessing_adapter1/--preprocessing_adapter2 parameters.

Note that the two tools have slightly different behaviours.

For AdapterRemoval this consists of a two-column table with a .txt extension: the first column represents the forward strand, the second column the reverse strand. You must supply all possible combinations, one per line, and this list is applied to all files. Only adapters in this list will be screened for and removed. See the AdapterRemoval documentation for more information.

For fastp this consists of standard FASTA format with a .fasta/.fa/.fna/.fas extension. Each adapter sequence in this file should be at least 6 bp long, otherwise it will be skipped. fastp will first perform auto-detection and removal of adapters, and then additionally remove the adapters present in the FASTA file one by one.

Modifies AdapterRemoval parameter: --adapter-list
Modifies fastp parameter: --adapter_fasta
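
As an illustration of the two formats (the adapter sequences shown are generic Illumina-style placeholders - substitute your own):

  adapters.txt (AdapterRemoval): two whitespace-separated columns per line, forward adapter then reverse adapter

    AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC    AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA

  adapters.fasta (fastp): standard FASTA, with each adapter at least 6 bp long

    >illumina_adapter_read1
    AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC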

Specify the minimum length reads must have to be retained.

type: integer
default: 25

Specify the minimum length reads must have to be retained.

Reads shorter than this length after trimming are discarded and not included in downstream analyses. Typically in ancient DNA users will set this to 30, or for very old samples around 25 bp - reads any shorter than this are often not specific enough to provide useful information.

Modifies AdapterRemoval parameter: --minlength
Modifies fastp parameter: --length_required

Specify number of bases to hard-trim from 5 prime or front of reads. Exact behaviour varies per tool, see documentation.

type: integer

Specify number of bases to hard-trim from 5 prime or front of reads. Exact behaviour varies per tool, see documentation. By default set to 0 to not perform any hard trimming.

This parameter allows users to 'hard' remove a number of bases from the beginning or end of reads, regardless of quality.

⚠️ when this trimming occurs depends on the tool, i.e., the exact behaviour is not the same between AdapterRemoval and fastp.

For fastp: this 5p/3p trimming occurs prior to any other trimming (quality, poly-G, adapter). Please see the fastp documentation for more information. If you wish to use this to remove damage prior to mapping (to allow more specific mapping), ensure you have manually removed adapters and quality-trimmed the reads prior to giving them to nf-core/eager. Alternatively, you can use Bowtie2's inbuilt pre-mapping read-end trimming functionality. Note that nf-core/eager only allows this hard trimming equally for both forward and reverse reads (i.e., you cannot provide different values for the 5p end for R1 and R2).

For AdapterRemoval, this trimming happens after the removal of adapters but prior to quality trimming. It is therefore more suitable for hard-removal of damage prior to mapping (although the Bowtie2 system will be more reliable).

Modifies AdapterRemoval parameters: --trim5p
Modifies fastp parameters: --trim_front1 and/or --trim_front2

Specify number of bases to hard-trim from 3 prime or tail of reads. Exact behaviour varies per tool, see documentation.

type: integer

Specify number of bases to hard-trim from 3 prime or tail of reads. Exact behaviour varies per tool, see documentation. By default set to 0 to not perform any hard trimming.

This parameter allows users to 'hard' remove a number of bases from the beginning or end of reads, regardless of quality.

⚠️ when this trimming occurs depends on the tool, i.e., the exact behaviour is not the same between AdapterRemoval and fastp.

For fastp: this 5p/3p trimming occurs prior to any other trimming (quality, poly-G, adapter). Please see the fastp documentation for more information. If you wish to use this to remove damage prior to mapping (to allow more specific mapping), ensure you have manually removed adapters and quality-trimmed the reads prior to giving them to nf-core/eager. Alternatively, you can use Bowtie2's inbuilt pre-mapping read-end trimming functionality. Note that nf-core/eager only allows this hard trimming equally for both forward and reverse reads (i.e., you cannot provide different values for the 3p end for R1 and R2).

For AdapterRemoval, this trimming happens after the removal of adapters but prior to quality trimming. It is therefore more suitable for hard-removal of damage prior to mapping (although the Bowtie2 system will be more reliable).

Modifies AdapterRemoval parameters: --trim3p
Modifies fastp parameters: --trim_tail1 and/or --trim_tail2

Specify to save the preprocessed reads in the results directory.

type: boolean

Specify to save the preprocessed reads in FASTQ format in the results directory.

This can be useful for re-analysing the FASTQ files manually, or for uploading them to public data repositories such as the ENA/SRA (provided you don't perform length filtering or merging).

Specify to turn on sequence complexity filtering of reads with fastp.

type: boolean

Performs a poly-G tail removal step in the beginning of the pipeline using fastp.

This can be useful for trimming poly-G tails from short fragments sequenced on two-colour Illumina chemistry such as NextSeq or NovaSeq machines (where no fluorescence is read as a G), which can inflate reported GC content values.

Modifies fastp parameter: --trim_poly_g

Specify the minimum length of a poly-G tail required before low-complexity trimming is performed.

type: integer
default: 10

This option can be used to define the minimum length of a poly-G tail to begin low complexity trimming.

Modifies fastp parameter: --poly_g_min_len

Skip AdapterRemoval base trimming (n, quality) of 5 prime end.

type: boolean

Turns off quality-based trimming at the 5p end of reads when any of the AdapterRemoval quality or N trimming options are used. Only the 3p end of reads will be trimmed.

This also entirely disables quality based trimming of collapsed reads, since both ends of these are informative for PCR duplicate filtering. For more information see the AdapterRemoval documentation.

Modifies AdapterRemoval parameters: --preserve5p

Skip AdapterRemoval quality and N trimming from ends of reads.

type: boolean

Turns off AdapterRemoval quality trimming from ends of reads.

This can be useful to reduce runtime when running public data that has already been processed.

Modifies AdapterRemoval parameters: --trimqualities

Specify AdapterRemoval minimum base quality for trimming off bases.

type: integer
default: 20

Defines the minimum read quality per base that is required for a base to be kept. Individual bases at the ends of reads falling below this threshold will be clipped off.

Modifies AdapterRemoval parameter: --minquality

Skip AdapterRemoval N trimming (quality trimming only).

type: boolean

Turns off AdapterRemoval N trimming from ends of reads.

This can be useful to reduce runtime when running public data that has already been processed.

Modifies AdapterRemoval parameters: --trimns

Specify the AdapterRemoval minimum adapter overlap required for trimming.

type: integer
default: 1

Specifies a minimum number of bases that must overlap with the adapter sequence before AdapterRemoval trims adapter sequences from reads.

Modifies AdapterRemoval parameter: --minadapteroverlap

Specify the AdapterRemoval maximum Phred score used in input FASTQ files

type: integer
default: 41

Specify maximum Phred score of the quality field of FASTQ files.

The quality-score range can vary depending on the machine and chemistry version (e.g. see the diagram here), and this parameter allows you to increase the maximum from the default AdapterRemoval value of 41.

Note that while this theoretically can provide you with more confident and precise base call information, many downstream tools only accept FASTQ files with Phred scores limited to a max of 41, and therefore increasing the default for this parameter may make the resulting preprocessed files incompatible with some downstream tools.

Modifies AdapterRemoval parameters: --qualitymax

Options for aligning reads against reference genome(s)

Turn on FastQ sharding.

type: boolean

Sharding will split the FastQs into smaller chunks before mapping. These chunks are then mapped in parallel. This approach can speed up the mapping process for larger FastQ files.

Specify the number of reads in each shard when splitting.

type: integer
default: 1000000

Make sure to choose a value that makes sense for your dataset. Small values can create many files, which can end up negatively affecting the overall speed of the mapping process.

Specify which mapper to use.

type: string

Specify which mapping tool to use. Options are BWA aln ('bwaaln'), BWA mem ('bwamem'), circularmapper ('circularmapper'), or bowtie2 ('bowtie2'). BWA aln is the default and highly suited for short-read ancient DNA. BWA mem can be quite useful for modern DNA, but is rarely used in projects for ancient DNA. CircularMapper enhances the mapping procedure to circular references, using the BWA algorithm but utilizing an extend-remap procedure (see Peltzer et al. 2016, Genome Biology for details). Bowtie2 is similar to BWA aln, and has recently been suggested to provide slightly better results under certain conditions (Poullet and Orlando 2020), as well as providing extra functionality (such as FASTQ trimming).

More documentation can be seen for each tool under:

Specify to generate more recent '.csi' BAM indices. If your reference genome is larger than 3.5GB, this is recommended due to more efficient data handling with the '.csi' format over the older '.bai'.

type: boolean

This parameter is required for large reference genomes. If your reference genome is larger than 3.5GB, the samtools index calls in the pipeline need to generate .csi indices instead of .bai indices to compensate for the size of the reference genome (with samtools: -c). This parameter is not required for smaller references (including the human hg19 or GRCh37/GRCh38 references), but genomes larger than 4GB have been shown to need .csi indices.
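
Under the hood this corresponds roughly to indexing with (file name illustrative):

  # generate a .csi index instead of the default .bai
  samtools index -c sample.bam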

Specify the -n parameter for BWA aln, i.e. amount of allowed mismatches in the alignment.

type: number
default: 0.01

Configures the bwa aln -n parameter, defining how many mismatches are allowed in a read. The default is set following the recommendations of Oliva et al. 2021, who tested this when aligning to human reference genomes.

If you're uncertain what to set check out this Shiny App for more information on how to set this parameter efficiently.

Modifies bwa aln parameter: -n

Specify the -k parameter for BWA aln, i.e. maximum edit distance allowed in a seed.

type: integer
default: 2

Configures the bwa aln -k parameter for the maximum edit distance during the seeding phase of the mapping algorithm.

Modifies BWA aln parameter: -k

Specify the -l parameter for BWA aln i.e. the length of seeds to be used.

type: integer
default: 1024

Configures the length of the seed used in bwa aln -l. The default is set to be 'turned off' following the recommendation of Oliva et al. 2021, who tested this when aligning to human reference genomes. Seeding is 'turned off' by specifying an arbitrarily long number to force the entire read to act as the seed.

Note: Despite being recommended, turning off seeding can result in long runtimes!

Modifies BWA aln parameter: -l

Specify the -o parameter for BWA aln i.e. the number of gaps allowed.

type: integer
default: 2

Configures the number of gaps used in bwa aln. Default is set to bwa default.

Modifies BWA aln parameter: -o
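
Taken together, the BWA aln defaults above correspond roughly to an underlying alignment call of the following form (file names are illustrative, and the exact command constructed by the pipeline may differ):

  bwa aln -n 0.01 -k 2 -l 1024 -o 2 reference.fasta sample.fastq.gz > sample.sai
  bwa samse reference.fasta sample.sai sample.fastq.gz > sample.sam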

Specify the -k parameter for BWA mem i.e. the minimum seed length

type: integer
default: 19

Configures the minimum seed length used in BWA-MEM. Default is set to BWA default.

Modifies BWA-MEM parameter: -k

Specify the -r parameter for BWA mem i.e. the re-seeding factor.

type: number
default: 1.5

Configures the re-seeding used in BWA-MEM. Default is set to BWA default.

Modifies BWA-MEM parameter: -r

Specify the bowtie2 alignment mode.

type: string

The type of read alignment to use. Local allows only partial alignment of a read, with the ends of the read possibly 'soft-clipped' (i.e. remaining unaligned/ignored), if the soft-clipped alignment provides the best alignment score. End-to-end requires all nucleotides to be aligned.
The default is set following Cahill et al. (2018) and Poullet and Orlando (2020).

Modifies Bowtie2 presets: --local, --end-to-end

Specify the level of sensitivity for the bowtie2 alignment mode.

type: string

The Bowtie2 'preset' to use. These strings apply to both --mapping_bowtie2_alignmode options. See the Bowtie2 manual for actual settings.
The default is set following Poullet and Orlando (2020) for running damaged data without UDG treatment.

Modifies the Bowtie2 parameters: --fast, --very-fast, --sensitive, --very-sensitive, --fast-local, --very-fast-local, --sensitive-local, --very-sensitive-local

Specify the -N parameter for bowtie2 (mismatches in seed). This will override defaults from alignmode/sensitivity.

type: integer

The number of mismatches allowed in the seed during seed-and-extend procedure of Bowtie2. This will override any values set with --mapping_bowtie2_sensitivity. Can either be 0 or 1.

Modifies Bowtie2 parameter: -N

Specify the -L parameter for bowtie2 (length of seed substrings). This will override defaults from alignmode/sensitivity.

type: integer
default: 20

The length of the seed sub-string to use during seeding. This will override any values set with --mapping_bowtie2_sensitivity.

Modifies Bowtie2 parameter: -L

Specify number of bases to trim off from 5' (left) end of read before alignment.

type: integer

Number of bases to trim at the 5' (left) end of reads prior to alignment. May be useful when left-over sequencing artefacts such as in-line barcodes are present.

Modifies Bowtie2 parameter: --trim5

Specify number of bases to trim off from 3' (right) end of read before alignment.

type: integer

Number of bases to trim at the 3' (right) end of reads prior to alignment. May be useful when left-over sequencing artefacts such as in-line barcodes are present.

Modifies Bowtie2 parameter: --trim3

Specify the maximum fragment length for Bowtie2 paired-end mapping mode only.

type: integer
default: 500

The maximum fragment length for valid paired-end alignments. Only applies in paired-end mapping mode (i.e. with unmerged reads), and is therefore typically only useful for modern data.

Modifies Bowtie2 parameter: --maxins

Options related to length, quality, and map status filtering of reads.

Turn on filtering of reads in BAM files after mapping. By default, only mapped reads are retained.

type: boolean

Turns on the filtering subworkflow for mapped BAM files coming out of the read alignment step. Filtering includes removal of unmapped reads, length filtering, and mapping quality filtering.

When turning on bam filtering, by default only the mapped/unmapped filter is activated, thus only mapped reads are retained for downstream analyses. See --bamfiltering_retainunmappedgenomicbam to retain unmapped reads, if filtering only for length and/or quality is preferred.

Note this subworkflow can also be activated if --run_metagenomic_screening is supplied.

Specify the minimum read length mapped reads should have for downstream genomic analysis.

type: integer

You can use this to remove mapped reads that fall below a certain length after mapping.

This can be useful to get more realistic 'endogenous DNA' or 'on target read' percentages.

If used instead of minimum length read filtering at AdapterRemoval, you can get more realistic endogenous DNA estimates when most of your reads are very short (e.g. in single-stranded libraries or samples with highly degraded DNA). In these cases, the default minimum length filter at the earlier adapter clipping/read merging step would remove a very large number of reads from your library (including valid reads), making an artificially small denominator for a typical endogenous DNA calculation.

Therefore by retaining all of your reads until after mapping (i.e., turning off the adapter clipping/read merging filter), you can generate more 'real' endogenous DNA estimates immediately after mapping (with a better denominator). Then after estimating this, filter using this parameter to retain only 'useful' reads (i.e., those long enough to provide higher confidence of their mapped position) for downstream analyses.

By specifying 0, no length filtering is performed.

Note that by default the output BAM files of this step are not stored in the results directory (as it is assumed that deduplicated BAM files are preferred). See --bamfiltering_savefilteredbams if you wish to save these.

Modifies tool parameter(s):

  • filter_bam_fragment_length.py: -l

Specify the minimum mapping quality reads should have for downstream genomic analysis.

type: integer

Specify a mapping quality threshold for mapped reads to be kept for downstream analysis.

By default all reads are retained, as this is set to 0 to ensure no quality filtering is performed.

Note that by default the output BAM files of this step are not stored in the results directory (as it is assumed that deduplicated BAM files are preferred). See --bamfiltering_savefilteredbams if you wish to save these.

Modifies tool parameter(s):

  • samtools view -q

Specify the SAM format flag of reads to remove during BAM filtering for downstream genomic steps. Generally not recommended to change.

type: integer
default: 4

You can use this to customise the exact SAM format flag of reads you wish to remove from your BAM file for downstream genomic analyses.

You can explore flag values further using a tool from the Broad Institute here.

⚠️ Modify at your own risk, alternative flags are not necessarily supported in downstream steps!

Modifies tool parameter(s):
- SAMtools: -F
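
As an approximation, the mapped-read, quality, and flag filtering together correspond to a samtools call along these lines (values and file names illustrative):

  # keep reads with mapping quality >= 25 that do not carry flag 4 (unmapped)
  samtools view -b -q 25 -F 4 -o sample.filtered.bam sample.bam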

Specify to retain unmapped reads in the BAM file used for downstream genomic analyses.

type: boolean

You can use this parameter to retain unmapped reads (optionally also length filtered) in the genomic BAM for downstream analysis. By default, the pipeline only keeps mapped reads for downstream analysis.

This is also turned on if --metagenomicscreening_input is set to all.

⚠️ This will likely slow down run time of downstream pipeline steps!

Modifies tool parameter(s):

  • samtools view: -f 4 / -F 4

Generate FASTQ files containing only unmapped reads from the aligner generated BAM files.

type: boolean

This turns on the generation and saving of FASTQs of only the unmapped reads from the mapping step in the results directory, using samtools fastq.

This could be useful if you wish to do other analysis of the unmapped reads independently of the pipeline.

Note: the reads in these FASTQ files have not undergone length or quality filtering.

Modifies tool parameter(s):

  • samtools fastq: -f 4

Generate FASTQ files containing only mapped reads from the aligner generated BAM files.

type: boolean

This turns on the generation and saving of FASTQs of only the mapped reads from the mapping step in the results directory, using samtools fastq.

This could be useful if you wish to do other analysis of the mapped reads independently of the pipeline, such as remapping with different parameters (whereby only including mapped reads will speed up computation time during the re-mapping due to reduced input data).

Note: the reads in these FASTQ files have not undergone length or quality filtering.

Modifies tool parameter(s):

  • samtools fastq: -F 4
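
This is roughly equivalent to running (file names illustrative):

  # export only mapped reads (i.e. excluding flag 4) back to FASTQ
  samtools fastq -F 4 sample.bam > sample.mapped.fastq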

Save in the results directory the intermediate filtered genomic BAM files that are sent for downstream genomic analyses.

type: boolean

This saves intermediate length and/or quality filtered genomic BAM files in the results directory.

Options related to metagenomic screening.

Turn on metagenomic screening of mapped, unmapped, or all reads.

type: boolean

Turns on the metagenomic screening subworkflow of the pipeline, where reads are screened against large databases. Typically used for pathogen screening or microbial community analysis.

If supplied, this will also turn on the BAM filtering subworkflow of the pipeline.

Specify which type of reads to go into metagenomic screening.

type: string

You can select which reads coming out of the read alignment step will be sent for metagenomic analysis.

This influences which reads are sent to this step: unmapped reads (used in most cases, as 'host' reads can often be contaminants in microbial genomes), mapped reads (e.g. when doing competitive mapping against a genomic reference containing multiple genomes and you wish to apply LCA correction), or all reads.

⚠️ If you skip paired-end merging, all reads will be screened as independent reads - not as pairs! - as all FASTQ files from BAM filtering are merged into one. This merged file is not saved in results directory.

Modifies tool parameter(s):

  • samtools fastq: -f 4 / -F 4

Run a complexity filter on the metagenomics input files before classification. Specify the tool to use with the metagenomics_complexity_tool parameter, and save the output with metagenomics_complexity_savefastq.

type: boolean

Turns on a subworkflow of the pipeline that filters the FASTQ files for complexity before metagenomics profiling.
Use the metagenomics_complexity_tool parameter to select a method.

Save FASTQ files containing the complexity filtered reads (before metagenomic classification).

type: boolean

Save the complexity-filtered fastq-files to the results directory

Specify which tool to use for trimming, filtering, or reformatting of fastq reads that go into metagenomics screening.

type: string

You can select which tool is used to generate a final set of reads for the metagenomic classifier after any necessary trimming, filtering or reformatting of the reads.

This intermediate file is not saved in the results directory, unless marked with --metagenomics_complexity_savefastq.

Specify the entropy threshold under which a sequencing read will be filtered out as low complexity. This should be between 0-1.

type: number
default: 0.3

Specify the minimum 'entropy' value for complexity filtering for the BBDuk or PRINSEQ++ tools.

This value will only be used for PRINSEQ++ if --metagenomics_prinseq_mode is set to entropy.

Entropy here corresponds to the amount of sequence variation that exists within the read. Higher values correspond to more variety, and thus will likely result in more specific matching to a taxon's reference genome. The trade-off is having fewer reads (or less abundance information) available for a confident identification.

Modifies tool parameter(s):

  • BBDuk: entropy=
  • PRINSEQ++: -lc_entropy

Specify the complexity filter mode for PRINSEQ++

type: string

Specify the complexity filter mode for PRINSEQ++

Use the selected mode together with the correct flag:
'dust' requires the --metagenomics_prinseq_dustscore parameter set
'entropy' requires the --metagenomics_complexity_entropy parameter set

Sets one of the tool parameter(s):

  • PRINSEQ++: -lc_entropy
  • PRINSEQ++: -lc_dust

Specify the minimum dust score for PRINSEQ++ complexity filtering.

type: number
default: 0.5

Specify the minimum dust score below which low-complexity reads will be removed. A DUST score is based on how often different tri-nucleotides occur along a read.

Modifies tool parameter(s):

  • PRINSEQ++: -lc_dust
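
For example, a sketch of a run using DUST-based filtering during metagenomic screening (the value given to --metagenomicscreening_input is illustrative, and the parameters that enable the complexity filter itself and select PRINSEQ++ as the tool are not shown here):

  nextflow run nf-core/eager -profile docker --input samplesheet.csv --outdir ./results \
    --run_metagenomic_screening \
    --metagenomicscreening_input unmapped \
    --metagenomics_prinseq_mode dust \
    --metagenomics_prinseq_dustscore 0.5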

Options for removal of PCR duplicates

Specify to skip the removal of PCR duplicates.

type: boolean

Specify which tool to use for deduplication.

type: string

Sets the duplicate read removal tool. In addition to the standard markduplicates, an ancient DNA-specific read deduplication tool, dedup (Peltzer et al. 2016), is offered. The latter utilises both ends of paired-end data to remove duplicates (i.e. true exact duplicates), whereas markduplicates will over-zealously deduplicate anything with the same starting position even if the ends are different.

⚠️ DeDup can only be used on collapsed (i.e. merged) reads from paired-end sequencing.

Options for filtering for, trimming or rescaling characteristic ancient DNA damage patterns

Turn on damage rescaling of BAM files using mapDamage2 to probabilistically remove damage.

type: boolean

Turns on mapDamage2's BAM rescaling functionality. This probabilistically replaces Ts back to Cs depending on the likelihood that this reference mismatch was originally caused by damage. If the library is specified as single-stranded, this will automatically use the --single-stranded mode.
This process will ameliorate the effects of aDNA damage, but will also increase reference bias.

This functionality does not have any MultiQC output.
Warning: rescaled libraries will not be merged with non-rescaled libraries of the same sample for downstream genotyping, as the model may be different for each library. If you wish to merge these, please do this manually and re-run nf-core/eager using the merged BAMs as input.

Modifies the --rescale parameter of mapDamage2

Length of read sequence to use from each side for rescaling. Can be overridden by --rescalelength*p.

type: integer
default: 12

Specify the length in bp from the end of the read that mapDamage should rescale at both ends.

Modifies the --seq-length parameter of mapDamage2.

Length of read for mapDamage2 to rescale from 5p end. Only used if not 0, otherwise --rescale_seqlength used.

type: integer

Specify the length in bp from the end of the read that mapDamage should rescale. Overrides --rescale_seqlength.

Modifies the --rescale-length-5p parameter of mapDamage2.

Length of read for mapDamage2 to rescale from 3p end. Only used if not 0 otherwise --rescale_seqlength used.

type: integer

Specify the length in bp from the end of the read that mapDamage should rescale. Overrides --rescale_seqlength.

Modifies the --rescale-length-3p parameter of mapDamage2.

Turn on PMDtools filtering.

type: boolean

Specifies to run PMDtools for damage based read filtering in sequencing libraries.

Specify PMDScore threshold for PMDtools.

type: integer
default: 3

Specifies the PMDScore threshold to use in the pipeline when filtering BAM files for DNA damage. Only reads which surpass this damage score are considered for downstream DNA analysis.

Modifies PMDtools parameter: --threshold

Specify a masked FASTA file with positions to be used with pmdtools.

type: string
pattern: ^\S+\.fa(sta)?$

Supplying a FASTA file will use this file as the reference for samtools calmd prior to PMD filtering.
Setting the SNPs that are part of the used capture set to N can alleviate reference bias when running PMD filtering on capture data, where you might not want the allele of a SNP to be counted as damage when it is a transition.

Specify a bedfile to be used to mask the reference fasta prior to running pmdtools.

type: string
pattern: ^\S+\.bed(\.gz)?$

Supplying a bedfile to this parameter activates masking of the reference fasta at the contained sites prior to running PMDtools. Positions that are in the provided bedfile will be replaced by Ns in the reference genome.
This can alleviate reference bias when running PMD filtering on capture data, where you might not want the allele of a transition SNP to be counted as damage. Masking of the reference is done using bedtools maskfasta.

Turn on BAM trimming. Will only affect non-UDG or half-UDG libraries.

type: boolean

Turns on the BAM trimming method. Trims off [n] bases from reads in the deduplicated BAM file. Damage assessment in PMDtools or DamageProfiler remains untouched, as data is routed through this independently. BAM trimming is typically performed to reduce errors during genotyping that can be caused by aDNA damage.

BAM trimming will only affect libraries with 'damage_treatment' of 'none' or 'half'. Complete UDG treatment ('full') should have removed all damage during library construction, so trimming of 0 bp is performed. The number of bases that will be trimmed off from each side of the molecule should be set separately for libraries depending on their 'strandedness' and 'damage_treatment'.

Note: additional artefacts such as bar-codes or adapters should be removed prior to mapping and not in this step.

Specify the number of bases to clip off reads from 'left' end of read for double-stranded non-UDG libraries.

type: integer

Default is set to 0, and therefore clips off no bases on the left side of reads from double-stranded libraries whose UDG treatment is set to 'none'. Note that reverse reads will automatically be clipped off at the reverse side with this (automatically reverses left and right for the reverse read).

Modifies bamUtil's trimBam parameter: -L

Specify the number of bases to clip off reads from 'right' end of read for double-stranded non-UDG libraries.

type: integer

Default is set to 0, and therefore clips off no bases on the right side of reads from double-stranded libraries whose UDG treatment is set to 'none'. Note that reverse reads will automatically be clipped off at the reverse side with this (automatically reverses left and right for the reverse read).

Modifies bamUtil's trimBam parameter: -R

Specify the number of bases to clip off reads from 'left' end of read for double-stranded half-UDG libraries.

type: integer

Default is set to 0, and therefore clips off no bases on the left side of reads from double-stranded libraries whose UDG treatment is set to 'half'. Note that reverse reads will automatically be clipped off at the reverse side with this (automatically reverses left and right for the reverse read).

Modifies bamUtil's trimBam parameter: -L

Specify the number of bases to clip off reads from 'right' end of read for double-stranded half-UDG libraries.

type: integer

Default is set to 0, and therefore clips off no bases on the right side of reads from double-stranded libraries whose UDG treatment is set to 'half'. Note that reverse reads will automatically be clipped off at the reverse side with this (automatically reverses left and right for the reverse read).

Modifies bamUtil's trimBam parameter: -R

Specify the number of bases to clip off reads from 'left' end of read for single-stranded non-UDG libraries.

type: integer

Default is set to 0, and therefore clips off no bases on the left side of reads from single-stranded libraries whose UDG treatment is set to 'none'. Note that reverse reads will automatically be clipped off at the reverse side with this (automatically reverses left and right for the reverse read).

Modifies bamUtil's trimBam parameter: -L

Specify the number of bases to clip off reads from 'right' end of read for single-stranded non-UDG libraries.

type: integer

Default is set to 0, and therefore clips off no bases on the right side of reads from single-stranded libraries whose UDG treatment is set to 'none'. Note that reverse reads will automatically be clipped off at the reverse side with this (automatically reverses left and right for the reverse read).

Modifies bamUtil's trimBam parameter: -R

Specify the number of bases to clip off reads from 'left' end of read for single-stranded half-UDG libraries.

type: integer

Default is set to 0, and therefore clips off no bases on the left side of reads from single-stranded libraries whose UDG treatment is set to 'half'. Note that reverse reads will automatically be clipped off at the reverse side with this (automatically reverses left and right for the reverse read).

Modifies bamUtil's trimBam parameter: -L

Specify the number of bases to clip off reads from 'right' end of read for single-stranded half-UDG libraries.

type: integer

Default is set to 0, and therefore clips off no bases on the right side of reads from single-stranded libraries whose UDG treatment is set to 'half'. Note that reverse reads will automatically be clipped off at the reverse side with this (automatically reverses left and right for the reverse read).

Modifies bamUtil's trimBam parameter: -R

Turn on using soft-trimming instead of hard masking.

type: boolean

By default, nf-core/eager uses hard trimming, which sets trimmed bases to 'N' with quality '!' in the BAM output. Turn this on to use soft-trimming instead, which masks reads at the read ends using the CIGAR string instead.

Modifies bamUtil's trimBam parameter: -c

Options for variant calling

Turn on genotyping of BAM files.

type: boolean

Turns on genotyping. --genotyping_source and --genotyping_tool must also be provided together with this option.

Specify which input BAM to use for genotyping.

type: string

Indicates which BAM file to use for genotyping, depending on which BAM processing modules you have turned on. Options are: 'raw' (to use the reads used as input for damage manipulation); 'pmd' (for pmdtools output); 'trimmed' (for base-clipped BAMs, or base-clipped PMD-filtered BAMs if both filtering and trimming are requested); 'rescaled' (for mapDamage2 rescaling output).
Warning: depending on the parameters you provided, 'raw' can refer to all mapped reads, filtered reads (if BAM filtering has been performed), or deduplicated reads (if deduplication was performed).

Specify which genotyper to use between: GATK UnifiedGenotyper, GATK HaplotypeCaller, Freebayes, or pileupCaller.

type: string

Specifies which genotyper to use. Current options are: GATK UnifiedGenotyper (v3.5), GATK HaplotypeCaller (v4), FreeBayes, or pileupCaller.

Note that while UnifiedGenotyper is more suitable for low-coverage ancient DNA (HaplotypeCaller performs de novo assembly around each variant site), be aware that GATK 3.5 is officially deprecated by the Broad Institute (it is used here for compatibility with MultiVCFAnalyzer).

Skip bcftools stats generation for VCF based variant calling statistics

type: boolean

Disables running of bcftools stats against VCF files from GATK and FreeBayes genotypers.

If run, bcftools stats will automatically include the FASTA reference for INDEL-related statistics.

Specify the ploidy of the reference organism.

type: integer
default: 2

Specify the desired ploidy value of your reference organism for genotyping with GATK or FreeBayes. E.g. if you want to allow heterozygous calls this value should be >= 2.

Modifies GATK UnifiedGenotyper parameter: --sample_ploidy
Modifies GATK HaplotypeCaller parameter: --sample-ploidy
Modifies FreeBayes parameter: -p

The minimum base quality to be used for genotyping with pileupcaller.

type: integer
default: 30

The minimum base quality to be used when generating the samtools mpileup used as input for genotyping with pileupCaller.

Modifies samtools mpileup parameter: -Q.

The minimum mapping quality to be used for genotyping with pileupcaller.

type: integer
default: 30

The minimum mapping quality to be used when generating the samtools mpileup used as input for genotyping with pileupCaller.

Modifies samtools mpileup parameter: -q.

Specify the path to SNP panel in bed format for pileupCaller.

type: string

Specify a SNP panel in the form of a bed file of sites at which to generate a pileup for pileupCaller.

Specify the path to SNP panel in EIGENSTRAT format for pileupCaller.

type: string

Specify a SNP panel in EIGENSTRAT format, pileupCaller will call these sites.

Specify the SNP calling method to use for genotyping.

type: string

Specify the SNP calling method to use for genotyping. 'randomHaploid' will randomly sample a read overlapping the SNP, and produce a homozygous genotype with the allele supported by that read (often called 'pseudohaploid' or 'pseudodiploid'). 'randomDiploid' will randomly sample two reads overlapping the SNP and produce a genotype comprised of the two alleles supported by the two reads. 'majorityCall' will produce a genotype that is homozygous for the allele that appears in the majority of reads overlapping the SNP.

Modifies pileupCaller parameters: --randomHaploid --randomDiploid --majorityCall

Specify the calling mode for transitions.

type: string

Specify if genotypes of transition SNPs should be called, set to missing, or excluded from the genotypes respectively.

Modifies pileupCaller parameter: --skipTransitions --transitionsMissing

Specify GATK phred-scaled confidence threshold.

type: integer
default: 30

If selected, specify a GATK genotyper phred-scaled confidence threshold of a given SNP/INDEL call.

Modifies GATK UnifiedGenotyper or HaplotypeCaller parameter: -stand_call_conf

Specify VCF file for SNP annotation of output VCF files. Optional. Gzip not accepted.

type: string
pattern: ^\S+\.vcf$

(Optional) Specify VCF file for output VCF SNP annotation e.g. if you want to annotate your VCF file with 'rs' SNP IDs. Check GATK documentation for more information. Gzip not accepted.

Maximum depth coverage allowed for genotyping before down-sampling is turned on.

type: integer
default: 250

Maximum depth coverage allowed for genotyping before down-sampling is turned on. Any position with a coverage higher than this value will be randomly down-sampled to this many reads.

Modifies GATK UnifiedGenotyper parameter: -dcov

Specify GATK output mode.

type: string

If GATK UnifiedGenotyper is selected as the genotyping tool, this defines the output mode to use when producing the output VCF (i.e. produce calls for every site or just confident sites).

Modifies GATK UnifiedGenotyper parameter: --output_mode

Specify UnifiedGenotyper likelihood model.

type: string

If GATK UnifiedGenotyper is selected as the genotyping tool, this sets which likelihood model to follow, i.e. whether to call only SNPs or INDELS etc.

Modifies GATK UnifiedGenotyper parameter: --genotype_likelihoods_model

Specify to keep the BAM output of re-alignment around variants from GATK UnifiedGenotyper.

type: boolean

If GATK UnifiedGenotyper is selected as the genotyping tool, providing this parameter will output the BAMs that have realigned reads (with GATK's (v3) IndelRealigner) around possible variants for improved genotyping in addition to the standard VCF output.

These BAMs will be stored in the same folder as the corresponding VCF files.

Supply a default base quality if a read is missing a base quality score. Setting to -1 turns this off.

type: integer
default: -1

If GATK UnifiedGenotyper is selected as the genotyping tool, specify a value to set base quality scores, if reads are missing this information. Might be useful if you have 'synthetically' generated reads (e.g. chopping up a reference genome). Default is set to -1 which is to not set any default quality (turned off).

Modifies GATK UnifiedGenotyper parameter: --defaultBaseQualities

Specify GATK output mode.

type: string

If GATK HaplotypeCaller is selected as the genotyping tool, this sets the type of sites that should be included in the output VCF (i.e. produce calls for every site or just confident sites).

Modifies GATK HaplotypeCaller parameter: --output_mode

Specify HaplotypeCaller mode for emitting reference confidence calls.

type: string

If GATK HaplotypeCaller is selected as the genotyping tool, this sets the mode for emitting reference confidence calls.

Modifies GATK HaplotypeCaller parameter: --emit-ref-confidence

Specify minimum required supporting observations of an alternate allele to consider a variant.

type: integer
default: 1

Require at least this count of observations supporting an alternate allele within a single individual in order to evaluate the position.

Modifies freebayes parameter: -C

Specify to skip over regions of high depth by discarding alignments overlapping positions where total read depth is greater than specified.

type: integer

Specify to skip over regions of high depth by discarding alignments overlapping positions where total read depth is greater than the specified value. Setting to 0 (the default) deactivates this behaviour.

Modifies freebayes parameter: -g

Specify which ANGSD genotyping likelihood model to use.

type: string

Specify which genotype likelihood model to use.

Modifies angsd parameter: -GL

Specify the formatting of the output VCF for ANGSD genotype likelihood results.

type: string

Specifies what type of genotyping likelihood file format will be output.

The options refer to the following descriptions respectively:

  • binary: binary output of all 10 log genotype likelihood
  • beagle_binary: beagle likelihood file
  • binary_three: binary 3 times likelihood
  • text: text output of all 10 log genotype likelihoods.

See the ANGSD documentation for more information on which to select for your downstream applications.

Modifies angsd parameter: -doGlf

Options for the calculation of ratio of reads to one chromosome/FASTA entry against all others.

Turn on mitochondrial to nuclear ratio calculation.

type: boolean

Turn on the module to estimate the ratio of mitochondrial to nuclear reads.

Specify the name of the reference FASTA entry corresponding to the mitochondrial genome (up to the first space).

type: string
default: MT

Specify the FASTA entry in the reference file supplied with --fasta, which acts as the mitochondrial 'chromosome' on which to base the ratio calculation. The tool only accepts the first section of the header before the first space. The default chromosome name is based on the hs37d5/GRCh37 human reference genome.

Turns off the computation of library complexity estimation.

type: boolean

Turns off the computation of library complexity estimation.

Specify which mode of preseq to run.

type: string

Specify which mode of preseq to run.

From the PreSeq documentation:

c_curve is used to compute the expected complexity curve of a mapped read file with a hypergeometric formula.

lc_extrap is used to generate the expected yield for theoretical larger experiments, as well as bounds on the number of distinct reads in the library and the associated confidence intervals, which are computed by bootstrapping the observed duplicate counts histogram.

Specify the step size (i.e., sampling regularity) of Preseq.

type: integer
default: 1000

Can be used to configure the step size of Preseq's c_curve and lc_extrap methods. Can be useful when you have few reads, allowing Preseq to be used for extrapolation of shallow sequencing results.

Modifies tool parameter(s)

  • preseq: -s

Specify the maximum number of terms that lc_extrap mode will use.

type: integer
default: 100

Specify the maximum number of terms that lc_extrap mode will use.

Modifies preseq lc_extrap parameter: -x

Specify the maximum extrapolation (lc_extrap mode only)

type: integer
default: 10000000000

Specify the maximum extrapolation that lc_extrap mode will perform.

Modifies preseq lc_extrap parameter: -e

Specify number of bootstraps to perform (lc_extrap mode only)

type: integer
default: 100

Specify the number of bootstraps lc_extrap mode will perform to calculate confidence intervals.

Modifies preseq lc_extrap parameter: -n

Specify confidence interval level (lc_extrap mode only)

type: number
default: 0.95

Specify the allowed level of confidence intervals used for lc_extrap mode.

Modifies preseq lc_extrap parameter: -c

Turns on defects mode to extrapolate without testing for defects (lc_extrap mode only).

type: boolean

Activates defects mode of lc_extrap, which does the extrapolation without testing for defects.

Modifies preseq lc_extrap parameter: -D

type: boolean

Path to SNP capture positions in BED format. The provided file can also be gzipped.

type: string

Options for calculating and filtering for characteristic ancient DNA damage patterns.

type: boolean

Turns off the damage calculation step used to compute DNA damage profiles.

Specify the tool to use for damage calculation.

type: string

Specify the tool to be used for damage calculation. DamageProfiler is generally faster than mapDamage2, but the latter has an option to limit the number of reads used. This can significantly speed up the processing of very large files, where the damage estimates are already accurate after processing only a fraction of the input.

Specify the maximum misincorporation frequency that should be displayed on damage plot. Set to 0 to 'autoscale'.

type: number
default: 0.3

Specifies the maximum misincorporation frequency to be displayed on the damage plot.

Modifies DamageProfiler parameter: -yaxis_dp_max or mapDamage2 parameter: --ymax

Specify number of bases of each read to be considered for plotting damage estimation.

type: integer
default: 25

Specifies the number of bases to be considered for plotting nucleotide misincorporations.

Modifies DamageProfiler parameter: -t or mapDamage2 parameter: -m

Specifies the length filter for DamageProfiler.

type: integer
default: 100

Number of bases which are considered for frequency computations, by default set to 100.

Modifies DamageProfiler parameter: -l

Specify the maximum number of reads to consider for damage calculation. The default value is 0 (i.e. no downsampling is performed).

type: integer

The maximum number of reads used for damage calculation in mapDamage2. Can be used to significantly reduce the amount of time required for damage assessment. Note that too low a value can also produce incorrect results.

Modifies mapDamage2 parameter: -n

Options for getting reference annotation statistics (e.g. gene coverages)

Turn on the ability to calculate the number of reads, and the depth and breadth of coverage of features in the reference.

type: boolean

Specifies to turn on the bedtools module, producing statistics for breadth (or percent coverage), and depth (or X fold) coverages.

Modifies tool parameter(s):

  • bedtools coverage: -mean

Path to GFF or BED file containing positions of features in reference file (--fasta). Path should be enclosed in quotes.

type: string

Specify the path to a GFF/BED containing the feature coordinates (or any acceptable input for bedtools coverage). Must be in quotes.
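
Under the hood this is roughly equivalent to (file names illustrative):

  # mean depth of coverage per feature in the annotation file
  bedtools coverage -mean -a features.bed -b sample.bam > feature_coverage.txt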

Turn on per-lane creation of pre-adapter-removal and/or read-pair-merging FASTQ files without reads that mapped to reference (e.g. for public upload of privacy sensitive non-host data)

type: boolean

Recreates pre-adapter-removal and/or read-pair-merging FASTQ files but without reads that mapped to reference (e.g. for public upload of privacy-sensitive non-host data)

Host-mapped read removal mode. Remove mapped reads completely from FASTQ (remove) or just mask the host sequence of mapped reads with N (replace).

type: string

Modifies extract_map_reads.py parameter: -m

Options for the estimation of contamination

Turn on nuclear contamination estimation for genomes with ANGSD.

type: boolean

Specify to run the optional processes for nuclear DNA contamination estimation with ANGSD.

The name of the chromosome to be used for contamination estimation.

type: string
default: X

The name of the chromosome as specified in your FASTA/BAM header,
e.g. 'X' for hs37d5, 'chrX' for hg19.

The first position on the chromosome to be used for contamination estimation with ANGSD.

type: integer
default: 5000000

The beginning of the genetic range that should be utilised for nuclear contamination estimation.

The last position on the chromosome to be used for contamination estimation with ANGSD.

type: integer
default: 154900000

The end of the genetic range that should be utilised for nuclear contamination estimation.

Specify the minimum mapping quality reads should have for contamination estimation with ANGSD.

type: integer
default: 30

Modifies angsd parameter: -minMapQ

Specify the minimum base quality reads should have for contamination estimation with ANGSD.

type: integer
default: 30

Modifies angsd parameter: -minQ

Path to HapMap file of chromosome for contamination estimation.

type: string
default: ${projectDir}/assets/angsd_resources/HapMapChrX.gz

The haplotype map, or "HapMap", records the location of haplotype blocks and their tag SNPs.

Options for the calculation of biological sex of human individuals.

Turn on sex determination for human reference genomes. This will run on single- and double-stranded variants of a library separately.

type: boolean

Specify to run the optional process of sex determination.

Specify path to SNP panel in bed format for error bar calculation. Optional (see documentation).

type: string

Specify an optional bedfile of the list of SNPs to be used for X-/Y-rate calculation. Running without this parameter will considerably increase runtime, and render the resulting error bars untrustworthy. Theoretically, any set of SNPs that are distant enough that two SNPs are unlikely to be covered by the same read can be used here. The programme was coded with the 1240K panel in mind.