eager: Parameters

Define where the pipeline should find input data, and additional metadata.

Either paths or URLs to FASTQ/BAM data (must be surrounded with quotes). For paired-end data, the path must use '{1,2}' notation to specify read pairs. Alternatively, a path to a TSV file (ending .tsv) containing file paths and sequencing/sample metadata; this allows merging of multiple lanes/libraries/samples. Please see the documentation for the template.

required

type: string

There are two possible ways of supplying input sequencing data to nf-core/eager. The simplest and most efficient is to supply direct paths (with wildcards) to your FASTQ or BAM files; each file or pair is then treated as a single library and run independently (e.g. for paired-end data: --input '/<path>/<to>/*_{R1,R2}_*.fq.gz'). TSV input requires the user to create an extra file (--input '/<path>/<to>/eager_data.tsv') containing extra metadata, but allows more powerful lane and library merging. Please see the usage docs for detailed instructions and specifications.
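To illustrate how the 'Path' method groups files into read pairs, here is a minimal Python sketch. It assumes a single '{A,B}' brace group and only '*' wildcards in the pattern; the function name pair_files is hypothetical, and Nextflow performs the real grouping itself (for example, the way it derives the sample name may differ).

```python
import re


def pair_files(files, pattern):
    """Group file names into (mate 1, mate 2) tuples using one '{A,B}' brace group.

    Hypothetical helper illustrating the 'Path' input method; not Nextflow's code.
    """
    brace = re.search(r"\{([^{},]+),([^{},]+)\}", pattern)
    if brace is None:
        raise ValueError("pattern needs one '{A,B}' group, e.g. '{R1,R2}' or '{1,2}'")
    mate_regexes = []
    for mate in (brace.group(1), brace.group(2)):
        glob = pattern[:brace.start()] + mate + pattern[brace.end():]
        # Escape literal characters, then turn each '*' wildcard into a capture group.
        mate_regexes.append(re.compile(re.escape(glob).replace(r"\*", "(.*)")))
    pairs = {}
    for name in files:
        for mate_index, regex in enumerate(mate_regexes):
            match = regex.fullmatch(name)
            if match:
                # The text matched by the wildcards identifies the library.
                pairs.setdefault(match.groups(), [None, None])[mate_index] = name
    return {key: tuple(mates) for key, mates in pairs.items()}
```

A library whose mate is missing ends up with None in one slot, which is one reason single-end and paired-end files cannot be mixed in one run.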

Specifies whether you have UDG-treated libraries. Set to 'half' for partial UDG treatment, or 'full' for full UDG treatment. If not set, libraries are assumed to have no UDG treatment ('none'). Not required for TSV input.

type: string

Defines whether Uracil-DNA glycosylase (UDG) treatment was used to remove DNA
damage on the sequencing libraries.

Specify 'none' if no treatment was performed. If you have partial UDG treated
data (Rohland et al 2016), specify
'half'. If you have complete UDG treated data (Briggs et al.
2010), specify 'full'.

When also using PMDtools, specifying 'half' will use a different model for DNA
damage assessment in PMDtools (PMDtools: --UDGhalf). Specifying 'full' will make
the PMDtools DNA damage assessment use CpG context only (PMDtools: --CpG).
Default: 'none'.
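The mapping from UDG treatment to PMDtools options described above can be summarised in a few lines. This is a minimal illustrative sketch with a hypothetical function name, not the pipeline's own code.

```python
def pmdtools_udg_flags(udg_treatment):
    """Extra PMDtools arguments for a UDG treatment setting (illustrative sketch)."""
    flags_by_treatment = {
        "none": [],             # default damage model
        "half": ["--UDGhalf"],  # model for partially UDG-treated libraries
        "full": ["--CpG"],      # assess damage in CpG context only
    }
    if udg_treatment not in flags_by_treatment:
        raise ValueError(f"expected 'none', 'half' or 'full', got {udg_treatment!r}")
    return flags_by_treatment[udg_treatment]
```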

Tip: You should provide a small decoy reference genome with pre-made indices, e.g.
the human mtDNA genome, for the mandatory parameter --fasta in order to
avoid long computational time for generating the index files of the reference
genome, even if you do not actually need a reference genome for any downstream
analyses.

Specifies that libraries are single stranded. Always affects MaltExtract, but will be ignored by pileupCaller with TSV input. Not required for TSV input.

type: boolean

Indicates libraries are single stranded.

Currently affects only MaltExtract, where it switches the damage pattern
calculation mode to single-stranded (MaltExtract: --singleStranded), and
genotyping with pileupCaller, where a different method is used (pileupCaller:
--singleStrandMode). Default: false

Only required when using the 'Path' method of --input

Specifies that the input is single end reads. Not required for TSV input.

type: boolean

By default, the pipeline expects paired-end data. If you have single-end data, specify this parameter on the command line when you launch the pipeline. It is not possible to run a mixture of single-end and paired-end files in one run.

Only required when using the 'Path' method of --input

Specifies which Illumina sequencing chemistry was used. Used to inform whether to poly-G trim if turned on (see below). Not required for TSV input. Options: 2, 4.

type: integer

default: 4

Specifies which Illumina colour chemistry a library was sequenced with. This informs whether to perform poly-G trimming (if --complexity_filter_poly_g is also supplied). Only 2-colour chemistry sequencers (e.g. NextSeq or NovaSeq) can generate artefactual poly-G tails, because 'G' is indicated by the absence of a colour signal. The default of 4 indicates platforms such as HiSeq or MiSeq, which do not require poly-G trimming. Options: 2, 4. Default: 4

Only required when using the 'Path' method of --input.

Specifies that the input is in BAM format. Not required for TSV input.

type: boolean

Specifies that the files supplied to --input are in BAM format. This also automatically applies --single_end.

Only required when using the 'Path' method of --input.

Additional options regarding input data.

If libraries are the result of SNP capture, path to a BED file containing SNP positions on the reference genome. SNP statistics are reported only in the QualiMap results directory, not in MultiQC.

type: string

Can be used to set a path to a BED file (3/6 column format) of SNP positions on a reference genome, to calculate the on-target efficiency of SNP-captured libraries. This should be used for array or in-solution SNP capture protocols such as 390K, 1240K, etc. If supplied, some on-target metrics are automatically generated for you by QualiMap in the 'Globals inside' section of the 'genome_results.txt' file in the QualiMap results directory. These statistics are currently NOT displayed in MultiQC!
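To make the idea of an on-target metric concrete, here is a simplified Python sketch that parses a 3- or 6-column BED file and counts the fraction of reads overlapping at least one SNP position. It assumes reads are given as (chromosome, start, end) tuples in 0-based, half-open coordinates like BED; the helper names are hypothetical, and QualiMap's actual on-target calculation may differ.

```python
import bisect
from collections import defaultdict


def load_snp_bed(lines):
    """Parse 3- or 6-column BED lines into sorted 0-based SNP positions per chromosome."""
    sites = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) not in (3, 6):
            raise ValueError(f"expected 3 or 6 BED columns, got {len(fields)}")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        sites[chrom].extend(range(start, end))  # BED intervals are 0-based, half-open
    return {chrom: sorted(set(positions)) for chrom, positions in sites.items()}


def on_target_fraction(reads, sites):
    """Fraction of reads (chrom, start, end; 0-based half-open) overlapping >= 1 SNP."""
    if not reads:
        return 0.0
    hits = 0
    for chrom, start, end in reads:
        positions = sites.get(chrom, [])
        # First SNP at or after the read start; on target if it lies before the read end.
        i = bisect.bisect_left(positions, start)
        if i < len(positions) and positions[i] < end:
            hits += 1
    return hits / len(reads)
```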

Turns on conversion of an input BAM file into FASTQ format to allow re-preprocessing (e.g. AdapterRemoval etc.).

type: boolean

Allows you to convert an input BAM file back to FASTQ for downstream processing. Note this is required if you need to perform AdapterRemoval and/or poly-G clipping.

If not turned on, BAMs will automatically be sent to post-mapping steps.

Specify locations of references and optionally, additional pre-made indices

Path or URL to a FASTA reference file (required if not iGenome reference). File suffixes can be: '.fa', '.fn', '.fna', '.fasta'.

type: string

You specify the full path to your reference genome here. The FASTA file can have any file suffix, such as .fasta, .fna, .fa, .FastA etc. You may also supply a gzipped reference file, which will be unzipped automatically for you.

For example:

--fasta '/<path>/<to>/my_reference.fasta'

If you don't specify appropriate --bwa_index or --fasta_index parameters, the pipeline will create these indices for you automatically. Note that you can save the indices created for you for later use by giving the --save_reference flag.
You must select either --fasta or --genome.

Name of iGenomes reference (required if not FASTA reference). Requires argument --igenomes_ignore false, as iGenomes is ignored by default in nf-core/eager

type: string

As an alternative to --fasta, the pipeline config files come bundled with paths to the Illumina iGenomes reference index files. If running with docker or AWS, the configuration is set up to use the AWS-iGenomes resource.

There are 31 different species supported in the iGenomes references. To run the pipeline, you must specify which to use with the --genome flag.

You can find the keys to specify the genomes in the iGenomes config file. Common genomes that are supported are:

Human
- --genome GRCh37
- --genome GRCh38
Mouse *
- --genome GRCm38
Drosophila *
- --genome BDGP6
S. cerevisiae *
- --genome 'R64-1-1'

* Not bundled with nf-core eager by default.

Note that you can use the same configuration setup to save sets of reference files for your own use, even if they are not part of the iGenomes resource. See the Nextflow documentation for instructions on where to save such a file.

The syntax for this reference configuration is as follows:

params {
  genomes {
    'GRCh37' {
      fasta   = '<path to the iGenomes genome fasta file>'
    }
    // Any number of additional genomes, key is used with --genome
  }
}
**NB** Requires argument `--igenomes_ignore false`, as iGenomes is ignored by default in nf-core/eager.

Directory / URL base for iGenomes references.

hidden

type: string

default: s3://ngi-igenomes/igenomes

Do not load the iGenomes reference config.

hidden

type: boolean

Do not load igenomes.config when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in igenomes.config.

Path to directory containing pre-made BWA indices (i.e. the directory containing the files ending in '.amb', '.ann', '.bwt'; do not include the files themselves). This is most likely the same directory as the file provided with --fasta. If not supplied, the indices will be made for you.

type: string

If you want to use pre-existing bwa indices, please supply the directory containing the FASTA file you also specified with --fasta. nf-core/eager will automagically detect the index files by searching for the FASTA filename with the corresponding bwa index file suffixes.

For example:

nextflow run nf-core/eager \
-profile test,docker \
--input '*{R1,R2}*.fq.gz' \
--fasta 'results/reference_genome/bwa_index/BWAIndex/Mammoth_MT_Krause.fasta' \
--bwa_index 'results/reference_genome/bwa_index/BWAIndex/'

bwa index does not give you an option to supply alternative suffixes/names for these indices. Thus, the file names generated by this command must not be changed, otherwise nf-core/eager will not be able to find them.

Path to directory containing pre-made Bowtie2 indices (i.e. everything before the endings, e.g. '.1.bt2', '.2.bt2', '.rev.1.bt2'; most likely the same value as --fasta). If not supplied, the indices will be made for you.

type: string

If you want to use pre-existing Bowtie2 indices, please supply the directory containing the FASTA file you also specified with --fasta. nf-core/eager will automagically detect the index files by searching for the FASTA filename with the corresponding bt2 index file suffixes.

For example:

nextflow run nf-core/eager \
-profile test,docker \
--input '*{R1,R2}*.fq.gz' \
--fasta 'results/reference_genome/bwa_index/BWAIndex/Mammoth_MT_Krause.fasta' \
--bt2_index 'results/reference_genome/bt2_index/BT2Index/'

bowtie2-build does not give you an option to supply alternative suffixes/names for these indices. Thus, the file names generated by this command must not be changed, otherwise nf-core/eager will not be able to find them.

Path to samtools FASTA index (typically ending in '.fai'). If not supplied will be made for you.

type: string

If you want to use a pre-existing samtools faidx index, use this to specify the required FASTA index file for the selected reference genome. This should be generated by samtools faidx and has the file suffix .fai.

For example:

--fasta_index 'Mammoth_MT_Krause.fasta.fai'

Path to picard sequence dictionary file (typically ending in '.dict'). If not supplied will be made for you.

type: string

If you want to use a pre-existing picard CreateSequenceDictionary dictionary file, use this to specify the required .dict file for the selected reference genome.

For example:

--seq_dict 'Mammoth_MT_Krause.dict'
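The index files that nf-core/eager looks for next to your FASTA can be sketched as below. This assumes that bwa, Bowtie2 and samtools indices are named by appending their suffixes to the full FASTA file name (as in the examples above), and that the picard dictionary replaces the FASTA extension with '.dict' (matching the 'Mammoth_MT_Krause.dict' example). The function names are hypothetical, and very large Bowtie2 indices ('.bt2l') are ignored here.

```python
from pathlib import Path

BWA_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")
BT2_SUFFIXES = (".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2")


def expected_index_files(fasta):
    """Index file paths expected next to a FASTA (naming is an assumption, see above)."""
    fasta = Path(fasta)

    def appended(suffix):
        # e.g. Mammoth_MT_Krause.fasta -> Mammoth_MT_Krause.fasta.amb
        return fasta.with_name(fasta.name + suffix)

    return {
        "bwa": [appended(s) for s in BWA_SUFFIXES],
        "bowtie2": [appended(s) for s in BT2_SUFFIXES],
        "fai": appended(".fai"),
        # picard replaces the extension: Mammoth_MT_Krause.fasta -> Mammoth_MT_Krause.dict
        "dict": fasta.with_suffix(".dict"),
    }


def missing_index_files(fasta):
    """List the expected index files that do not exist yet (and would need building)."""
    expected = expected_index_files(fasta)
    paths = expected["bwa"] + expected["bowtie2"] + [expected["fai"], expected["dict"]]
    return [path for path in paths if not path.exists()]
```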

Specify to generate more recent '.csi' BAM indices. If your reference genome is larger than 3.5GB, this is recommended due to more efficient data handling with the '.csi' format over the older '.bai'.

type: boolean

This parameter must be set for large reference genomes. If your
reference genome is larger than 3.5GB, the samtools index calls in the
pipeline need to generate CSI indices instead of BAI indices to compensate
for the size of the reference genome (with samtools: -c). This parameter is
not required for smaller references (including the human hg19 or
GRCh37/GRCh38 references), but >4GB genomes have been shown to need CSI
indices. Default: off

Modifies SAMtools index command: -c
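A minimal sketch of deciding whether to pass -c to samtools index, using sequence lengths from a samtools .fai file (second column). It applies the 3.5GB threshold quoted above (treated here as 3.5 billion bases, an approximation) and also checks the per-sequence limit of the BAI format, which cannot index sequences longer than about 512 Mbp (2^29 bases). The function names are hypothetical.

```python
BAI_MAX_SEQUENCE_LENGTH = 2**29   # ~512 Mbp per sequence, the BAI format limit
LARGE_REFERENCE_BASES = 3.5e9     # "larger than 3.5GB", treated as bases here


def needs_csi(fai_lines):
    """Decide from samtools .fai lines whether CSI indices are needed (sketch)."""
    lengths = [int(line.split("\t")[1]) for line in fai_lines if line.strip()]
    if not lengths:
        return False
    return sum(lengths) > LARGE_REFERENCE_BASES or max(lengths) > BAI_MAX_SEQUENCE_LENGTH


def samtools_index_command(bam, fai_lines):
    """Build a samtools index command, adding -c (CSI) when needed."""
    command = ["samtools", "index"]
    if needs_csi(fai_lines):
        command.append("-c")
    return command + [bam]
```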

If not already supplied by user, turns on saving of generated reference genome indices for later re-usage.

type: boolean

Use this if you do not have pre-made reference FASTA indices for bwa, samtools and picard. If you turn this on, the indices that nf-core/eager generates for you will be saved in <your_output_dir>/results/reference_genomes. If not supplied, index references generated by nf-core/eager will be deleted.

Specify where to put output files and optional saving of intermediate files

The output directory where the results will be saved.

type: string

default: ./results

The output directory where the results will be saved. By default, this is ./results, created in the directory you run the command from.

Method used to save pipeline results to output directory.

hidden

type: string

The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.

Less common options for the pipeline, typically set in a config file.

Display help text.

hidden

type: boolean

Boolean whether to validate parameters against the schema at runtime

hidden

type: boolean

default: true

Email address for completion summary.

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

An email address to send a summary email to when the pipeline is completed.
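The pattern shown above can be checked locally before launching a run (the failure-only address below uses the same pattern). A minimal Python sketch with a hypothetical helper name; the pipeline itself validates parameters against its own schema.

```python
import re

# Pattern copied from the parameter schema above.
EMAIL_PATTERN = re.compile(r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$")


def is_valid_email(address):
    """True if the address matches the schema's email pattern."""
    return EMAIL_PATTERN.match(address) is not None
```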

Email address for completion summary, only when pipeline fails.

hidden

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Set this parameter to your e-mail address to get a summary e-mail with details of the run if it fails. This would normally be the same as --email, but can be different. If set in your user config file (~/.nextflow/config), you don't need to specify this on the command line for every run.

Note that this functionality requires either mail or sendmail to be installed on your system.

Send plain-text email instead of HTML.

hidden

type: boolean

Set to receive plain-text e-mails instead of HTML formatted.

File size limit when attaching MultiQC reports to summary emails.

hidden

type: string

default: 25.MB

If the MultiQC report generated by the pipeline exceeds this threshold, it will not be attached.

Do not use coloured log outputs.

hidden

type: boolean

Set to disable colourful command line output and live life in monochrome.

Custom config file to supply to MultiQC.

hidden

type: string

Directory to keep pipeline Nextflow logs and reports.

hidden

type: string

default: ${params.outdir}/pipeline_info

Show all params when using --help

hidden

type: boolean

By default, parameters set as hidden in the schema are not shown on the command line when a user runs with --help. Specifying this option will tell the pipeline to show all parameters.

Parameter used for checking conda channels to be set correctly.

hidden

type: boolean

String to specify ignored parameters for parameter validation

hidden

type: string

default: genomes

Set the top limit for requested resources for any single job.

Maximum number of CPUs that can be requested for any single job.

hidden

type: integer

default: 16

Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. --max_cpus 1

Maximum amount of memory that can be requested for any single job.

hidden

type: string

default: 128.GB

pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$

Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. --max_memory '8.GB'
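A sketch of validating a --max_memory value with the schema pattern above, converting it to bytes, and capping a request at the limit. It assumes 1024-based units (as for Nextflow memory units) and uses hypothetical helper names; it illustrates the idea of an upper limit, not the pipeline's own resource-capping code.

```python
import re

# Pattern copied from the parameter schema above.
MEMORY_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")
# Assumption: 1024-based multiples, as for Nextflow memory units.
UNIT_BYTES = {None: 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_memory(value):
    """Convert e.g. '8.GB' or '512 MB' into bytes; reject strings the schema rejects."""
    match = MEMORY_PATTERN.match(value)
    if match is None:
        raise ValueError(f"invalid memory string: {value!r}")
    number = float(re.match(r"\d+(\.\d+)?", value).group(0))
    return int(number * UNIT_BYTES[match.group(2)])


def cap_memory(requested, maximum="128.GB"):
    """Return the smaller of a process request and the --max_memory limit, in bytes."""
    return min(parse_memory(requested), parse_memory(maximum))
```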

Maximum amount of time that can be requested for any single job.

hidden

type: string

default: 240.h

pattern: ^(\d+\.?\s*(s|m|h|day)\s*)+$

Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. --max_time '2.h'
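Similarly, a --max_time value can be checked against the schema pattern above and converted to seconds. A minimal sketch with a hypothetical helper name; the unit sizes are the ordinary meanings of s, m, h and day.

```python
import re

# Pattern copied from the parameter schema above.
TIME_PATTERN = re.compile(r"^(\d+\.?\s*(s|m|h|day)\s*)+$")
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "day": 86400}


def parse_time(value):
    """Convert e.g. '240.h' or '1day 2h' into seconds; reject strings the schema rejects."""
    if TIME_PATTERN.match(value) is None:
        raise ValueError(f"invalid time string: {value!r}")
    parts = re.findall(r"(\d+)\.?\s*(s|m|h|day)", value)
    return sum(int(amount) * UNIT_SECONDS[unit] for amount, unit in parts)
```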

Parameters used to describe centralised config profiles. These generally should not be edited.

Git commit id for Institutional configs.

hidden

type: string

default: master

Provide git commit id for custom Institutional configs hosted at nf-core/configs. This was implemented for reproducibility purposes. Default: master.

## Download and use config file with following git commit id
--custom_config_version d52db660777c4bf36546ddb188ec530c3ada1b96

Base directory for Institutional configs.

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/configs/master

If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with the --custom_config_base option. For example:

## Download and unzip the config files
cd /path/to/my/configs
wget https://github.com/nf-core/configs/archive/master.zip
unzip master.zip

## Run the pipeline
cd /path/to/my/data
nextflow run /path/to/pipeline/ --custom_config_base /path/to/my/configs/configs-master/

Note that the nf-core/tools helper package has a download command to download all required pipeline files + singularity containers + institutional configs in one go for you, to make this process easier.

Institutional configs hostname.

hidden

type: string

Institutional config name.

hidden

type: string

Institutional config description.

hidden

type: string

Institutional config contact information.

hidden

type: string

Institutional config URL link.

hidden

type: string

The AWSBatch JobQueue that needs to be set when running on AWSBatch

type: string

The AWS Region for your AWS Batch job to run on

type: string

default: eu-west-1

Path to the AWS CLI tool

type: string

Skip any of the mentioned steps.

type: boolean

Turns off FastQC pre- and post-Adapter Removal, to speed up the pipeline. Use of this flag is most common when data has been previously pre-processed and the post-Adapter Removal mapped reads are being re-mapped to a new reference genome.

type: boolean

Turns off adapter trimming and paired-end read merging. Equivalent to setting both --skip_collapse and --skip_trim.

type: boolean

Turns off the computation of library complexity estimation.

type: boolean

Turns off the duplicate removal methods DeDup and MarkDuplicates. No duplicates will be removed on any data in the pipeline.

type: boolean

Turns off the DamageProfiler module to compute DNA damage profiles.

type: boolean

Turns off QualiMap and thus does not compute coverage and other mapping metrics.

Processing of Illumina two-colour chemistry data.

Turn on running poly-G removal on FASTQ files. Will only be performed on 2 colour chemistry machine sequenced libraries.

type: boolean

Performs a poly-G tail removal step at the beginning of the pipeline using fastp, if turned on. This can be useful for trimming poly-G tails from short fragments sequenced on two-colour Illumina chemistry such as NextSeq (where the absence of fluorescence is read as a G), which can inflate reported GC content values.

Specify the minimum length of a poly-G tail for clipping to be performed.

type: integer

default: 10

This option defines the minimum length of a poly-G tail for low-complexity trimming to begin. Default: 10.

Modifies fastp parameter: --poly_g_min_len
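The conditions for poly-G trimming described in this section (2-colour chemistry and the poly-G filter turned on) can be sketched as follows. The fastp options --trim_poly_g and --poly_g_min_len exist in fastp, but exactly how nf-core/eager assembles its fastp command is an assumption here, and the function name is hypothetical.

```python
def fastp_poly_g_args(colour_chemistry, poly_g_filter, min_length=10):
    """fastp poly-G arguments, or None when poly-G trimming should be skipped."""
    if colour_chemistry not in (2, 4):
        raise ValueError("colour chemistry must be 2 or 4")
    if not poly_g_filter or colour_chemistry == 4:
        # 4-colour data (e.g. HiSeq, MiSeq) does not produce artefactual poly-G tails.
        return None
    return ["--trim_poly_g", "--poly_g_min_len", str(min_length)]
```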

Options for adapter clipping and paired-end merging.

Specify adapter sequence to be clipped off (forward strand).

type: string

default: AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC

Defines the adapter sequence to be used for the forward read. By default, this is set to 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'.

Modifies AdapterRemoval parameter: --adapter1

Specify adapter sequence to be clipped off (reverse strand).

type: string

default: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA

Defines the adapter sequence to be used for the reverse read in paired end sequencing projects. This is set to 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA' by default.

Modifies AdapterRemoval parameter: --adapter2

Path to AdapterRemoval adapter list file. Overrides --clip_*_adaptor parameters

type: string

Allows you to supply a file with a list of adapter (combinations) to remove from all files. Overrides the --clip_*_adaptor parameters. The first column represents the forward strand, the second column the reverse strand. You must supply all possible combinations, one per line, and this list is applied to all files. See the AdapterRemoval documentation for more information.
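Because every combination must be listed, generating the file programmatically avoids omissions. A minimal Python sketch, writing one tab-separated forward/reverse pair per line (AdapterRemoval reads whitespace-separated columns); the helper names are hypothetical.

```python
from itertools import product


def adapter_list_lines(forward_adapters, reverse_adapters):
    """Every forward/reverse combination, one per line: forward, tab, reverse."""
    return [f"{fwd}\t{rev}" for fwd, rev in product(forward_adapters, reverse_adapters)]


def write_adapter_list(path, forward_adapters, reverse_adapters):
    """Write the combinations to a file that can be passed via --adapter-list."""
    with open(path, "w") as handle:
        handle.write("\n".join(adapter_list_lines(forward_adapters, reverse_adapters)) + "\n")
```

For example, two forward adapters and one reverse adapter yield two lines.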

Modifies AdapterRemoval parameter: --adapter-list

Specify read minimum length to be kept for downstream analysis.

type: integer

default: 30

Defines the minimum length that reads must have after merging to be considered for downstream analysis. Default is 30.

Note that when a large percentage of the reads in your library are very short (< 20 bp), such as those retrieved with single-stranded library protocols, performing read length filtering at this step is not always reliable for correct endogenous DNA calculation. When very few reads pass this length filter, the 'endogenous DNA' value is artificially inflated because the denominator becomes very small.

If you notice you have ultra-short reads (< 20 bp), it is recommended to set this parameter to 0 and use --bam_filter_minreadlength instead, to filter out 'un-usable' short reads after mapping. A caveat, however, is that this will greatly increase computational run time, because all reads in the library will be mapped.
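The denominator effect can be shown with hypothetical numbers: 1,000,000 raw reads, of which only 50,000 pass a 30 bp length filter, and 5,000 reads that map. The sketch computes endogenous DNA as a simple ratio of mapped to analysed reads, which is a simplification of how the pipeline reports it.

```python
def endogenous_percent(mapped_reads, analysed_reads):
    """Percentage of analysed reads that mapped to the reference."""
    if analysed_reads == 0:
        return 0.0
    return 100.0 * mapped_reads / analysed_reads


# Hypothetical library: 1,000,000 raw reads, 50,000 of which are >= 30 bp.
after_length_filter = endogenous_percent(5_000, 50_000)  # 10.0
all_reads = endogenous_percent(5_000, 1_000_000)         # 0.5
print(f"{after_length_filter}% vs {all_reads}%")
```

The same 5,000 mapped reads look twenty times more 'endogenous' once most short reads are dropped before mapping.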

Modifies AdapterRemoval parameter: --minlength

Specify minimum base quality for trimming off bases.

type: integer

default: 20

Defines the minimum read quality per base that is required for a base to be kept. Individual bases at the ends of reads falling below this threshold will be clipped off. Default is set to 20.

Modifies AdapterRemoval parameter: --minquality

Specify minimum adapter overlap required for clipping.

type: integer

default: 1

Specifies a minimum number of bases that overlap with the adapter sequence before adapters are trimmed from reads. Default is set to 1 base overlap.

Modifies AdapterRemoval parameter: --minadapteroverlap

Skips merging of forward and reverse reads and turns on paired-end alignment for downstream mapping. Only applicable for paired-end libraries.

type: boolean

Turns off the paired-end read merging.

For example:

--skip_collapse  --input '*_{R1,R2}_*.fastq'

It is important to use the paired-end wildcard globbing as --skip_collapse can only be used on paired-end data!

⚠️ If you run this with --clip_readlength also set (as it is by default), you may end up removing single reads from either the pair1 or pair2 file. These will NOT be mapped when aligning with either bwa or bowtie2, as both can only accept one (forward) or two (forward and reverse) FASTQs as input.

Also note that supplying this flag will then also cause downstream mapping steps to run in paired-end mode. This may be more suitable for modern data, or when you want to utilise mate-pair spatial information.

Modifies AdapterRemoval parameter: --collapse

Skip adapter and quality trimming.

type: boolean

Turns off adapter AND quality trimming.

For example:

--skip_trim  --input '*.fastq'

⚠️ It is not possible to keep quality trimming (N or base quality) on
while skipping adapter trimming.

⚠️ It is not possible to turn off only one of quality trimming or N
trimming, i.e. --trimns and --trimqualities are either both given
or neither. However, setting --clip_min_read_quality to 0 would
theoretically turn off base quality trimming.

Modifies AdapterRemoval parameters: --trimns --trimqualities --adapter1 --adapter2

Skip quality-based trimming (N, score, window) of the 5' end.

type: boolean

Turns off quality-based trimming at the 5' end of reads when any of the --trimns, --trimqualities, or --trimwindows options are used. Only the 3' end of reads will be trimmed.

This also entirely disables quality based trimming of collapsed reads, since both ends of these are informative for PCR duplicate filtering. Described here.

Modifies AdapterRemoval parameters: --preserve5p

Only use merged reads downstream (un-merged reads and singletons are discarded).

type: boolean

Specify that only merged reads are sent downstream for analysis.

Singletons (i.e. reads missing a pair), or un-merged reads (where there wasn't sufficient overlap) are discarded.

You may want to use this if you want to ensure only the best quality reads for your analysis, but with the penalty of potentially losing still-valid data (even if some reads have slightly lower quality). It is highly recommended when using --dedupper 'dedup' (see below).

Specify the maximum Phred score used in input FASTQ files

type: integer

default: 41

Specify the maximum Phred score of the quality field of FASTQ files. The quality-score range can vary depending on the machine and version (e.g. see diagram here), and this option allows you to increase it from the default AdapterRemoval value of 41.

Modifies AdapterRemoval parameters: --qualitymax
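To summarise how the options in this section map onto AdapterRemoval flags, here is a hedged Python sketch that assembles an argument list. The Python argument names are hypothetical and the ordering is arbitrary; in particular, how --skip_trim changes --adapter1/--adapter2 is not modelled, and the command actually built by nf-core/eager may differ.

```python
def adapterremoval_args(adapter1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
                        adapter2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA",
                        min_length=30, min_quality=20, min_overlap=1,
                        quality_max=41, skip_collapse=False, skip_trim=False,
                        preserve_5p=False, adapter_list=None):
    """Assemble AdapterRemoval options from the settings above (illustrative sketch)."""
    args = []
    if adapter_list:
        # An adapter list file overrides the single forward/reverse adapters.
        args += ["--adapter-list", adapter_list]
    else:
        args += ["--adapter1", adapter1, "--adapter2", adapter2]
    args += ["--minlength", str(min_length), "--minquality", str(min_quality),
             "--minadapteroverlap", str(min_overlap), "--qualitymax", str(quality_max)]
    if not skip_collapse:
        args.append("--collapse")                   # merge overlapping read pairs
    if not skip_trim:
        args += ["--trimns", "--trimqualities"]     # N and base-quality trimming together
    if preserve_5p:
        args.append("--preserve5p")                 # no quality trimming at the 5' end
    return args
```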

Turn on trimming of inline barcodes (i.e. internal barcodes after adapter removal)

type: boolean

In some cases, you may want to additionally trim reads in a FASTQ file after adapter removal.

This could be to remove short 'inline' or 'internal' barcodes that are ligated directly onto DNA molecules prior to ligation of adapters and indices (the former of which allow ultra-multiplexing and/or checks for barcode hopping).

In other cases, you may wish to already remove known high-frequency damage bases to allow stricter mapping.

Turning on this module uses fastp to trim one or both ends of a merged read or, if you have not collapsed your reads, of R1 and R2.

Specify the number of bases to trim off the front of a merged read or R1

type: integer

default: 7

Specify the number of bases to trim off the start of a read in a merged- or forward read FASTQ file.

Modifies fastp parameters: --trim_front1

Specify the number of bases to trim off the tail of a merged read or R1

type: integer

default: 7

Specify the number of bases to trim off the end of a read in a merged- or forward read FASTQ file.

Modifies fastp parameters: --trim_tail1

Specify the number of bases to trim off the front of R2

type: integer

default: 7

Specify the number of bases to trim off the start of a read in an unmerged reverse read (R2) FASTQ file.

Modifies fastp parameters: --trim_front2

Specify the number of bases to trim off the tail of R2

type: integer

default: 7

Specify the number of bases to trim off the end of a read in an unmerged reverse read (R2) FASTQ file.

Modifies fastp parameters: --trim_tail2
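The fixed-length trimming these four parameters control can be sketched for a single read as follows. This is a minimal illustration with a hypothetical function name, not fastp itself; fastp may additionally apply its own length filters to reads that become very short.

```python
def trim_fixed(sequence, quality, front=7, tail=7):
    """Drop `front` bases from the 5' end and `tail` bases from the 3' end of one read."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings must have equal length")
    end = len(sequence) - tail
    if front >= end:
        return "", ""  # the read is too short to keep anything
    return sequence[front:end], quality[front:end]
```

With the defaults, an 18 bp read keeps only its 4 central bases.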

Options for reference-genome mapping

Specify which mapper to use. Options: 'bwaaln', 'bwamem', 'circularmapper', 'bowtie2'.

type: string

Specify which mapping tool to use. Options are BWA aln ('bwaaln'), BWA mem ('bwamem'), CircularMapper ('circularmapper'), or Bowtie2 ('bowtie2'). BWA aln is the default and is highly suited to short-read ancient DNA. BWA mem can be quite useful for modern DNA, but is rarely used in projects for ancient DNA. CircularMapper enhances the mapping procedure for circular references, using the BWA algorithm but with an extend-remap procedure (see Peltzer et al 2016, Genome Biology for details). Bowtie2 is similar to BWA aln, and has recently been suggested to provide slightly better results under certain conditions (Poullet and Orlando 2020), as well as providing extra functionality (such as FASTQ trimming). Default is 'bwaaln'.

More documentation can be seen for each tool under:

Specify the -n parameter for BWA aln, i.e. amount of allowed mismatches in the alignment.

type: number

default: 0.01

Configures the bwa aln -n parameter, defining how many mismatches are allowed in a read. The default is 0.01 (see above); Schubert et al. (2012 BMC Genomics) recommend 0.04. If you're uncertain what to set, check out this Shiny App for more information on how to set this parameter efficiently.
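A fractional -n is not a mismatch count: bwa interprets it as the fraction of missed alignments tolerated under a 2% uniform base error rate, and derives a maximum number of differences per read length. The sketch below re-expresses that Poisson calculation in Python as we understand bwa's internal routine; the function name is hypothetical and the result should be treated as illustrative.

```python
import math


def bwa_aln_max_diff(read_length, n=0.01, base_error=0.02):
    """Maximum differences bwa aln allows for one read length, given its -n value.

    An integer -n is a fixed edit distance; a fractional -n is the tolerated
    fraction of missed alignments under a uniform 2% base error rate.
    """
    if isinstance(n, int):
        return n
    expected_errors = read_length * base_error  # Poisson mean
    term = math.exp(-expected_errors)           # P(X = 0)
    cumulative = term
    for k in range(1, 1000):
        term *= expected_errors / k             # P(X = k) from P(X = k - 1)
        cumulative += term
        if 1.0 - cumulative < n:                # P(X > k) below the threshold
            return k
    return 2  # fallback, as in bwa's routine (as we understand it)
```

Under this model, a 30 bp read may carry 3 differences at -n 0.01 but only 2 at -n 0.04, so the smaller fraction is the more permissive setting.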

Modifies bwa aln parameter: -n

Specify the -k parameter for BWA aln, i.e. maximum edit distance allowed in a seed.

type: integer

default: 2

Configures the bwa aln -k parameter for the seeding phase in the mapping algorithm. Default is set to 2.

Modifies BWA aln parameter: -k

Specify the -l parameter for BWA aln i.e. the length of seeds to be used.

type: integer

default: 1024

Configures the length of the seed used in bwa aln (-l). By default, this is set to 1024, which effectively turns off seeding, following the recommendation of Schubert et al. (2012 BMC Genomics) for ancient DNA.

Note: Despite being recommended, turning off seeding can result in long runtimes!

Modifies BWA aln parameter: -l

Specify the -o parameter for BWA aln i.e. the number of gaps allowed.

type: integer

default: 2

Configures the maximum number of gap opens allowed in bwa aln. Default: 2 (note that this differs from the bwa aln default of 1).

Modifies BWA aln parameter: -o

Specify the number of bases to extend reference by (circularmapper only).

type: integer

default: 500

The number of bases to extend the reference genome with. By default this is set to 500 if not specified otherwise.

Modifies circulargenerator and realignsamfile parameter: -e
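The idea behind the extension can be sketched as follows: the first bases of the circular chromosome are appended to its end, so reads spanning the origin align contiguously, and positions beyond the original length are mapped back with a modulo. This reflects our understanding of CircularGenerator/realignsamfile, not their code; the function names are hypothetical.

```python
def extend_circular(sequence, extension=500):
    """Append the first `extension` bases to the end of a circular sequence."""
    if extension > len(sequence):
        raise ValueError("extension must not exceed the sequence length")
    return sequence + sequence[:extension]


def to_original_coordinate(position, original_length):
    """Map a 0-based position on the extended reference back onto the circular genome."""
    return position % original_length
```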

Specify the FASTA header of the target chromosome to extend (circularmapper only).

type: string

default: MT

The chromosome in your FASTA reference that you'd like to be treated as circular. By default this is set to MT but can be configured to match any other chromosome.

Modifies circulargenerator parameter: -s

Turn on to remove reads that did not map to the circularised genome (circularmapper only).

type: boolean

If you want to filter out reads that don't map to a circular chromosome (and also non-circular chromosome headers) from the resulting BAM file, turn this on. By default this option is turned off.

Modifies -f and -x parameters of CircularMapper's realignsamfile

Specify the bowtie2 alignment mode. Options: 'local', 'end-to-end'.

type: string

The type of read alignment to use. Options are 'local' or 'end-to-end'. Local allows partial alignment of reads, with read ends possibly 'soft-clipped' (i.e. left unaligned/ignored) if the soft-clipped alignment provides the best alignment score. End-to-end requires all nucleotides to be aligned. Default is 'local', following Cahill et al (2018) and Poullet and Orlando 2020.

Modifies Bowtie2 parameters: --very-fast --fast --sensitive --very-sensitive --very-fast-local --fast-local --sensitive-local --very-sensitive-local

Specify the level of sensitivity for the bowtie2 alignment mode. Options: 'no-preset', 'very-fast', 'fast', 'sensitive', 'very-sensitive'.

type: string

The Bowtie2 'preset' to use. Options: 'no-preset', 'very-fast', 'fast', 'sensitive', or 'very-sensitive'. These strings apply to both --bt2_alignmode options. See the Bowtie2 manual for the actual settings. Default is 'sensitive' (following Poullet and Orlando (2020), when running damaged data without UDG treatment).

Modifies Bowtie2 parameters: --very-fast --fast --sensitive --very-sensitive --very-fast-local --fast-local --sensitive-local --very-sensitive-local

Specify the -N parameter for bowtie2 (mismatches in seed). This will override defaults from alignmode/sensitivity.

type: integer

The number of mismatches allowed in the seed during the seed-and-extend procedure of Bowtie2. This will override any values set with --bt2_sensitivity. Can be either 0 or 1. Default: 0 (i.e. use --bt2_sensitivity defaults).

Modifies Bowtie2 parameters: -N

Specify the -L parameter for bowtie2 (length of seed substrings). This will override defaults from alignmode/sensitivity.

type: integer

The length of the seed sub-string to use during seeding. This will override any values set with --bt2_sensitivity. Default: 0 (i.e. use --bt2_sensitivity defaults: 20 for local and 22 for end-to-end).

Modifies Bowtie2 parameters: -L

Specify number of bases to trim off from 5' (left) end of read before alignment.

type: integer

Number of bases to trim at the 5' (left) end of reads prior to alignment. May be useful when left-over sequencing artefacts of in-line barcodes are present. Default: 0

Modifies Bowtie2 parameters: --trim5

Specify number of bases to trim off from 3' (right) end of read before alignment.

type: integer

Number of bases to trim at the 3' (right) end of reads prior to alignment. May be useful when left-over sequencing artefacts of in-line barcodes are present. Default: 0.

Modifies Bowtie2 parameters: --trim3

Specify the maximum fragment length for Bowtie2 paired-end mapping mode only.

type: integer

default: 500

The maximum fragment length for valid paired-end alignments. Only for paired-end mapping (i.e. unmerged reads), and therefore typically only useful for modern data.

See Bowtie2 documentation for more information.

Modifies Bowtie2 parameters: --maxins
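The Bowtie2 settings in this section combine into one option list roughly as sketched below: local mode uses the '-local' variant of each preset, a zero -N/-L keeps the preset's value, and --maxins only matters for unmerged paired-end reads. The Python argument names are hypothetical, and passing --local/--end-to-end explicitly is our assumption about how the mode is selected.

```python
PRESETS = ("very-fast", "fast", "sensitive", "very-sensitive")


def bowtie2_args(alignmode="local", sensitivity="sensitive", seed_mismatches=0,
                 seed_length=0, trim5=0, trim3=0, maxins=500, paired=False):
    """Assemble Bowtie2 options from the settings in this section (sketch)."""
    if alignmode not in ("local", "end-to-end"):
        raise ValueError("alignmode must be 'local' or 'end-to-end'")
    args = ["--local" if alignmode == "local" else "--end-to-end"]
    if sensitivity != "no-preset":
        if sensitivity not in PRESETS:
            raise ValueError(f"unknown sensitivity preset: {sensitivity!r}")
        # Local mode uses the '-local' variant of each preset.
        args.append(f"--{sensitivity}-local" if alignmode == "local" else f"--{sensitivity}")
    if seed_mismatches not in (0, 1):
        raise ValueError("-N must be 0 or 1")
    if seed_mismatches:  # 0 means: keep the preset's value
        args += ["-N", str(seed_mismatches)]
    if seed_length:      # 0 means: keep the preset's value
        args += ["-L", str(seed_length)]
    if trim5:
        args += ["--trim5", str(trim5)]
    if trim3:
        args += ["--trim3", str(trim3)]
    if paired:           # only unmerged paired-end reads have a fragment length
        args += ["--maxins", str(maxins)]
    return args
```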

Options for production of host-read removed FASTQ files for privacy reasons.

Turn on per-library creation of pre-AdapterRemoval FASTQ files without reads that mapped to the reference (e.g. for public upload of privacy-sensitive non-host data).

type: boolean

Create pre-AdapterRemoval FASTQ files without reads that mapped to the reference (e.g. for public upload of privacy-sensitive non-host data).

Host removal mode. Remove mapped reads completely from the FASTQ files (remove) or just mask the sequence of mapped reads with N (replace).

type: string

Read removal mode. Remove mapped reads completely ('remove') or just replace the sequence of mapped reads with N ('replace').

Modifies extract_map_reads.py parameter: -m
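The difference between the two modes can be illustrated with a minimal Python sketch. It is not the code of extract_map_reads.py: the function name `strip_host_reads` and the in-memory (name, sequence, quality) record tuples are hypothetical, it assumes you already know which read names mapped to the reference, and leaving base qualities unchanged in 'replace' mode is an assumption of this sketch.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical in-memory FASTQ record: (read_name, sequence, quality_string).
Record = Tuple[str, str, str]


def strip_host_reads(records: Iterable[Record], mapped: Set[str], mode: str) -> List[Record]:
    """Remove or mask reads whose names are in `mapped` (i.e. host reads).

    mode='remove'  -> mapped reads are dropped entirely
    mode='replace' -> mapped reads are kept, but every base is replaced by N
    """
    if mode not in ("remove", "replace"):
        raise ValueError(f"unknown mode: {mode!r}")
    out: List[Record] = []
    for name, seq, qual in records:
        if name not in mapped:
            out.append((name, seq, qual))  # non-host reads pass through untouched
        elif mode == "replace":
            out.append((name, "N" * len(seq), qual))  # keep length, mask bases
        # mode == 'remove': mapped reads are simply skipped
    return out


reads = [("r1", "ACGT", "IIII"), ("r2", "TTGA", "IIII")]
print(strip_host_reads(reads, {"r1"}, "remove"))   # [('r2', 'TTGA', 'IIII')]
print(strip_host_reads(reads, {"r1"}, "replace"))  # [('r1', 'NNNN', 'IIII'), ('r2', 'TTGA', 'IIII')]
```

'replace' keeps the read count and read lengths of the original FASTQ intact, which can matter if downstream users expect paired files to stay in sync.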

Options for quality filtering and how to deal with off-target unmapped reads.

Turn on filtering of mapping quality, read lengths, or unmapped reads of BAM files.

type: boolean

Turns on the BAM filtering module for mapping quality filtering, read length filtering, and/or unmapped read treatment.

Minimum mapping quality for reads filter.

type: integer

Specify a mapping quality threshold for mapped reads to be kept for downstream analysis. By default this is set to 0, which keeps all reads (i.e. no mapping quality filtering is performed).

Modifies samtools view parameter: -q

Specify minimum read length to be kept after mapping.

type: integer

Specify minimum length of mapped reads. This filtering will apply at the same time as mapping quality filtering.

If used instead of minimum read length filtering at AdapterRemoval, this can be useful to get more realistic endogenous DNA percentages when most of your reads are very short (e.g. in single-stranded libraries) and would otherwise be discarded by AdapterRemoval (thus making an artificially small denominator for a typical endogenous DNA calculation). Note that in this context you should neither perform mapping quality filtering nor discard unmapped reads, to ensure the denominator includes all reads for the endogenous DNA calculation.

Modifies filter_bam_fragment_length.py parameter: -l
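A quick worked example of why the denominator matters, written as a small Python sketch. The read counts are made-up illustration values, not results, and `endogenous_percent` is a hypothetical helper that simply computes mapped reads over all reads.

```python
def endogenous_percent(mapped_reads: int, total_reads: int) -> float:
    """Percentage of reads that mapped to the reference: mapped / total * 100."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mapped_reads / total_reads


# Hypothetical library: 1,000,000 reads, 600,000 of which are below the length cut-off.
total = 1_000_000
short = 600_000
mapped_long = 50_000  # mapped reads that pass the length filter

# Length filter applied after mapping: every read stays in the denominator.
print(endogenous_percent(mapped_long, total))          # 5.0
# Length filter applied at AdapterRemoval: short reads vanish from the denominator.
print(endogenous_percent(mapped_long, total - short))  # 12.5
```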

Defines whether to discard all unmapped reads, keep them together with the mapped reads, or keep them separately in BAM and/or FASTQ format. Options: 'discard', 'keep', 'bam', 'fastq', 'both'.

type: string

Defines how to proceed with unmapped reads: 'discard' removes all unmapped reads, 'keep' keeps both unmapped and mapped reads in the same BAM file, 'bam' keeps unmapped reads as a BAM file, 'fastq' keeps unmapped reads as a FASTQ file, and 'both' keeps both BAM and FASTQ files. Default: 'discard'. 'keep' is equivalent to what happens when --run_bam_filtering is not supplied.

Note that in all cases, if --bam_mapping_quality_threshold is also supplied, mapping quality filtering will still occur on the mapped reads.

Modifies samtools view parameter: -f4 -F4
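As a rough illustration, -f4 and -F4 in samtools view select reads with and without the 0x4 ('segment unmapped') SAM FLAG bit, respectively. The following Python sketch mimics that split on plain integer FLAG values; `split_by_unmapped_flag` is a hypothetical helper, not part of the pipeline.

```python
from typing import Dict, List, Tuple

UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def split_by_unmapped_flag(reads: List[Tuple[str, int]]) -> Dict[str, List[str]]:
    """Split (name, FLAG) pairs like `samtools view -F4` (mapped) and `-f4` (unmapped)."""
    result: Dict[str, List[str]] = {"mapped": [], "unmapped": []}
    for name, flag in reads:
        key = "unmapped" if flag & UNMAPPED else "mapped"
        result[key].append(name)
    return result


# FLAG 77 = 1 + 4 + 8 + 64, so it includes the unmapped bit.
reads = [("r1", 0), ("r2", 4), ("r3", 16), ("r4", 77)]
print(split_by_unmapped_flag(reads))
# {'mapped': ['r1', 'r3'], 'unmapped': ['r2', 'r4']}
```

With 'discard', only the mapped group would be carried forward; with 'bam', 'fastq' or 'both', the unmapped group would additionally be written out in the chosen format(s).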

Options for removal of PCR amplicon duplicates that can artificially inflate coverage.

Deduplication method to use. Options: 'markduplicates', 'dedup'.

type: string

Sets the duplicate read removal tool. By default, MarkDuplicates from Picard is used. Alternatively, the ancient DNA-specific read deduplication tool DeDup (Peltzer et al. 2016) is offered.

DeDup utilises both ends of paired-end data to remove duplicates (i.e. true exact duplicates, as MarkDuplicates will over-zealously deduplicate anything with the same starting position even if the ends are different). DeDup should generally only be used on paired-end data; otherwise, suboptimal deduplication can occur if it is applied to either single-end data or a mix of single-end and paired-end data.
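To make the difference concrete, here is a simplified Python sketch that collapses reads sharing the same key, once keyed on the start coordinate only and once on both start and end. It only illustrates the idea described above: it ignores strand, base qualities and the other details that Picard MarkDuplicates and DeDup actually consider, and `deduplicate` is a hypothetical helper.

```python
from typing import Callable, Hashable, List, Tuple

# Simplified alignment: (read_name, chromosome, start, end). Hypothetical representation.
Alignment = Tuple[str, str, int, int]


def deduplicate(alns: List[Alignment], key: Callable[[Alignment], Hashable]) -> List[Alignment]:
    """Keep the first alignment seen for each key, dropping later ones as duplicates."""
    seen = set()
    kept = []
    for aln in alns:
        k = key(aln)
        if k not in seen:
            seen.add(k)
            kept.append(aln)
    return kept


alns = [
    ("a", "chr1", 100, 140),
    ("b", "chr1", 100, 140),  # exact duplicate of 'a'
    ("c", "chr1", 100, 175),  # same start, different end: a distinct molecule
]

start_only = deduplicate(alns, key=lambda a: (a[1], a[2]))
start_and_end = deduplicate(alns, key=lambda a: (a[1], a[2], a[3]))
print([a[0] for a in start_only])     # ['a']
print([a[0] for a in start_and_end])  # ['a', 'c']
```

Keying on start only also removes read 'c', which is the over-zealous behaviour described above; keying on both ends keeps it.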

Turn on treating all reads as merged reads.

type: boolean

Sets DeDup to treat all reads as merged reads. This is useful if, for example, reads are not prefixed with M_ in all cases. It can therefore also be used as a workaround when using a mixture of paired-end and single-end data; however, this is not recommended (see above).

Modifies dedup parameter: -m

Options for calculating library complexity (i.e. how many unique reads are present).

Specify which mode of preseq to run.

type: string

Specify which mode of preseq to run. Options: 'c_curve', 'lc_extrap'.

From the PreSeq documentation:

c_curve: used to compute the expected complexity curve of a mapped read file with a hypergeometric formula.

lc_extrap: used to generate the expected yield for theoretical larger experiments and bounds on the number of distinct reads in the library and the associated confidence intervals, which is computed by bootstrapping the observed duplicate counts histogram.
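For intuition about what c_curve reports: the expected number of distinct reads in a random subsample of t reads (drawn without replacement from N reads) is the sum, over distinct reads i with n_i copies, of 1 - C(N - n_i, t) / C(N, t). This is a minimal Python sketch of that hypergeometric formula, assuming you already have the copy number of each distinct read; it is not preseq's implementation, and `expected_distinct` and `complexity_curve` are hypothetical names. The `step` argument plays a role similar to preseq's step size.

```python
from math import comb
from typing import List, Tuple


def expected_distinct(counts: List[int], t: int) -> float:
    """Expected number of distinct reads when sampling t of the N = sum(counts) reads
    without replacement. A distinct read with n_i copies is missed with probability
    C(N - n_i, t) / C(N, t).
    """
    n_total = sum(counts)
    if not 0 <= t <= n_total:
        raise ValueError("t must be between 0 and the total number of reads")
    denom = comb(n_total, t)
    return sum(1 - comb(n_total - n_i, t) / denom for n_i in counts)


def complexity_curve(counts: List[int], step: int) -> List[Tuple[int, float]]:
    """(sample size, expected distinct reads) at every `step` reads up to N."""
    n_total = sum(counts)
    return [(t, expected_distinct(counts, t)) for t in range(step, n_total + 1, step)]


# Toy library: one read seen twice, two reads seen once (N = 4 reads, 3 distinct).
counts = [2, 1, 1]
print(complexity_curve(counts, step=1))
print(expected_distinct(counts, 4))  # 3.0, i.e. every distinct read is seen
```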

Specify the step size of Preseq.

type: integer

default: 1000

Can be used to configure the step size of Preseq's c_curve and lc_extrap methods. Can be useful when only a few, and thus shallow, sequencing results are used for extrapolation.

Modifies preseq c_curve and lc_extrap parameter: -s

Specify the maximum extrapolation (lc_extrap mode only)

type: integer

default: 10000000000

Specify the maximum extrapolation that lc_extrap mode will perform.

Modifies preseq lc_extrap parameter: -e

Specify the maximum number of terms for extrapolation (lc_extrap mode only)

type: integer

default: 100

Specify the maximum number of terms that lc_extrap mode will use.

Modifies preseq lc_extrap parameter: -x

Specify number of bootstraps to perform (lc_extrap mode only)

type: integer

default: 100

Specify the number of bootstraps lc_extrap mode will perform to calculate confidence intervals.

Modifies preseq lc_extrap parameter: -n

Specify confidence interval level (lc_extrap mode only)

type: number

default: 0.95

Specify the confidence level of the confidence intervals computed in lc_extrap mode.

Modifies preseq lc_extrap parameter: -c

Options for calculating and filtering for characteristic ancient DNA damage patterns.

Specify the tool to use for damage calculation.

type: string

Specify the tool to be used for damage calculation. DamageProfiler is generally faster than mapDamage2, but the latter has an option to limit the number of reads used. This can significantly speed up the processing of very large files, where the damage estimates are already accurate after processing only a fraction of the input. Options: 'damageprofiler', 'mapdamage'. Default: 'damageprofiler'.

Specify length filter for DamageProfiler.

type: integer

default: 100

Specifies the length filter for DamageProfiler. By default set to 100.

Modifies DamageProfiler parameter: -l

Specify number of bases of each read to consider for DamageProfiler calculations.

type: integer

default: 15

Specifies the length of the read start and end to be considered for profile generation in DamageProfiler. By default set to 15 bases.

Modifies DamageProfiler parameter: -t

Specify the maximum misincorporation frequency that should be displayed on the damage plot. Set to 0 to 'autoscale'.

type: number

default: 0.3

Specifies the maximum misincorporation frequency displayed on the y-axis of the DamageProfiler damage plot. This is set to 0.30 (i.e. 30%) by default, as this matches the popular mapDamage2.0 program. However, the default behaviour of DamageProfiler is to 'autoscale' the y-axis maximum to zoom in on any possible damage that may occur (e.g. if the damage is about 10%, the highest value on the y-axis would be set to 0.12). This 'autoscale' behaviour can be turned on by setting this value to 0. Default: 0.30.

Modifies DamageProfiler parameter: -yaxis_damageplot

Specify the maximum number of reads to consider for damage calculation. Default value is 0 (i.e. no downsampling is performed).

type: integer

The maximum number of reads used for damage calculation in mapDamage2. Can be used to significantly reduce the amount of time required for damage assessment. Note that too low a value can produce incorrect results.

Modifies mapDamage2 parameter: -n

Specify the maximum misincorporation frequency that should be displayed on the damage plot.

type: number

default: 0.3

Specifies what the maximum misincorporation frequency should be displayed as, in the mapDamage2 damage plot. This defaults to 0.30 (i.e. 30%).

Modifies mapDamage2 parameter: -y

Turn on PMDtools

type: boolean

Specifies to run PMDTools for damage based read filtering and assessment of DNA damage in sequencing libraries. By default turned off.

Specify range of bases for PMDTools to scan for damage.

type: integer

default: 10

Specifies the range in which to consider DNA damage from the ends of reads. By default set to 10.

Modifies PMDTools parameter: --range

Specify PMDScore threshold for PMDTools.

type: integer

default: 3

Specifies the PMDScore threshold to use in the pipeline when filtering BAM files for DNA damage. Only reads which surpass this damage score are considered for downstream DNA analysis. By default set to 3 if not set specifically by the user.

Modifies PMDTools parameter: --threshold

Specify a bedfile to be used to mask the reference fasta prior to running pmdtools.

type: string

Activates masking of the reference fasta prior to running pmdtools. Positions that are in the provided bedfile will be replaced by Ns in the reference genome. This is useful for capture data, where you might not want the allele of a SNP to be counted as damage when it is a transition. Masking of the reference is done using bedtools maskfasta.
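The masking step can be pictured with the following Python sketch, which replaces the bases covered by BED intervals with N. BED intervals are 0-based and half-open, so an interval (start, end) covers positions start to end - 1. The function `mask_sequence` is hypothetical and works on a single in-memory sequence; the pipeline itself uses bedtools maskfasta on the whole reference.

```python
from typing import Iterable, Tuple


def mask_sequence(seq: str, intervals: Iterable[Tuple[int, int]]) -> str:
    """Replace positions covered by 0-based, half-open (start, end) intervals with 'N'."""
    bases = list(seq)
    for start, end in intervals:
        # Clamp to the sequence bounds so out-of-range BED entries do not raise errors.
        for i in range(max(0, start), min(end, len(bases))):
            bases[i] = "N"
    return "".join(bases)


# Mask a single SNP position (BED line: chr1 3 4) and a 2-bp block (BED line: chr1 6 8).
print(mask_sequence("ACGTACGTAC", [(3, 4), (6, 8)]))  # ACGNACNNAC
```

Because a masked reference position is N, a C-to-T or G-to-A difference at a known SNP can no longer be counted as a reference mismatch.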

Specify the maximum number of reads to consider for metrics generation.

type: integer

default: 10000

The maximum number of reads used for damage assessment in PMDtools. Can be used to significantly reduce the amount of time required for damage assessment in PMDTools. Note that too low a value can produce incorrect results.

Modifies PMDTools parameter: -n

Append big list of base frequencies for platypus to output.

type: boolean

Enables the printing of a wider list of base frequencies used by platypus as an addition to the output base misincorporation frequency table. By default turned off.

Turn on damage rescaling of BAM files using mapDamage2 to probabilistically remove damage.

type: boolean

Turns on mapDamage2's BAM rescaling functionality. This probabilistically replaces Ts back to Cs depending on the likelihood that this reference mismatch was originally caused by damage. If the library is specified to be single-stranded, this will automatically use the --single-stranded mode.

This functionality does not have any MultiQC output.

⚠️ Rescaled libraries will not be merged with non-scaled libraries of the same sample for downstream genotyping, as the model may be different for each library. If you wish to merge these, please do this manually and re-run nf-core/eager using the merged BAMs as input.

Modifies the --rescale parameter of mapDamage2

Length of read sequence to use from each side for rescaling. Can be overridden by --rescale_length_*p.

type: integer

default: 12

Specify the length from the end of the read that mapDamage should rescale at both ends.

Modifies the --seq-length parameter of mapDamage2.

Length of read for mapDamage2 to rescale from the 5p end. Only used if not 0; otherwise --rescale_seqlength is used.

type: integer

Specify the length from the 5' end of the read that mapDamage should rescale. Overrides --rescale_seqlength.

Modifies the --rescale-length-5p parameter of mapDamage2.

Length of read for mapDamage2 to rescale from the 3p end. Only used if not 0; otherwise --rescale_seqlength is used.

type: integer

Specify the length from the 3' end of the read that mapDamage should rescale. Overrides --rescale_seqlength.

Modifies the --rescale-length-3p parameter of mapDamage2.
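The precedence between these three parameters, as described above, can be summarised in a small Python sketch. `resolve_rescale_lengths` is a hypothetical helper; treating 0 as 'not set' for the per-end values follows the descriptions of --rescale_length_5p and --rescale_length_3p.

```python
from typing import Tuple


def resolve_rescale_lengths(seqlength: int = 12, length_5p: int = 0, length_3p: int = 0) -> Tuple[int, int]:
    """Return the (5', 3') lengths that would be rescaled.

    A per-end value is only used when it is non-zero; otherwise the shared
    --rescale_seqlength value applies to that end.
    """
    five_p = length_5p if length_5p != 0 else seqlength
    three_p = length_3p if length_3p != 0 else seqlength
    return five_p, three_p


print(resolve_rescale_lengths())             # (12, 12)
print(resolve_rescale_lengths(length_5p=5))  # (5, 12)
```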

Options for getting reference annotation statistics (e.g. gene coverages)

Turn on ability to calculate the number of reads, depth and breadth coverage of features in the reference.

type: boolean

Specifies to turn on the bedtools module, producing statistics for breadth (or percent coverage), and depth (or X fold) coverages.
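As a rough guide to the two statistics, the following Python sketch computes breadth (percentage of feature positions covered by at least one read) and mean depth (X-fold coverage) from the per-base depths of a single feature. It is an illustration only: `feature_coverage` is a hypothetical helper, whereas bedtools coverage computes these values directly from the alignments.

```python
from typing import List, Tuple


def feature_coverage(depths: List[int]) -> Tuple[float, float]:
    """Return (breadth in percent, mean depth) for one feature's per-base depths."""
    if not depths:
        raise ValueError("feature has no positions")
    covered = sum(1 for d in depths if d > 0)
    breadth = 100.0 * covered / len(depths)
    mean_depth = sum(depths) / len(depths)
    return breadth, mean_depth


# A 10-bp feature where 8 positions are covered by at least one read.
print(feature_coverage([0, 1, 2, 2, 3, 3, 2, 1, 1, 0]))  # (80.0, 1.5)
```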

Path to GFF or BED file containing positions of features in reference file (--fasta). Path should be enclosed in quotes.

type: string

Specify the path to a GFF/BED containing the feature coordinates (or any acceptable input for bedtools coverage). Must be in quotes.

Specify if the annotation file provided to --anno_file is not sorted in the same way as the reference fasta file.

type: boolean

In cases where the annotation file is NOT sorted the same way as the reference fasta, this option should be specified. This will significantly increase the memory usage of bedtools!

Modifies bedtools parameter: -sorted

Options for trimming of aligned reads (e.g. to remove damage prior to genotyping).

Turn on BAM trimming. Will only run on non-UDG or half-UDG libraries

type: boolean

Turns on the BAM trimming method. Trims off [n] bases from reads in the deduplicated BAM file. Damage assessment in PMDTools or DamageProfiler remains untouched, as data is routed through this independently. BAM trimming is typically performed to reduce errors during genotyping that can be caused by aDNA damage.

BAM trimming will only be performed on libraries indicated as --udg_type 'none' or --udg_type 'half'. Complete UDG treatment ('full') should have removed all damage. The amount of bases that will be trimmed off can be set separately for libraries with --udg_type 'none' and 'half' (see --bamutils_clip_half_udg_left / --bamutils_clip_half_udg_right / --bamutils_clip_none_udg_left / --bamutils_clip_none_udg_right).

Note: additional artefacts such as barcodes or adapters that could potentially also be trimmed should be removed prior to mapping.

Specify the number of bases to clip off reads from 'left' end of read for double-stranded half-UDG libraries.

type: integer

Default: 0 (no bases are clipped). Specifies the number of bases to clip off the left side of reads from double-stranded libraries whose UDG treatment is set to half. Note that for reverse reads, left and right are automatically swapped, so the clipping applies to the same end of the original read.

Modifies bamUtil's trimBam parameter: -L

Specify the number of bases to clip off reads from 'right' end of read for double-stranded half-UDG libraries.

type: integer

Default: 0 (no bases are clipped). Specifies the number of bases to clip off the right side of reads from double-stranded libraries whose UDG treatment is set to half. Note that for reverse reads, left and right are automatically swapped, so the clipping applies to the same end of the original read.

Modifies bamUtil's trimBam parameter: -R

Specify the number of bases to clip off reads from 'left' end of read for double-stranded non-UDG libraries.

type: integer

Default: 0 (no bases are clipped). Specifies the number of bases to clip off the left side of reads from double-stranded libraries whose UDG treatment is set to none. Note that for reverse reads, left and right are automatically swapped, so the clipping applies to the same end of the original read.

Modifies bamUtil's trimBam parameter: -L

Specify the number of bases to clip off reads from 'right' end of read for double-stranded non-UDG libraries.

type: integer

Default: 0 (no bases are clipped). Specifies the number of bases to clip off the right side of reads from double-stranded libraries whose UDG treatment is set to none. Note that for reverse reads, left and right are automatically swapped, so the clipping applies to the same end of the original read.

Modifies bamUtil's trimBam parameter: -R

Specify the number of bases to clip off reads from 'left' end of read for single-stranded half-UDG libraries.

type: integer

Default: 0 (no bases are clipped). Specifies the number of bases to clip off the left side of reads from single-stranded libraries whose UDG treatment is set to half. Note that for reverse reads, left and right are automatically swapped, so the clipping applies to the same end of the original read.

Modifies bamUtil's trimBam parameter: -L

Specify the number of bases to clip off reads from 'right' end of read for single-stranded half-UDG libraries.

type: integer

Default: 0 (no bases are clipped). Specifies the number of bases to clip off the right side of reads from single-stranded libraries whose UDG treatment is set to half. Note that for reverse reads, left and right are automatically swapped, so the clipping applies to the same end of the original read.

Modifies bamUtil's trimBam parameter: -R

Specify the number of bases to clip off reads from 'left' end of read for single-stranded non-UDG libraries.

type: integer

Default: 0 (no bases are clipped). Specifies the number of bases to clip off the left side of reads from single-stranded libraries whose UDG treatment is set to none. Note that for reverse reads, left and right are automatically swapped, so the clipping applies to the same end of the original read.

Modifies bamUtil's trimBam parameter: -L

Specify the number of bases to clip off reads from 'right' end of read for single-stranded non-UDG libraries.

type: integer

Default: 0 (no bases are clipped). Specifies the number of bases to clip off the right side of reads from single-stranded libraries whose UDG treatment is set to none. Note that for reverse reads, left and right are automatically swapped, so the clipping applies to the same end of the original read.

Modifies bamUtil's trimBam parameter: -R

Turn on using softclip instead of hard masking.

type: boolean

By default, nf-core/eager uses hard clipping and sets clipped bases to N with quality ! in the BAM output. Turn this on to use soft-clipping instead, which marks the clipped bases at the read ends in the CIGAR string.

Modifies bamUtil's trimBam parameter: -c
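The default masking behaviour described above, including the left/right swap for reverse reads, can be sketched in Python as follows. `mask_read_ends` is a hypothetical helper operating on a sequence and quality string as stored in the BAM (i.e. in reference orientation); it mimics the 'N with quality !' masking and does not model the soft-clipping mode.

```python
from typing import Tuple


def mask_read_ends(seq: str, qual: str, left: int, right: int, is_reverse: bool) -> Tuple[str, str]:
    """Mask `left` bases at the start and `right` bases at the end of a read with N / '!'.

    For reverse reads, left and right are swapped so the clipping refers to the
    same ends of the original read rather than to reference orientation.
    """
    if is_reverse:
        left, right = right, left
    n = len(seq)
    left = min(left, n)
    right = min(right, n - left)  # never mask a base twice
    start, end = left, n - right
    masked_seq = "N" * left + seq[start:end] + "N" * right
    masked_qual = "!" * left + qual[start:end] + "!" * right
    return masked_seq, masked_qual


print(mask_read_ends("ACGTACGT", "IIIIIIII", left=2, right=1, is_reverse=False))
# ('NNGTACGN', '!!IIIII!')
print(mask_read_ends("ACGTACGT", "IIIIIIII", left=2, right=1, is_reverse=True))
# ('NCGTACNN', '!IIIII!!')
```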

Options for variant calling.

Turn on genotyping of BAM files.

type: boolean

Turns on genotyping to run on all post-dedup and downstream BAMs. For example, if --run_pmdtools and --trim_bam are both supplied, the genotyper will be run on all three BAM files, i.e. post-deduplication, post-PMD and post-trimmed BAM files.

Specify which genotyper to use: GATK UnifiedGenotyper, GATK HaplotypeCaller, FreeBayes, pileupCaller, or ANGSD. Options: 'ug', 'hc', 'freebayes', 'pileupcaller', 'angsd'.

type: string

Specifies which genotyper to use. Current options are: GATK (v3.5) UnifiedGenotyper, GATK HaplotypeCaller (v4), FreeBayes, pileupCaller, and ANGSD. Specify 'ug', 'hc', 'freebayes', 'pileupcaller', or 'angsd' respectively.

Note that while UnifiedGenotyper is more suitable for low-coverage ancient DNA (HaplotypeCaller does de novo assembly around each variant site), be aware that GATK 3.5 is officially deprecated by the Broad Institute.

Specify which input BAM to use for genotyping. Options: 'raw', 'trimmed', 'pmd' or 'rescaled'.

type: string

Indicates which BAM file to use for genotyping, depending on what BAM processing modules you have turned on. Options are: 'raw' for mapped only, filtered, or DeDup BAMs (with priority right to left); 'trimmed' (for base clipped BAMs); 'pmd' (for pmdtools output); 'rescaled' (for mapDamage2 rescaling output). Default is: 'raw'.

Specify GATK phred-scaled confidence threshold.

type: integer

default: 30

If selected, specify a GATK genotyper phred-scaled confidence threshold of a given SNP/INDEL call. Default: 30

Modifies GATK UnifiedGenotyper or HaplotypeCaller parameter: -stand_call_conf

Specify GATK organism ploidy.

type: integer

default: 2

If selected, specify a GATK genotyper ploidy value of your reference organism. E.g. if you want to allow heterozygous calls from >= diploid organisms. Default: 2

Modifies GATK UnifiedGenotyper or HaplotypeCaller parameter: --sample-ploidy

Maximum depth coverage allowed for genotyping before down-sampling is turned on.

type: integer

default: 250

Maximum depth coverage allowed for genotyping before down-sampling is turned on. Any position with a coverage higher than this value will be randomly down-sampled to this number of reads. Default: 250

Modifies GATK UnifiedGenotyper parameter: -dcov

Specify VCF file for SNP annotation of output VCF files. Optional. Gzip not accepted.

type: string

(Optional) Specify VCF file for output VCF SNP annotation e.g. if you want to annotate your VCF file with 'rs' SNP IDs. Check GATK documentation for more information. Gzip not accepted.

Specify GATK output mode. Options: 'EMIT_VARIANTS_ONLY', 'EMIT_ALL_CONFIDENT_SITES', 'EMIT_ALL_ACTIVE_SITES'.

type: string

If the GATK genotyper HaplotypeCaller is selected, what type of VCF to create, i.e. produce calls for every site or just confident sites. Options: 'EMIT_VARIANTS_ONLY', 'EMIT_ALL_CONFIDENT_SITES', 'EMIT_ALL_ACTIVE_SITES'. Default: 'EMIT_VARIANTS_ONLY'

Modifies GATK HaplotypeCaller parameter: -output_mode

Specify HaplotypeCaller mode for emitting reference confidence calls. Options: 'NONE', 'BP_RESOLUTION', 'GVCF'.

type: string

If the GATK HaplotypeCaller is selected, mode for emitting reference confidence calls. Options: 'NONE', 'BP_RESOLUTION', 'GVCF'. Default: 'GVCF'

Modifies GATK HaplotypeCaller parameter: --emit-ref-confidence

Specify GATK output mode. Options: 'EMIT_VARIANTS_ONLY', 'EMIT_ALL_CONFIDENT_SITES', 'EMIT_ALL_SITES'.

type: string

If the GATK UnifiedGenotyper is selected, what type of VCF to create, i.e. produce calls for every site or just confident sites. Options: 'EMIT_VARIANTS_ONLY', 'EMIT_ALL_CONFIDENT_SITES', 'EMIT_ALL_SITES'. Default: 'EMIT_VARIANTS_ONLY'

Modifies GATK UnifiedGenotyper parameter: --output_mode

Specify UnifiedGenotyper likelihood model. Options: 'SNP', 'INDEL', 'BOTH', 'GENERALPLOIDYSNP', 'GENERALPLOIDYINDEL'.

type: string

If the GATK UnifiedGenotyper is selected, which likelihood model to follow, i.e. whether to call SNPs or INDELs etc. Options: 'SNP', 'INDEL', 'BOTH', 'GENERALPLOIDYSNP', 'GENERALPLOIDYINDEL'. Default: 'SNP'

Modifies GATK UnifiedGenotyper parameter: --genotype_likelihoods_model

Specify to keep the BAM output of re-alignment around variants from GATK UnifiedGenotyper.

type: boolean

If provided when running GATK's UnifiedGenotyper, this will also save to the output folder the BAMs containing reads realigned around possible variants (with GATK's (v3) IndelRealigner) for improved genotyping.

These BAMs will be stored in the same folder as the corresponding VCF files.

Supply a default base quality if a read is missing a base quality score. Setting to -1 turns this off.

type: string

When running GATK's UnifiedGenotyper, specify a value to set base quality scores to if reads are missing this information. Might be useful if you have 'synthetically' generated reads (e.g. from chopping up a reference genome). Default: -1 (no default quality is set, i.e. turned off).

Modifies GATK UnifiedGenotyper parameter: --defaultBaseQualities

Specify minimum required supporting observations to consider a variant.

type: integer

default: 1

Specify minimum required supporting observations to consider a variant. Default: 1

Modifies freebayes parameter: -C

Specify to skip over regions of high depth by discarding alignments overlapping positions where total read depth is greater than the specified value.

type: integer

Specify to skip over regions of high depth by discarding alignments overlapping positions where total read depth is greater than the specified value. Not set by default.

Modifies freebayes parameter: -g

Specify ploidy of sample in FreeBayes.

type: integer

default: 2

Specify ploidy of sample in FreeBayes. Default: 2 (diploid).

Modifies freebayes parameter: -p

Specify path to SNP panel in bed format for pileupCaller.

type: string

Specify a SNP panel in the form of a bed file of sites at which to generate pileup for pileupCaller.

Specify path to SNP panel in EIGENSTRAT format for pileupCaller.

type: string

Specify a SNP panel in EIGENSTRAT format, pileupCaller will call these sites.

Specify calling method to use. Options: 'randomHaploid', 'randomDiploid', 'majorityCall'.

type: string

Specify calling method to use. Options: randomHaploid, randomDiploid, majorityCall. Default: 'randomHaploid'

Modifies pileupCaller parameter: --randomHaploid --randomDiploid --majorityCall
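The three calling methods can be illustrated with a simplified Python sketch that operates on the bases observed at one site. This is not pileupCaller's implementation: `call_site` is a hypothetical helper, and it is an assumption of this sketch that randomDiploid draws two reads without replacement (giving no call at sites with fewer than two reads) and that majorityCall breaks ties at random.

```python
import random
from collections import Counter
from typing import List, Optional, Tuple


def call_site(bases: List[str], method: str, rng: random.Random) -> Optional[Tuple[str, ...]]:
    """Return the called allele(s) for one site, or None if no call can be made."""
    if method not in ("randomHaploid", "randomDiploid", "majorityCall"):
        raise ValueError(f"unknown method: {method!r}")
    if not bases:
        return None
    if method == "randomHaploid":
        return (rng.choice(bases),)  # allele of one randomly chosen read
    if method == "randomDiploid":
        if len(bases) < 2:
            return None
        return tuple(rng.sample(bases, 2))  # two different reads, drawn without replacement
    counts = Counter(bases)  # majorityCall
    top = max(counts.values())
    winners = sorted(b for b, c in counts.items() if c == top)
    return (rng.choice(winners),)  # most frequent allele, ties broken at random


rng = random.Random(42)
pileup = ["A", "A", "G"]
print(call_site(pileup, "majorityCall", rng))   # ('A',)
print(call_site(pileup, "randomHaploid", rng))  # ('A',) or ('G',), depending on the draw
```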

Specify the calling mode for transitions. Options: 'AllSites', 'TransitionsMissing', 'SkipTransitions'.

type: string

Specify if genotypes of transition SNPs should be called, set to missing, or excluded from the genotypes respectively. Options: 'AllSites', 'TransitionsMissing', 'SkipTransitions'. Default: 'AllSites'

Modifies pileupCaller parameter: --skipTransitions --transitionsMissing
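Transitions are the A/G and C/T substitutions, which post-mortem deamination (C to T, and G to A on the complementary strand) can mimic. The Python sketch below shows how the three modes would treat a called genotype; `apply_transition_mode`, the use of None for a missing genotype and the SKIP marker for an excluded site are hypothetical choices for illustration only.

```python
from typing import Optional

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
MODES = {"AllSites", "TransitionsMissing", "SkipTransitions"}

# Marker meaning "drop this SNP from the output entirely" (SkipTransitions).
SKIP = "skip"


def is_transition(ref: str, alt: str) -> bool:
    """True for A/G and C/T substitutions."""
    return ref != alt and ({ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES)


def apply_transition_mode(ref: str, alt: str, genotype: Optional[str], mode: str) -> Optional[str]:
    """Return the genotype to report, None for missing, or SKIP to exclude the site."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "AllSites" or not is_transition(ref, alt):
        return genotype
    if mode == "TransitionsMissing":
        return None  # site kept, genotype set to missing
    return SKIP  # SkipTransitions: site removed from the output


print(is_transition("C", "T"))                                     # True
print(apply_transition_mode("C", "T", "T", "TransitionsMissing"))  # None
print(apply_transition_mode("A", "C", "C", "SkipTransitions"))     # C
```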

The minimum mapping quality to be used for genotyping.

type: integer

default: 30

The minimum mapping quality to be used for genotyping. Affects the samtools mpileup output that is used by pileupCaller (samtools mpileup parameter: -q).

The minimum base quality to be used for genotyping.

type: integer

default: 30

The minimum base quality to be used for genotyping. Affects the samtools mpileup output that is used by pileupCaller (samtools mpileup parameter: -Q).

Specify which ANGSD genotyping likelihood model to use. Options: 'samtools', 'gatk', 'soapsnp', 'syk'.

type: string

Specify which genotype likelihood model to use. Options: 'samtools', 'gatk', 'soapsnp', 'syk'. Default: 'samtools'

Modifies ANGSD parameter: -GL

Specify the output format for ANGSD genotype likelihood results. Options: 'text', 'binary', 'binary_three', 'beagle'.

type: string

Specifies what type of genotype likelihood file format will be output. Options: 'text', 'binary', 'binary_three', 'beagle'. Default: 'text'.

The options refer to the following descriptions respectively:

text: text output of all 10 log genotype likelihoods.
binary: binary output of all 10 log genotype likelihoods.
binary_three: binary 3 times likelihood.
beagle: Beagle likelihood file.

See the ANGSD documentation for more information on which to select for your downstream applications.

Modifies ANGSD parameter: -doGlF

Turn on creation of FASTA from ANGSD genotyping likelihood.

type: boolean

Turns on the ANGSD creation of a FASTA file from the BAM file.

Specify which type of base calling to use for ANGSD FASTA generation. Options: 'random', 'common'.

type: string

The type of base calling to be performed when creating the ANGSD FASTA file. Options: 'random' or 'common'. 'common' will output the most common non-N base at each given position, whereas 'random' will pick one at random. Default: 'random'.

Modifies ANGSD parameter: -doFasta -doCounts

Turn on bcftools stats generation for VCF based variant calling statistics

type: boolean

default: true

Runs bcftools stats against VCF files from GATK and FreeBayes genotypers.

It will automatically include the FASTA reference for INDEL-related statistics.

Options for creation of a per-sample FASTA sequence useful for downstream analysis (e.g. multi sequence alignment)

Turns on ability to create a consensus sequence FASTA file based on a UnifiedGenotyper VCF file and the original reference (only considers SNPs).

type: boolean

Turn on consensus sequence genome creation via VCF2Genome. Only accepts GATK UnifiedGenotyper VCF files generated with the --gatk_ug_out_mode 'EMIT_ALL_SITES' and --gatk_ug_genotype_model 'SNP' flags. Typically useful for small genomes such as mitochondria.

Specify the name of the output FASTA file containing the consensus sequence.

type: string

The output FASTA file will be named <sample_name>_<vcf2genome_outfile>.fasta.

Specify the header name of the consensus sequence entry within the FASTA file.

type: string

The name of the FASTA entry you would like in your FASTA file.

Minimum depth coverage required for a call to be included (else N will be called).

type: integer

default: 5

Minimum depth coverage for a SNP call to be made. Otherwise, the position will be called as N. Default: 5

Modifies VCF2Genome parameter: -minc

Minimum genotyping quality for a call to be included. Else N will be called.

type: integer

default: 30

Minimum genotyping quality of a call to be made. Else N will be called. Default: 30

Modifies VCF2Genome parameter: -minq

Minimum fraction of reads supporting a call to be included. Else N will be called.

type: number

default: 0.8

In the case of two possible alleles, the frequency of the majority allele required for a call to be made. Else, a SNP will be called as N. Default: 0.8

Modifies VCF2Genome parameter: -minfreq
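Taken together, the three thresholds decide per position whether a base or an N is written. The following Python sketch encodes that decision as described by the parameter texts above (minimum coverage, minimum genotyping quality, minimum majority-allele frequency), assuming each threshold is inclusive. It is a simplified illustration; `consensus_base` and its arguments are hypothetical and not VCF2Genome's code.

```python
def consensus_base(base: str, depth: int, quality: float, allele_freq: float,
                   minc: int = 5, minq: float = 30, minfreq: float = 0.8) -> str:
    """Return `base` if the call passes all three thresholds, otherwise 'N'."""
    if depth < minc or quality < minq or allele_freq < minfreq:
        return "N"
    return base


print(consensus_base("A", depth=12, quality=45, allele_freq=0.95))  # A
print(consensus_base("A", depth=3, quality=45, allele_freq=0.95))   # N (too few reads)
print(consensus_base("A", depth=12, quality=45, allele_freq=0.6))   # N (no clear majority allele)
```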

Options for creation of a SNP table useful for downstream analysis (e.g. estimation of cross-mapping of different species and multi-sequence alignment)

Turn on MultiVCFAnalyzer. Note: This currently only supports diploid GATK UnifiedGenotyper input.

type: boolean

Turns on MultiVCFAnalyzer. Only works in combination with the UnifiedGenotyper genotyping module.

Turn on writing allele frequencies in the SNP table.

type: boolean

Specify whether to tell MultiVCFAnalyzer to write within the SNP table the frequencies of the allele at that position e.g. A (70%).

Specify the minimum genotyping quality threshold for a SNP to be called.

type: integer

default: 30

The minimal genotyping quality for a SNP to be considered for processing by MultiVCFAnalyzer. The default threshold is 30.

Specify the minimum number of reads a position needs to be covered to be considered for base calling.

type: integer

default: 5

The minimal number of reads covering a base for a SNP at that position to be considered for processing by MultiVCFAnalyzer. The default depth is 5.

Specify the minimum allele frequency that a base requires to be considered a 'homozygous' call.

type: number

default: 0.9

The minimal frequency of a nucleotide for a 'homozygous' SNP to be called. For example, with a value of 0.9, 90% of the reads covering that position must carry that SNP for it to be called. If this threshold is not reached but the previous two parameters are met, a reference call is made (displayed as . in the SNP table). If the previous two parameters are not met, an 'N' is called. Default: 0.9.

Specify the minimum allele frequency that a base requires to be considered a 'heterozygous' call.

type: number

default: 0.9

The minimum frequency of a nucleotide for a 'heterozygous' SNP to be called. If this parameter is set to the same value as --min_allele_freq_hom, then only homozygous calls are made. If this value is lower than --min_allele_freq_hom, a SNP whose frequency lies between this value and --min_allele_freq_hom will be displayed as an IUPAC uncertainty call. Default: 0.9.
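One reading of the rules described for the coverage, quality, homozygous and heterozygous thresholds is sketched below in Python. `snp_table_entry` and the IUPAC lookup table are hypothetical, the thresholds are assumed to be inclusive, and MultiVCFAnalyzer's actual handling may differ in detail.

```python
# IUPAC ambiguity codes for two-base combinations.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("GC"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def snp_table_entry(ref: str, alt: str, alt_freq: float, depth: int, quality: float,
                    min_cov: int = 5, min_qual: float = 30,
                    min_freq_hom: float = 0.9, min_freq_het: float = 0.9) -> str:
    """Return the SNP-table symbol for one position under the described thresholds."""
    if ref == alt:
        raise ValueError("ref and alt must differ")
    if depth < min_cov or quality < min_qual:
        return "N"  # coverage or genotyping quality threshold not met
    if alt_freq >= min_freq_hom:
        return alt  # homozygous SNP call
    if alt_freq >= min_freq_het:
        return IUPAC[frozenset((ref, alt))]  # between the two thresholds: uncertainty call
    return "."  # thresholds met but SNP not frequent enough: reference call


print(snp_table_entry("C", "T", 0.95, depth=10, quality=50))                    # T
print(snp_table_entry("C", "T", 0.7, depth=10, quality=50, min_freq_het=0.6))  # Y
print(snp_table_entry("C", "T", 0.3, depth=10, quality=50))                     # .
print(snp_table_entry("C", "T", 0.95, depth=2, quality=50))                     # N
```

With both frequency thresholds left at 0.9, the IUPAC branch can never be reached, which matches the statement that only homozygous calls are made when the two parameters are equal.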

Specify paths to additional pre-made VCF files to be included in the SNP table generation. Use wildcard(s) for multiple files.

type: string

If you wish to add previously created VCF files to the table, specify here a path with wildcards (in quotes). These VCF files must be created the same way as your settings for the GATK UnifiedGenotyper module above.

Specify path to the reference genome annotations in '.gff' format. Optional.

type: string

default: NA

If you wish to report in the SNP table annotation information for the regions
SNPs fall in, provide a file in GFF format (the path must be in quotes).

Specify path to the positions to be excluded in '.gff' format. Optional.

type: string

default: NA

If you wish to exclude SNP regions from consideration by MultiVCFAnalyzer (such as for problematic regions), provide a file in GFF format (the path must be in quotes).

Specify path to the output file from SNP effect analysis in '.txt' format. Optional.

type: string

default: NA

If you wish to include results from SNPEff effect analysis, supply the output
from SNPEff in txt format (the path must be in quotes).

Options for the calculation of ratio of reads to one chromosome/FASTA entry against all others.

Turn on mitochondrial to nuclear ratio calculation.

type: boolean

Turn on the module to estimate the ratio of mitochondrial to nuclear reads.

Specify the name of the reference FASTA entry corresponding to the mitochondrial genome (up to the first space).

type: string

default: MT

Specify the FASTA entry in the reference file specified as --fasta, which acts
as the mitochondrial 'chromosome' to base the ratio calculation on. The tool
only accepts the first section of the header before the first space. The default
chromosome name is based on the hs37d5/GRCh37 human reference genome. Default: 'MT'

Options for the calculation of biological sex of human individuals.

Turn on sex determination for human reference genomes. This will run on single- and double-stranded variants of a library separately.

type: boolean

Specify to run the optional process of sex determination.

Specify path to SNP panel in bed format for error bar calculation. Optional (see documentation).

type: string

Specify an optional BED file listing the SNPs to be used for X-/Y-rate calculation. Running without this parameter will considerably increase runtime and render the resulting error bars untrustworthy. Theoretically, any set of SNPs that are distant enough that two SNPs are unlikely to be covered by the same read can be used here. The programme was coded with the 1240K panel in mind. The path must be in quotes.

Options for the estimation of contamination of human DNA.

Turn on nuclear contamination estimation for human reference genomes.

type: boolean

Specify to run the optional processes for (human) nuclear DNA contamination estimation.

The name of the X chromosome in your bam/FASTA header. 'X' for hs37d5, 'chrX' for HG19.

type: string

default: X

The name of the human X chromosome in your BAM. 'X' for hs37d5, 'chrX' for HG19. Defaults to 'X'.

Options for metagenomic screening of off-target reads.

Turn on removal of low sequence-complexity reads for metagenomic screening with bbduk.

type: boolean

Turns on low-sequence complexity filtering of off-target reads using bbduk.

This is typically performed on reads that go into metagenomic screening, to reduce the number of uninformative or potentially false-positive reads. This in turn reduces false-positive species identifications, as well as run-time and resource requirements.

See --metagenomic_complexity_entropy for how complexity is calculated. ⚠️ Important: There are no MultiQC output results for this module; you must check the number of reads removed in the _bbduk.stats output file.

Default: off

Specify the entropy threshold under which a sequencing read will be complexity-filtered out. This should be between 0 and 1.

type: number

default: 0.3

Specify the minimum entropy threshold. Reads below this value will be removed from the FASTQ file that goes into metagenomic screening.

A mono-nucleotide read such as GGGGGG will have an entropy of 0, a completely random sequence has an entropy of almost 1.

See the bbduk documentation on entropy for more information.

Modifies bbduk parameter: entropy=
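The following Python sketch shows how the 0-1 entropy scale behaves. It computes a simplified Shannon entropy over single-base frequencies, scaled so that 2 bits (four equally frequent bases) equals 1. This is only an illustration: bbduk's own calculation is window- and k-mer-based, so its values will differ for the same read. The function names are hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(seq: str) -> float:
    """Shannon entropy of the base composition, scaled to 0-1 (simplified, not bbduk's method)."""
    if not seq:
        return 0.0
    n = len(seq)
    counts = Counter(seq.upper())
    # Sum of p * log2(1/p); a single-base read gives exactly 0.0.
    h = sum((c / n) * math.log2(n / c) for c in counts.values())
    return h / 2.0  # log2(4) = 2 bits is the maximum for a 4-letter alphabet


def complexity_filter(reads, threshold=0.3):
    """Keep only reads whose entropy is at or above the threshold."""
    return [r for r in reads if normalized_entropy(r) >= threshold]


reads = ["GGGGGGGGGG", "ACGTACGTAC", "GGGGGGGGGA"]
for r in reads:
    print(r, round(normalized_entropy(r), 3))
print(complexity_filter(reads))  # ['ACGTACGTAC']
```

As in the description above, a mono-nucleotide read scores 0 and a read using all four bases evenly scores close to 1.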

Turn on metagenomic screening module for reference-unmapped reads.

type: boolean

Turn on the metagenomic screening module.

Specify which classifier to use. Options: 'malt', 'kraken'.

type: string

Specify which taxonomic classifier to use. There are two options available:

kraken for Kraken2
malt for MALT

⚠️ Important: It is very important to run nextflow clean -f on your
Nextflow run directory once completed. RMA6 files are VERY large and are
copied from a work/ directory into the results folder. You should clean the
work directory with this command to avoid redundant copies and large HDD
footprints!

Specify path to classifier database directory. For Kraken2 this can also be a .tar.gz of the directory.

type: string

Specify the path to the directory containing your taxonomic classifier's database (malt or kraken).

For Kraken2, it can be either the path to the directory or the path to the .tar.gz compressed directory of the Kraken2 database.

Specify the minimum number of reads (of the sample total) a taxon is required to have to be retained. Not compatible with --malt_min_support_mode 'percent'.

type: integer

default: 1

Specify the minimum number of reads a given taxon is required to have to be retained as a positive 'hit'.
For malt, this only applies when --malt_min_support_mode is set to 'reads'. Default: 1.

Modifies MALT or kraken_parse.py parameter: -sup and -c respectively

Percent identity value threshold for MALT.

type: integer

default: 85

Specify the minimum percent identity (or similarity) a sequence must have to the reference for it to be retained. Default: 85.

Only used when --metagenomic_tool malt is also supplied.

Modifies MALT parameter: -id
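The following minimal Python sketch illustrates the percent identity threshold. It uses one common definition of percent identity: matching columns divided by alignment length. This definition is an assumption, and MALT's internal calculation (for example its treatment of gaps) may differ. The function names are hypothetical.

```python
def percent_identity(read_aln: str, ref_aln: str) -> float:
    """Percentage of alignment columns where read and reference carry the same base.

    Both arguments are equal-length rows of a pairwise alignment; '-' marks a gap
    and never counts as a match.
    """
    if not read_aln or len(read_aln) != len(ref_aln):
        raise ValueError("alignment rows must be non-empty and of equal length")
    matches = sum(1 for q, r in zip(read_aln, ref_aln) if q == r and q != "-")
    return 100.0 * matches / len(read_aln)


def passes_identity(read_aln: str, ref_aln: str, min_identity: float = 85) -> bool:
    """True if the alignment meets the minimum percent identity."""
    return percent_identity(read_aln, ref_aln) >= min_identity


# One mismatch in 10 bases -> 90% (retained); three mismatches -> 70% (discarded).
print(percent_identity("TTGACCTAGG", "CTGACCTAGG"), passes_identity("TTGACCTAGG", "CTGACCTAGG"))  # 90.0 True
print(percent_identity("TTAACTTAGG", "CTGACCTAGG"), passes_identity("TTAACTTAGG", "CTGACCTAGG"))  # 70.0 False
```

The first example's single mismatch is a C to T change at the read start, the pattern typical of aDNA damage. On short reads, even a few such mismatches can push a true hit below a strict threshold.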

Specify which alignment mode to use for MALT. Options: 'Unknown', 'BlastN', 'BlastP', 'BlastX', 'Classifier'.

type: string

Use this to run the program in 'BlastN', 'BlastP' or 'BlastX' mode to align DNA
reads against DNA references, protein reads against protein references, or DNA
reads against protein references, respectively. Ensure your database matches the
mode. Check the MALT manual for more details. Default: 'BlastN'.

Only when --metagenomic_tool malt is also supplied.

Modifies MALT parameter: -m

Specify alignment method for MALT. Options: 'Local', 'SemiGlobal'.

type: string

Specify what alignment algorithm to use. Options are 'Local' or 'SemiGlobal'. Local is a BLAST-like alignment, but is much slower. Semi-global alignment aligns reads end-to-end. Default: 'SemiGlobal'.

Only when --metagenomic_tool malt is also supplied.

Modifies MALT parameter: -at

Specify the percent for LCA algorithm for MALT (see MEGAN6 CE manual).

type: integer

default: 1

Specify the top percent value of the LCA algorithm. From the MALT manual: "For each
read, only those matches are used for taxonomic placement whose bit score is within
10% of the best score for that read." Default: 1.

Only when --metagenomic_tool malt is also supplied.

Modifies MALT parameter: -top
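The following Python sketch illustrates the top-percent rule quoted above: a match is kept only if its bit score is within the given percentage of the read's best bit score. It shows only this filtering step. The subsequent LCA placement, which assigns the read to the lowest common ancestor of the retained matches' taxa, is not implemented. The function name and example scores are hypothetical.

```python
def within_top_percent(bit_scores: dict, top_percent: float = 1) -> dict:
    """Keep matches whose bit score is within top_percent % of the read's best score."""
    if not bit_scores:
        return {}
    best = max(bit_scores.values())
    cutoff = best * (1 - top_percent / 100)
    return {ref: s for ref, s in bit_scores.items() if s >= cutoff}


# Hypothetical bit scores of one read against four references.
scores = {"taxon_A": 200.0, "taxon_B": 199.0, "taxon_C": 185.0, "taxon_D": 150.0}
print(within_top_percent(scores, 1))   # {'taxon_A': 200.0, 'taxon_B': 199.0}
print(within_top_percent(scores, 10))  # {'taxon_A': 200.0, 'taxon_B': 199.0, 'taxon_C': 185.0}
```

A larger value lets more matches into the LCA step. This tends to push the assignment of a read towards a higher (less specific) taxonomic node.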

Specify whether to use percent or raw number of reads for minimum support required for taxon to be retained for MALT. Options: 'percent', 'reads'.

type: string

Specify whether to use a percentage, or raw number of reads as the value used to decide the minimum support a taxon requires to be retained.

Only when --metagenomic_tool malt is also supplied.

Modifies MALT parameter: -sup -supp

Specify the minimum percentage of reads (of the sample total) a taxon is required to have to be retained for MALT.

type: number

default: 0.01

Specify the minimum number of reads (as a percentage of all assigned reads) a given taxon is required to have to be retained as a positive 'hit' in the RMA6 file. This only applies when --malt_min_support_mode is set to 'percent'. Default: 0.01.

Only when --metagenomic_tool malt is also supplied.

Modifies MALT parameter: -supp
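The following minimal Python sketch contrasts the 'reads' and 'percent' minimum-support modes described above. It assumes per-taxon read counts are already available and uses their sum as the 'all assigned reads' denominator. This denominator is an assumption, and MALT's internal bookkeeping may differ. The function and taxon names are hypothetical.

```python
def retained_taxa(taxon_reads: dict, mode: str,
                  min_reads: int = 1, min_percent: float = 0.01) -> list:
    """Return the sorted names of taxa that pass the minimum-support filter."""
    total = sum(taxon_reads.values())
    if mode == "reads":
        # Absolute threshold on the number of assigned reads.
        return sorted(t for t, n in taxon_reads.items() if n >= min_reads)
    if mode == "percent":
        # Relative threshold: share of all assigned reads, in percent.
        return sorted(t for t, n in taxon_reads.items()
                      if total and 100.0 * n / total >= min_percent)
    raise ValueError("mode must be 'reads' or 'percent'")


counts = {"taxon_A": 50000, "taxon_B": 20, "taxon_C": 3}
print(retained_taxa(counts, "percent"))            # ['taxon_A', 'taxon_B']
print(retained_taxa(counts, "reads", min_reads=5))  # ['taxon_A', 'taxon_B']
print(retained_taxa(counts, "reads"))               # ['taxon_A', 'taxon_B', 'taxon_C']
```

With the default of 0.01 percent, taxon_C (3 of 50,023 reads, about 0.006%) is dropped. In 'reads' mode with a threshold of 1, it would be kept.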

Specify the maximum number of queries a read can have for MALT.

type: integer

default: 100

Specify the maximum number of alignments a read can have. All further alignments are discarded. Default: 100

Only when --metagenomic_tool malt is also supplied.

Modifies MALT parameter: -mq

Specify the memory load method. Do not use 'map' with GPFS file systems for MALT, as it can be very slow. Options: 'load', 'page', 'map'.

type: string

How to load the database into memory. Options are 'load', 'page' or 'map'.
'load' directly loads the entire database into memory prior to seed lookup; this
is slow but compatible with all servers/file systems. 'page' and 'map'
perform a sort of 'chunked' database loading, allowing seed lookup prior to
loading of the entire database. Note that 'page' and 'map' modes do not work
properly with many remote file systems such as GPFS. Default: 'load'.

Only when --metagenomic_tool malt is also supplied.

Modifies MALT parameter: --memoryMode

Specify to also produce SAM alignment files. Note that these include both aligned and unaligned reads and are gzipped. This will result in very large file sizes.

type: boolean

Specify to also produce gzipped SAM files of all alignments and unaligned reads in addition to RMA6 files. These are not soft-clipped or in 'sparse' format. They can be useful for downstream analyses due to the more common file format.

⚠️ This can result in very large run output directories, as it essentially duplicates the RMA6 files.

Modifies MALT parameters: -a -f

Options for authentication of metagenomic screening performed by MALT.

Turn on MaltExtract for MALT aDNA characteristics authentication.

type: boolean

Turn on MaltExtract for MALT aDNA characteristics authentication of metagenomic output from MALT.

Further details can be found in the MaltExtract documentation.

Only when --metagenomic_tool malt is also supplied.

Path to a text file with taxa of interest (one taxon per row, NCBI taxonomy name format)

type: string

Path to a .txt file with taxa of interest you wish to assess for aDNA characteristics. The .txt file should contain one taxon per row, and each taxon should be in a valid NCBI taxonomy name format.
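The following small Python sketch shows the expected one-taxon-per-row layout. It writes a two-taxon list to a temporary file and reads it back, ignoring blank lines and surrounding whitespace. The reader function is hypothetical and is not how MaltExtract itself parses the file. The taxon names are only examples.

```python
import os
import tempfile


def read_taxon_list(path: str) -> list:
    """Read one taxon name per row, ignoring blank lines and surrounding whitespace."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


# Write an example list (one NCBI taxonomy name per row) to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("Yersinia pestis\nMycobacterium leprae\n\n")
    path = tmp.name

taxa = read_taxon_list(path)
os.remove(path)
print(taxa)  # ['Yersinia pestis', 'Mycobacterium leprae']
```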

Only when --metagenomic_tool malt is also supplied.

Path to directory containing NCBI resource files (ncbi.tre and ncbi.map; available: https://github.com/rhuebler/HOPS/)

type: string

Path to directory containing the NCBI resource tree and taxonomy table files (ncbi.tre and ncbi.map; available at the HOPS repository).

Only when --metagenomic_tool malt is also supplied.

Specify which MaltExtract filter to use. Options: 'def_anc', 'ancient', 'default', 'crawl', 'scan', 'srna', 'assignment'.

type: string

Specify which MaltExtract filter to use. This determines what types of characteristics to scan for. The default will output statistics on all alignments, and then a second set for only those reads with one C to T mismatch in the first 5 bases. Further details on the other filters can be found in the HOPS documentation. Options: 'def_anc', 'ancient', 'default', 'crawl', 'scan', 'srna', 'assignment'. Default: 'def_anc'.

Only when --metagenomic_tool malt is also supplied.

Modifies MaltExtract parameter: -f
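The following Python sketch illustrates the two outputs of the default 'def_anc' filter described above: statistics on all alignments, plus a subset restricted to reads showing a C to T mismatch in the first 5 bases. Several assumptions apply. The alignment strings are gap-free, equal-length and on the forward strand. A read counts as damaged if it has at least one such mismatch. MaltExtract's real filter works on MALT alignments and may apply additional rules. The function names are hypothetical.

```python
def has_early_c_to_t(read: str, ref: str, window: int = 5) -> bool:
    """True if any of the first `window` positions shows a reference C read as T.

    Assumes gap-free, equal-length, forward-strand alignment strings.
    """
    return any(r == "C" and q == "T" for q, r in zip(read[:window], ref[:window]))


def def_anc_sets(alignments):
    """Return (all alignments, damage-filtered subset), mirroring the two outputs."""
    all_alns = list(alignments)
    damaged = [(q, r) for q, r in all_alns if has_early_c_to_t(q, r)]
    return all_alns, damaged


alns = [
    ("TTGACCTAGG", "CTGACCTAGG"),  # C->T at the first base: in the second set
    ("CTGACCTAGG", "CTGACCTAGG"),  # no mismatch
    ("CTGACCTAGT", "CTGACCTAGC"),  # C->T, but outside the first 5 bases
]
everything, damaged = def_anc_sets(alns)
print(len(everything), len(damaged))  # 3 1
```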

Specify percent of top alignments to use.

type: number

default: 0.01

Specify frequency of top alignments for each read to be considered for each node.
Default is 0.01, i.e. 1% of all reads (where 1 would correspond to 100%).

⚠️ This parameter follows the same concept as --malt_top_percent but
uses a different notation, i.e. integer (MALT) versus float (MaltExtract).

Default: 0.01.

Only when --metagenomic_tool malt is also supplied.

Modifies MaltExtract parameter: -a

Turn off destacking.

type: boolean

Turn off destacking. If left on, a read that overlaps with another read will be
removed (leaving a depth coverage of 1).

Only when --metagenomic_tool malt is also supplied.

Modifies MaltExtract parameter: --destackingOff

Turn off downsampling.

type: boolean

Turn off downsampling. By default, downsampling is on and will randomly select 10,000 reads if the number of reads on a node exceeds this number. This speeds up processing, under the assumption that at 10,000 reads the species is a 'true positive'.

Only when --metagenomic_tool malt is also supplied.

Modifies MaltExtract parameter: --downSampOff
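The following minimal Python sketch illustrates the downsampling rule described above. If a node has more than 10,000 reads, a random subset of exactly 10,000 is drawn without replacement; otherwise all reads are kept. The optional seed (for a reproducible subset) and the function name are hypothetical and are not MaltExtract options.

```python
import random


def downsample(reads, max_reads=10_000, seed=None):
    """Return all reads if there are at most max_reads, else a random subset of that size."""
    reads = list(reads)
    if len(reads) <= max_reads:
        return reads
    rng = random.Random(seed)  # seeded generator so the subset can be reproduced
    return rng.sample(reads, max_reads)  # sampling without replacement


node_reads = [f"read_{i}" for i in range(25_000)]
subset = downsample(node_reads, seed=42)
print(len(subset))  # 10000
```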

Turn off duplicate removal.

type: boolean

Turn off duplicate removal. By default, reads that are exact copies (i.e. same start and stop coordinates and exact sequence match) will be removed, as they are considered PCR duplicates.

Only when --metagenomic_tool malt is also supplied.

Modifies MaltExtract parameter: --dupRemOff
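The following Python sketch illustrates the exact-copy rule described above. Two reads are duplicates only if their start coordinate, stop coordinate and sequence all match, and only the first occurrence is kept. The tuple layout (start, stop, sequence) and the function name are hypothetical.

```python
def remove_exact_duplicates(reads):
    """Keep the first read for each (start, stop, sequence) combination."""
    seen = set()
    kept = []
    for start, stop, seq in reads:
        key = (start, stop, seq)
        if key not in seen:
            seen.add(key)
            kept.append(key)
    return kept


reads = [
    (100, 107, "ACGTTGCA"),
    (100, 107, "ACGTTGCA"),  # exact copy: removed
    (100, 107, "ACGTTGCT"),  # same coordinates, different sequence: kept
    (101, 108, "ACGTTGCA"),  # same sequence, different coordinates: kept
]
print(len(remove_exact_duplicates(reads)))  # 3
```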

Turn on exporting alignments of hits in BLAST format.

type: boolean

Export alignments of hits for each node in BLAST format. By default turned off.

Only when --metagenomic_tool malt is also supplied.

Modifies MaltExtract parameter: --matches

Turn on export of MEGAN summary files.

type: boolean

Export 'minimal' summary files (i.e. without alignments) that can be loaded into MEGAN6. By default turned off.

Only when --metagenomic_tool malt is also supplied.

Modifies MaltExtract parameter: --meganSummary

Minimum percent identity alignments are required to have to be reported. Recommended to set same as MALT parameter.

type: number

default: 85

Minimum percent identity alignments are required to have to be reported. Higher values allow fewer mismatches between read and reference sequence, and therefore provide greater confidence in the hit. Lower values allow more mismatches, which can account for damage and divergence of a related strain/species to the reference. Recommended to set same as MALT parameter or higher. Default: 85.

Only when --metagenomic_tool malt is also supplied.

Modifies MaltExtract parameter: --minPI

Turn on using top alignments per read after filtering.

type: boolean

Use the best alignment of each read for every statistic, except for those concerning read distribution and coverage. Default: off.

Only when --metagenomic_tool malt is also supplied.

Modifies MaltExtract parameter: --useTopAlignment
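The following Python sketch illustrates what keeping only the best alignment per read means. Each read ID retains the single (reference, score) pair with the highest score. Using a numeric alignment score to define "best" is an assumption, and the function name and example data are hypothetical.

```python
def top_alignment_per_read(alignments):
    """Keep the highest-scoring (reference, score) pair for each read ID."""
    best = {}
    for read_id, ref, score in alignments:
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (ref, score)
    return best


alns = [("r1", "refA", 90.0), ("r1", "refB", 95.0), ("r2", "refA", 80.0)]
print(top_alignment_per_read(alns))  # {'r1': ('refB', 95.0), 'r2': ('refA', 80.0)}
```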