nf-core/eager
A fully reproducible and state-of-the-art ancient DNA analysis pipeline
Version 2.2.1. The latest stable release is 2.5.3.
Nextflow version: 22.10.6.
Define where the pipeline should find input data, and additional metadata.
Either paths or URLs to FASTQ/BAM data (must be surrounded with quotes). For paired-end data, the path must use ‘{1,2}’ notation to specify read pairs. Alternatively, a path to a TSV file (ending in ‘.tsv’) containing file paths and sequencing/sample metadata, which allows merging of multiple lanes/libraries/samples. Please see the documentation for a template.
string
null
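Quoting matters because shells such as bash expand an unquoted ‘{1,2}’ into two separate arguments before the pipeline ever sees the path. The following Python sketch shows how a single brace group stands for a read pair. The helper name `expand_pair_pattern` is hypothetical. The pipeline also accepts glob wildcards such as ‘*’, which this toy leaves untouched.

```python
import re


def expand_pair_pattern(pattern):
    """Expand the first '{a,b,...}' group of a path into one path per alternative.

    Hypothetical helper for illustration only; wildcards such as '*' are left as-is.
    """
    match = re.search(r"\{([^{}]+)\}", pattern)
    if match is None:
        return [pattern]  # no brace group, e.g. a single-end file
    head, tail = pattern[:match.start()], pattern[match.end():]
    return [head + alt + tail for alt in match.group(1).split(",")]


# Prints ['data/libA_R1.fastq.gz', 'data/libA_R2.fastq.gz']
print(expand_pair_pattern("data/libA_R{1,2}.fastq.gz"))
```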
Specifies whether you have UDG-treated libraries. Set to ‘half’ for partial treatment or ‘full’ for full UDG treatment. If not set, libraries are assumed to have no UDG treatment (‘none’). Not required for TSV input.
string
Specifies that libraries are single-stranded. Always affects MaltExtract but is ignored by pileupCaller with TSV input. Not required for TSV input.
boolean
Specifies that the input consists of single-end reads. Not required for TSV input.
boolean
Specifies which Illumina sequencing chemistry was used. Used to decide whether to perform poly-G trimming, if that is turned on (see below). Not required for TSV input. Options: 2, 4.
integer
4
Specifies that the input is in BAM format. Not required for TSV input.
boolean
Additional options regarding input data.
If libraries are the result of SNP capture, the path to a BED file containing the SNP positions on the reference genome.
string
Turns on conversion of an input BAM file into FASTQ format to allow re-preprocessing (e.g. AdapterRemoval etc.).
boolean
Specify locations of references and, optionally, additional pre-made indices.
Path or URL to a FASTA reference file (required if not iGenome reference). File suffixes can be: ‘.fa’, ‘.fn’, ‘.fna’, ‘.fasta’.
string
Name of iGenomes reference (required if not FASTA reference).
string
Directory / URL base for iGenomes references.
string
s3://ngi-igenomes/igenomes/
Do not load the iGenomes reference config.
boolean
Path to directory containing pre-made BWA indices (i.e. everything before the endings ‘.amb’, ‘.ann’, ‘.bwt’; most likely the same path as --fasta). If not supplied, the indices will be made for you.
string
Path to directory containing pre-made Bowtie2 indices (i.e. everything before the endings, e.g. ‘.1.bt2’, ‘.2.bt2’, ‘.rev.1.bt2’; most likely the same value as --fasta). If not supplied, the indices will be made for you.
string
Path to samtools FASTA index (typically ending in ‘.fai’). If not supplied, it will be made for you.
string
Path to picard sequence dictionary file (typically ending in ‘.dict’). If not supplied, it will be made for you.
string
Specify to generate more recent ‘.csi’ BAM indices. If your reference genome is larger than 3.5GB, this is recommended due to more efficient data handling with the ‘.csi’ format over the older ‘.bai’.
boolean
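The hard limit of ‘.bai’ applies per reference sequence, not to total genome size. The SAM/BAM specification's BAI binning scheme only covers individual sequences of up to 2^29 bases (about 512 Mbp). The following sketch reads a samtools ‘.fai’ index and lists the sequences that would need ‘.csi’. A ‘.fai’ file is tab-separated, and its first two columns are sequence name and length. The function name is hypothetical.

```python
BAI_MAX_SEQ_LEN = 2**29  # longest single reference sequence the BAI scheme covers


def sequences_needing_csi(fai_lines):
    """Return names of sequences in a samtools .fai index that are too long for BAI."""
    too_long = []
    for line in fai_lines:
        if not line.strip():
            continue  # skip blank lines
        # .fai columns: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH
        name, length = line.rstrip("\n").split("\t")[:2]
        if int(length) > BAI_MAX_SEQ_LEN:
            too_long.append(name)
    return too_long


fai = ["chr1\t248956422\t6\t60\t61", "contig_big\t558535432\t253105824\t60\t61"]
print(sequences_needing_csi(fai))  # ['contig_big']
```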
If not already supplied by the user, turns on saving of generated reference genome indices for later reuse.
boolean
Specify where to put output files and optional saving of intermediate files.
The output directory where the results will be saved.
string
./results
Mode for publishing results in the output directory. Options: ‘symlink’, ‘rellink’, ‘link’, ‘copy’, ‘copyNoFollow’, ‘move’.
string
copy
Turn this on if you want to keep trimmed reads.
boolean
true
Turn this on if you want to keep intermediate alignment files (SAM, BAM, non-deduplicated BAM).
boolean
Less common options for the pipeline, typically set in a config file.
Display help text.
boolean
Workflow name of run, for future reference.
string
Email address for completion summary.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Email address for completion summary, only when pipeline fails.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
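Both email parameters are validated against the regular expression shown above. The following Python check shows what that pattern accepts. It only allows top-level domains of 2 to 5 letters, so an address ending in, for example, ‘.museum’ is rejected.

```python
import re

# Pattern copied verbatim from the email parameter schema above.
EMAIL_PATTERN = re.compile(r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$")


def is_valid_email(address):
    """Return True if the whole address matches the schema pattern."""
    return EMAIL_PATTERN.fullmatch(address) is not None


for address in ["jane.doe@uni-example.de", "jane@localhost", "jane@example.museum"]:
    print(address, is_valid_email(address))  # True, False, False
```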
Send plain-text email instead of HTML.
boolean
File size limit when attaching MultiQC reports to summary emails.
string
25.MB
Do not use coloured log outputs.
boolean
Custom config file to supply to MultiQC.
string
Directory to keep pipeline Nextflow logs and reports.
string
${params.outdir}/pipeline_info
Set the top limit for requested resources for any single job.
Maximum number of CPUs that can be requested for any single job.
integer
16
Maximum amount of memory that can be requested for any single job.
string
128.GB
Maximum amount of time that can be requested for any single job.
string
240.h
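These maxima act as a ceiling. If a process asks for more than the maximum, it is given the maximum instead. nf-core configs of this generation implement this with a Groovy `check_max` helper. The Python sketch below only illustrates the idea. Its unit parsing is an assumption: binary memory units, and hours or days only.

```python
import re

MEMORY_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}
TIME_UNITS = {"h": 1, "d": 24}  # expressed in hours


def parse_memory(text):
    """Parse a string such as '128.GB' into bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\.?\s*([KMGT]?B)", text.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised memory value: {text!r}")
    return float(match.group(1)) * MEMORY_UNITS[match.group(2).upper()]


def parse_hours(text):
    """Parse a string such as '240.h' or '10.d' into hours."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\.?\s*([hd])", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    return float(match.group(1)) * TIME_UNITS[match.group(2)]


def cap(requested, maximum, parse):
    """Return the requested value, or the maximum if the request exceeds it."""
    return requested if parse(requested) <= parse(maximum) else maximum


print(cap("200.GB", "128.GB", parse_memory))  # 128.GB
print(cap("12.h", "240.h", parse_hours))      # 12.h
# CPUs are plain integers, so min(requested, maximum) is enough.
```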
Parameters used to describe centralised config profiles. These generally should not be edited.
Git commit id for Institutional configs.
string
master
Base directory for Institutional configs.
string
https://raw.githubusercontent.com/nf-core/configs/master
Institutional configs hostname.
string
Institutional config description.
string
Institutional config contact information.
string
Institutional config URL link.
string
The AWS Batch job queue that needs to be set when running on AWS Batch.
string
The AWS region for your AWS Batch job to run on.
string
eu-west-1
Path to the AWS CLI tool.
string
Skip any of the mentioned steps.
boolean
boolean
boolean
boolean
boolean
boolean
Processing of Illumina two-colour chemistry data.
Turn on poly-G removal for FASTQ files. This is only performed on libraries sequenced on two-colour chemistry machines.
boolean
Specify the minimum length of a poly-G run required for clipping to be performed.
integer
10
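On two-colour instruments, a cycle with no signal is read as G, so artificial G runs tend to appear at read ends. The following simplified sketch clips a 3’ poly-G run only when it reaches the minimum length. It uses exact matching, whereas the tool the pipeline actually runs for this step may tolerate mismatches within the run. The function name is hypothetical.

```python
def trim_poly_g(seq, min_len=10):
    """Clip a run of G at the 3' end of a read if the run is at least min_len long."""
    stripped = seq.rstrip("G")
    run_length = len(seq) - len(stripped)
    return stripped if run_length >= min_len else seq


print(trim_poly_g("ACGTTAGC" + "G" * 12))  # ACGTTAGC
print(trim_poly_g("ACGTTAGC" + "G" * 5))   # unchanged: run shorter than 10
```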
Options for adapter clipping and paired-end merging.
Specify adapter sequence to be clipped off (forward strand).
string
AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
Specify adapter sequence to be clipped off (reverse strand).
string
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
Specify read minimum length to be kept for downstream analysis.
integer
30
Specify minimum base quality for trimming off bases.
integer
20
Specify minimum adapter overlap required for clipping.
integer
1
Skip merging of forward and reverse reads. Only applicable for paired-end libraries.
boolean
Skip adapter and quality trimming.
boolean
Skip quality-based trimming (N, score, window) of the 5’ end.
boolean
Only use merged reads downstream (un-merged reads and singletons are discarded).
boolean
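The following sketch shows roughly how the clipping, quality-trimming and length thresholds above combine. It is a simplification that assumes exact adapter matches at the 3’ end and Phred+33 qualities. AdapterRemoval itself aligns adapters while tolerating mismatches. All function names are hypothetical, and the `trim_5p` flag only stands in for the option to skip 5’ trimming. The sketch also shows why a minimum overlap of 1 is aggressive: a single trailing base that matches the first adapter base is clipped.

```python
ADAPTER_FWD = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"  # default forward-strand adapter above


def clip_adapter(seq, adapter, min_overlap=1):
    """Cut the read at the leftmost position where the adapter, or an adapter
    prefix running off the 3' end of the read, matches exactly."""
    for start in range(len(seq)):
        overlap = min(len(adapter), len(seq) - start)
        if overlap < min_overlap:
            break
        if seq[start:start + overlap] == adapter[:overlap]:
            return seq[:start]
    return seq


def quality_trim(seq, qual, min_q=20, offset=33, trim_5p=True):
    """Trim bases with quality below min_q from the 3' end and, optionally, the 5' end."""
    scores = [ord(c) - offset for c in qual]
    start, end = 0, len(seq)
    while trim_5p and start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]


def process_read(seq, qual, adapter=ADAPTER_FWD, min_len=30, min_q=20,
                 min_overlap=1, trim_5p=True):
    """Return the trimmed (seq, qual), or None if the read ends up too short."""
    clipped = clip_adapter(seq, adapter, min_overlap)
    seq, qual = quality_trim(clipped, qual[:len(clipped)], min_q, trim_5p=trim_5p)
    return (seq, qual) if len(seq) >= min_len else None


insert = "ACGT" * 10  # a 40 bp molecule followed by 10 bp of adapter read-through
print(process_read(insert + ADAPTER_FWD[:10], "I" * 50)[0] == insert)  # True
print(process_read("ACGT" * 5 + ADAPTER_FWD, "I" * 54))                # None: 20 bp < 30
print(clip_adapter("ACGTA", ADAPTER_FWD))                              # ACGT
```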
Options for reference-genome mapping.
Specify which mapper to use. Options: ‘bwaaln’, ‘bwamem’, ‘circularmapper’, ‘bowtie2’.
string
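Specify the -n parameter for BWA aln, i.e. the number of allowed differences in the alignment. BWA interprets a fractional value, such as the default 0.04, as the fraction of missing alignments given a 2% uniform base error rate. It then chooses the maximum number of differences for each read length.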
number
0.04
Specify the -k parameter for BWA aln, i.e. maximum edit distance allowed in a seed.
integer
2
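Specify the -l parameter for BWA aln, i.e. the length of the seed (the first bases of each read). If this is longer than the read, as with the default 1024 for typical aDNA reads, seeding is disabled.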
integer
1024
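For fractional -n values, the BWA manual states that the maximum edit distance is chosen automatically for different read lengths. The following sketch re-implements that calculation. It is modelled on the Poisson computation in BWA's `bwa_cal_maxdiff` function, as we understand it, with a 2% uniform base error rate. It shows why longer reads are allowed more differences under the default 0.04. Integer -n values are instead used directly as the maximum.

```python
import math


def bwa_aln_max_diff(read_len, missing_prob=0.04, err_rate=0.02):
    """Smallest k such that P(more than k errors) < missing_prob, with the
    number of errors in a read modelled as Poisson(read_len * err_rate)."""
    lam = read_len * err_rate
    term = math.exp(-lam)   # P(X = 0)
    cumulative = term       # P(X <= k), starting at k = 0
    for k in range(1, 1000):
        term *= lam / k     # P(X = k) from P(X = k - 1)
        cumulative += term
        if 1.0 - cumulative < missing_prob:
            return k
    return 2  # fallback, not reached for typical short aDNA reads


for length in (30, 38, 100):
    print(length, bwa_aln_max_diff(length))  # 30 -> 2, 38 -> 3, 100 -> 5
```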
Specify the number of bases to extend reference by (circularmapper only).
integer
500
Specify the FASTA header of the target chromosome to extend (circularmapper only).
string
MT
Turn on to filter off-target reads (circularmapper only).
boolean
Specify the bowtie2 alignment mode. Options: ‘local’, ‘end-to-end’.
string
Specify the level of sensitivity for the bowtie2 alignment mode. Options: ‘no-preset’, ‘very-fast’, ‘fast’, ‘sensitive’, ‘very-sensitive’.
string
Specify the -N parameter for bowtie2 (mismatches in seed). This will override defaults from alignmode/sensitivity.
integer
0
Specify the -L parameter for bowtie2 (length of seed substrings). This will override defaults from alignmode/sensitivity.
integer
0
Specify number of bases to trim off from 5’ (left) end of read before alignment.
integer
0
Specify number of bases to trim off from 3’ (right) end of read before alignment.
integer
0
Options for production of host-read removed FASTQ files for privacy reasons.
Turn on per-library creation of pre-AdapterRemoval FASTQ files without reads that mapped to the reference (e.g. for public upload of privacy-sensitive non-host data).
boolean
Host removal mode. Remove mapped reads completely from the FASTQ (‘remove’) or just mask the sequence of mapped reads with N (‘replace’).
string
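The two host-removal modes differ only in what happens to a read whose ID is among the reads mapped to the reference. The following sketch works on in-memory FASTQ records. The function name is hypothetical. Keeping the original quality string in ‘replace’ mode is an assumption.

```python
def remove_host_reads(records, mapped_ids, mode="remove"):
    """Drop ('remove') or N-mask ('replace') FASTQ records whose ID mapped to the reference.

    records: iterable of (read_id, sequence, quality) tuples.
    mapped_ids: set of read IDs that mapped to the reference.
    """
    if mode not in ("remove", "replace"):
        raise ValueError(f"unknown mode: {mode!r}")
    output = []
    for read_id, seq, qual in records:
        if read_id not in mapped_ids:
            output.append((read_id, seq, qual))  # unmapped reads are always kept
        elif mode == "replace":
            # Keep the record (one reason: paired R1/R2 files stay in sync) but hide the bases.
            output.append((read_id, "N" * len(seq), qual))
    return output


reads = [("r1", "ACGT", "IIII"), ("r2", "GGCA", "IIII")]
print(remove_host_reads(reads, {"r1"}, "remove"))   # [('r2', 'GGCA', 'IIII')]
print(remove_host_reads(reads, {"r1"}, "replace"))  # [('r1', 'NNNN', 'IIII'), ('r2', 'GGCA', 'IIII')]
```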
Options for quality filtering and how to deal with off-target unmapped reads.
Turn on filtering of mapping quality, read lengths, or unmapped reads of BAM files.
boolean
Minimum mapping quality for the read filter.
integer
0
Specify minimum read length to be kept after mapping.
integer
0
Defines whether to discard all unmapped reads or keep them in BAM and/or FASTQ format. Options: ‘discard’, ‘bam’, ‘fastq’, ‘both’.
string
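The following sketch shows the per-record filtering decision, working on SAM text lines. In practice this is done on BAM files with tools such as `samtools view -q`. The function name is hypothetical. Applying the length filter only to mapped reads is an assumption.

```python
def keep_sam_record(line, min_mapq=0, min_len=0, keep_unmapped=True):
    """Decide whether a SAM line survives mapping-quality, length and unmapped filters."""
    if line.startswith("@"):
        return True  # header lines are always kept
    fields = line.rstrip("\n").split("\t")
    flag, mapq, seq = int(fields[1]), int(fields[4]), fields[9]
    if flag & 0x4:  # FLAG bit 0x4: segment unmapped
        return keep_unmapped
    # Note: SEQ may be '*' when not stored; this sketch would count that as length 1.
    return mapq >= min_mapq and len(seq) >= min_len


seq35, qual35 = "A" * 35, "I" * 35
good = f"r1\t0\tchr1\t100\t37\t35M\t*\t0\t0\t{seq35}\t{qual35}"
low_mapq = f"r2\t0\tchr1\t200\t10\t35M\t*\t0\t0\t{seq35}\t{qual35}"
unmapped = f"r3\t4\t*\t0\t0\t*\t*\t0\t0\t{seq35}\t{qual35}"
print([keep_sam_record(r, min_mapq=25, min_len=30, keep_unmapped=False)
       for r in (good, low_mapq, unmapped)])  # [True, False, False]
```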
Options for removal of PCR amplicon duplicates that can artificially inflate coverage.
Deduplication method to use. Options: ‘markduplicates’, ‘dedup’.
string
Turn on treating all reads as merged reads.
boolean
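Treating reads as merged matters for deduplication. For a merged read, both ends of the original molecule are known, so two reads are only duplicates if they share both start and end coordinates. For unmerged reads, only the 5’ end position is reliable. The following sketch illustrates that keying idea. It is not the code of either deduplication tool. The read representation and function names are hypothetical. Keeping the first read of each group is an assumption, since real tools may prefer the highest-quality read.

```python
from typing import NamedTuple


class Read(NamedTuple):
    ref: str
    start: int      # leftmost aligned position (0-based)
    end: int        # one past the rightmost aligned position
    reverse: bool   # aligned to the reverse strand
    merged: bool    # produced by merging a read pair


def dedup_key(read, treat_all_as_merged=False):
    """Key identifying reads that would be considered duplicates of each other."""
    if read.merged or treat_all_as_merged:
        return ("both_ends", read.ref, read.start, read.end, read.reverse)
    # Only the 5' end is reliable; on the reverse strand that is the rightmost position.
    five_prime = read.end if read.reverse else read.start
    return ("five_prime", read.ref, five_prime, read.reverse)


def deduplicate(reads, treat_all_as_merged=False):
    """Keep the first read seen for each duplicate key."""
    seen, kept = set(), []
    for read in reads:
        key = dedup_key(read, treat_all_as_merged)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


a = Read("MT", 100, 150, False, False)
b = Read("MT", 100, 160, False, False)
print(len(deduplicate([a, b])))                            # 1: same 5' end
print(len(deduplicate([a, b], treat_all_as_merged=True)))  # 2: different 3' ends
```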
Options for calculating library complexity (i.e. how many unique reads are present).
Specify the step size of Preseq.
integer
1000
Options for calculating and filtering for characteristic ancient DNA damage patterns.
Specify length filter for DamageProfiler.
integer
100
Specify number of bases of each read to consider for DamageProfiler calculations.
integer
15
Specify the maximum misincorporation frequency that should be displayed on damage plot. Set to 0 to ‘autoscale’.
number
0.3
Turn on PMDtools.
boolean
Specify the range of bases for PMDtools to scan for damage.
integer
10
Specify the PMDScore threshold for PMDtools.
integer
3
Specify a path to a reference mask for PMDtools.
string
Specify the maximum number of reads to consider for metrics generation.
integer
10000
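Deamination in ancient DNA shows up as elevated C→T substitutions towards the 5’ ends of reads, and as G→A towards the 3’ ends in double-stranded libraries. The following sketch computes the per-position 5’ C→T frequency over the first N bases. It assumes gap-free, forward-strand alignments given as (reference, read) string pairs. DamageProfiler computes this from BAM files and also handles strands, indels and more. The function name is hypothetical.

```python
def c_to_t_frequencies(alignments, n_bases=15):
    """Fraction of reference C positions read as T, for each of the first n_bases positions.

    alignments: iterable of (reference_segment, read_sequence) pairs of equal length.
    """
    c_total = [0] * n_bases
    c_to_t = [0] * n_bases
    for ref, read in alignments:
        for i in range(min(n_bases, len(ref))):
            if ref[i] == "C":
                c_total[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    return [t / c if c else 0.0 for t, c in zip(c_to_t, c_total)]


pairs = [("CCAA", "TCAA"), ("CGCA", "TGCA")]
print(c_to_t_frequencies(pairs, n_bases=3))  # [1.0, 0.0, 0.0]
```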
Options for getting reference annotation statistics (e.g. gene coverages).
Turn on calculation of the number of reads, depth, and breadth of coverage of features in the reference.
boolean
Path to GFF or BED file containing positions of features in the reference file (--fasta). Path should be enclosed in quotes.
string
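Depth and breadth summarise different things. Mean depth is the average number of reads covering each base of a feature. Breadth is the fraction of the feature's bases covered at least once. The following sketch computes both from per-base depths. It does not cover the read count per feature, and the names are hypothetical. GFF coordinates are 1-based and inclusive, whereas BED is 0-based and half-open, so a GFF feature start must be shifted by one before use, as the example shows.

```python
def feature_coverage(depths, start, end):
    """Mean depth and breadth of coverage for a feature spanning [start, end) (0-based, BED-style).

    depths: list of per-base read depths along the reference sequence.
    """
    window = depths[start:end]
    if not window:
        raise ValueError("empty feature")
    mean_depth = sum(window) / len(window)
    breadth = sum(1 for d in window if d > 0) / len(window)
    return mean_depth, breadth


depths = [0, 2, 4, 0, 1, 1]
print(feature_coverage(depths, 1, 5))  # (1.75, 0.75)
# A GFF feature at 2..5 (1-based, inclusive) is the same interval as BED [1, 5).
gff_start, gff_end = 2, 5
print(feature_coverage(depths, gff_start - 1, gff_end))  # (1.75, 0.75)
```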
Options for trimming of aligned reads (e.g. to remove damage prior to genotyping).
Turn on BAM trimming. Will only run on non-UDG or half-UDG libraries.
boolean
Specify the number of bases to clip off reads from ‘left’ end of read for half-UDG libraries.
integer
1
Specify the number of bases to clip off reads from ‘right’ end of read for half-UDG libraries.
integer
1
Specify the number of bases to clip off reads from ‘left’ end of read for non-UDG libraries.
integer
1
Specify the number of bases to clip off reads from ‘right’ end of read for non-UDG libraries.
integer
1
Turn on using soft-clipping instead of hard masking.
boolean
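The following sketch shows how the UDG setting selects the clip lengths and how masking differs from soft-clipping. It assumes a gap-free read whose original CIGAR is all ‘M’. Masking keeps the alignment but replaces the terminal bases with N. Soft-clipping keeps the bases but marks them as clipped in the CIGAR. A real implementation would also shift the alignment start for left clips. Function and argument names are hypothetical.

```python
def trim_read(seq, udg, half_udg=(1, 1), non_udg=(1, 1), softclip=False):
    """Return (sequence, CIGAR) after clipping (left, right) bases from an all-M read."""
    if udg == "full":
        left, right = 0, 0  # BAM trimming does not run on full-UDG libraries
    elif udg == "half":
        left, right = half_udg
    elif udg == "none":
        left, right = non_udg
    else:
        raise ValueError(f"unknown UDG treatment: {udg!r}")
    left = min(left, len(seq))
    right = min(right, len(seq) - left)
    middle = len(seq) - left - right
    if softclip:
        parts = [(left, "S"), (middle, "M"), (right, "S")]
        return seq, "".join(f"{n}{op}" for n, op in parts if n)
    masked = "N" * left + seq[left:len(seq) - right] + "N" * right
    return masked, f"{len(seq)}M"


print(trim_read("ACGTACGT", "none"))                                  # ('NCGTACGN', '8M')
print(trim_read("ACGTACGT", "half", half_udg=(2, 1), softclip=True))  # ('ACGTACGT', '2S5M1S')
```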
Options for variant calling.
Turn on genotyping of BAM files.
boolean
Specify which genotyper to use: GATK UnifiedGenotyper, GATK HaplotypeCaller, FreeBayes, pileupCaller, or ANGSD. Note: UnifiedGenotyper requires a user-supplied GATK 3.5 jar file. Options: ‘ug’, ‘hc’, ‘freebayes’, ‘pileupcaller’, ‘angsd’.
string
Specify which input BAM to use for genotyping. Options: ‘raw’, ‘trimmed’ or ‘pmd’.
string
raw
When specifying to use GATK UnifiedGenotyper, path to GATK 3.5 .jar.
string
Specify GATK phred-scaled confidence threshold.
integer
30
Specify GATK organism ploidy.
integer
2
Maximum depth coverage allowed for genotyping before down-sampling is turned on.
integer
250
Specify VCF file for SNP annotation of output VCF files. Optional. Gzip not accepted.
string
Specify GATK HaplotypeCaller output mode. Options: ‘EMIT_VARIANTS_ONLY’, ‘EMIT_ALL_CONFIDENT_SITES’, ‘EMIT_ALL_ACTIVE_SITES’.
string
Specify HaplotypeCaller mode for emitting reference confidence calls. Options: ‘NONE’, ‘BP_RESOLUTION’, ‘GVCF’.
string
Specify GATK UnifiedGenotyper output mode. Options: ‘EMIT_VARIANTS_ONLY’, ‘EMIT_ALL_CONFIDENT_SITES’, ‘EMIT_ALL_SITES’.
string
Specify UnifiedGenotyper likelihood model. Options: ‘SNP’, ‘INDEL’, ‘BOTH’, ‘GENERALPLOIDYSNP’, ‘GENERALPLOIDYINDEL’.
string
Specify to keep the BAM output of re-alignment around variants from GATK UnifiedGenotyper.
string
Supply a default base quality if a read is missing a base quality score. Setting to -1 turns this off.
string
Specify minimum required supporting observations to consider a variant.
integer
1
Specify to skip over regions of high depth by discarding alignments overlapping positions where the total read depth is greater than the value specified here.
integer
0
Specify ploidy of sample in FreeBayes.
integer
2
Specify path to SNP panel in bed format for pileupCaller.
string
Specify path to SNP panel in EIGENSTRAT format for pileupCaller.
string
Specify calling method to use. Options: ‘randomHaploid’, ‘randomDiploid’, ‘majorityCall’.
string
Specify the calling mode for transitions. Options: ‘AllSites’, ‘TransitionsMissing’, ‘SkipTransitions’.
string
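Low-coverage ancient genomes are often called as pseudo-haploid. At each panel SNP, a single read is sampled rather than a diploid genotype being inferred. The following sketch implements ‘randomHaploid’ and ‘majorityCall’ for one biallelic SNP. It includes an option to set transitions (C↔T, A↔G, the substitution classes affected by deamination damage) to missing. This is not pileupCaller's code. The tie-breaking rule and all names are assumptions.

```python
import random
from collections import Counter

TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def call_snp(bases, ref, alt, method="randomHaploid", transitions_missing=False, rng=random):
    """Return ref, alt, or None (missing) for one SNP from the bases observed in a pileup."""
    if transitions_missing and frozenset((ref, alt)) in TRANSITIONS:
        return None
    informative = [b for b in bases if b in (ref, alt)]  # ignore third alleles and Ns
    if not informative:
        return None
    if method == "randomHaploid":
        return rng.choice(informative)  # sample the allele of one random read
    if method == "majorityCall":
        counts = Counter(informative)
        if counts[ref] == counts[alt]:
            return rng.choice([ref, alt])  # tie broken at random (assumption)
        return ref if counts[ref] > counts[alt] else alt
    raise ValueError(f"unsupported method in this sketch: {method!r}")


print(call_snp(["A", "A", "G"], "A", "G", method="majorityCall"))     # A
print(call_snp(["C", "T", "T"], "C", "T", transitions_missing=True))  # None
```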
Specify which ANGSD genotyping likelihood model to use. Options: ‘samtools’, ‘gatk’, ‘soapsnp’, ‘syk’.
string
Specify the output format for ANGSD genotype likelihood results. Options: ‘text’, ‘binary’, ‘binary_three’, ‘beagle’.
string
Turn on creation of FASTA from ANGSD genotyping likelihood.
boolean
Specify which type of ‘base calling’ to use for ANGSD FASTA generation. Options: ‘random’, ‘common’.
string
Options for creation of a per-sample FASTA sequence useful for downstream analysis (e.g. multi-sequence alignment).
Turns on ability to create a consensus sequence FASTA file based on a UnifiedGenotyper VCF file and the original reference (only considers SNPs).
boolean
Specify the name of the output FASTA file containing the consensus sequence. Do not include ‘.vcf’ in the file name.
string
Specify the header name of the consensus sequence entry within the FASTA file.
string
Minimum depth coverage required for a call to be included (else N will be called).
integer
5
Minimum genotyping quality of a call to be called. Else N will be called.
integer
30
Minimum fraction of reads supporting a call to be included. Else N will be called.
number
0.8
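Each position of the consensus sequence is either the called base or N, depending on the three thresholds above. The following sketch assumes one record per reference position. Each record gives the called base, read depth, genotype quality and the fraction of reads supporting the call, or is None where no call exists. Whether the real tool compares with ≥ or > is an assumption, and the names are hypothetical.

```python
def build_consensus(records, min_depth=5, min_gq=30, min_frac=0.8):
    """records: per-position (base, depth, genotype_quality, support_fraction) tuples or None."""
    out = []
    for rec in records:
        if rec is None:
            out.append("N")  # no call at this position
            continue
        base, depth, gq, frac = rec
        passes = depth >= min_depth and gq >= min_gq and frac >= min_frac
        out.append(base if passes else "N")
    return "".join(out)


calls = [("A", 10, 40, 1.0), ("C", 3, 40, 1.0), ("G", 10, 20, 1.0),
         ("T", 10, 40, 0.5), None, ("T", 5, 30, 0.8)]
print(build_consensus(calls))  # ANNNNT
```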
Options for creation of a SNP table useful for downstream analysis (e.g. estimation of cross-mapping of different species and multi-sequence alignment).
Turn on MultiVCFAnalyzer. Note: This currently only supports diploid GATK UnifiedGenotyper input.
boolean
Turn on writing allele frequencies in the SNP table.
boolean
Specify the minimum genotyping quality threshold for a SNP to be called.
integer
30
Specify the minimum number of reads a position needs to be covered to be considered for base calling.
integer
5
Specify the minimum allele frequency that a base requires to be considered a ‘homozygous’ call.
number
0.9
Specify the minimum allele frequency that a base requires to be considered a ‘heterozygous’ call.
number
0.9
Specify paths to additional pre-made VCF files to be included in the SNP table generation. Use wildcard(s) for multiple files.
string
Specify path to the reference genome annotations in ‘.gff’ format. Optional.
string
NA
Specify path to the positions to be excluded in ‘.gff’ format. Optional.
string
NA
Specify path to the output file from SNP effect analysis in ‘.txt’ format. Optional.
string
NA
Options for the calculation of the ratio of reads mapping to one chromosome/FASTA entry against all others.
Turn on mitochondrial to nuclear ratio calculation.
boolean
Specify the name of the reference FASTA entry corresponding to the mitochondrial genome (up to the first space).
string
MT
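The following sketch computes the ratio from `samtools idxstats` output. Its tab-separated columns are sequence name, sequence length, mapped reads and unmapped reads, with a final ‘*’ line for unplaced reads. The sketch follows the read-count description above. The pipeline's calculator may instead normalise by sequence length or coverage, and the function name is hypothetical.

```python
def mt_nuc_ratio(idxstats_lines, mt_name="MT"):
    """Ratio of reads mapped to the mitochondrial entry versus all other entries."""
    mt_reads = nuc_reads = 0
    for line in idxstats_lines:
        name, _length, mapped, _unmapped = line.rstrip("\n").split("\t")
        if name == "*":
            continue  # reads without a reference sequence
        if name == mt_name:
            mt_reads += int(mapped)
        else:
            nuc_reads += int(mapped)
    return mt_reads / nuc_reads if nuc_reads else float("inf")


stats = ["1\t249250621\t900\t0", "MT\t16569\t100\t0", "*\t0\t0\t50"]
print(mt_nuc_ratio(stats))  # about 0.111
```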
Options for the calculation of biological sex of human individuals.
Turn on sex determination for human reference genomes.
boolean
Specify path to SNP panel in bed format for error bar calculation. Optional (see documentation).
string
Options for the estimation of contamination of human DNA.
Turn on nuclear contamination estimation for human reference genomes.
boolean
The name of the X chromosome in your BAM/FASTA header: ‘X’ for hs37d5, ‘chrX’ for HG19.
string
X
Options for metagenomic screening of off-target reads.
Turn on metagenomic screening module for reference-unmapped reads.
boolean
Specify which classifier to use. Options: ‘malt’, ‘kraken’.
string
undefined
Specify the path to the classifier database directory. For Kraken2 this can also be a ‘.tar.gz’ of the directory.
string
Specify the minimum number of reads a taxon must have (out of the sample total) to be retained. Not compatible with --malt_min_support_mode ‘percent’.
integer
1
Percent identity value threshold for MALT.
integer
85
Specify which alignment mode to use for MALT. Options: ‘Unknown’, ‘BlastN’, ‘BlastP’, ‘BlastX’, ‘Classifier’.
string
Specify alignment method for MALT. Options: ‘Local’, ‘SemiGlobal’.
string
Specify the percent for the LCA algorithm for MALT (see MEGAN6 CE manual).
integer
1
Specify whether to use a percentage or a raw number of reads as the minimum support required for a taxon to be retained for MALT. Options: ‘percent’, ‘reads’.
string
Specify the minimum percentage of the sample total that a taxon's reads must reach for the taxon to be retained for MALT.
number
0.01
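The two minimum-support modes differ in whether the threshold scales with the total number of assigned reads. The following sketch assumes the percentage value means percent of the sample total, so the default 0.01 is 0.01%. It also assumes a taxon exactly at the threshold is kept. The function name is hypothetical.

```python
def filter_taxa(counts, mode="percent", min_percent=0.01, min_reads=1):
    """Keep taxa whose assigned read count meets the minimum support.

    counts: mapping of taxon name -> number of assigned reads.
    """
    total = sum(counts.values())
    if mode == "percent":
        threshold = total * min_percent / 100.0
    elif mode == "reads":
        threshold = min_reads
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return {taxon: n for taxon, n in counts.items() if n >= threshold}


counts = {"Yersinia pestis": 19990, "Taxon B": 8, "Taxon C": 1}
print(filter_taxa(counts))                # threshold about 2 reads: Taxon C dropped
print(filter_taxa(counts, mode="reads"))  # threshold 1 read: all kept
```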
Specify the maximum number of queries a read can have for MALT.
integer
100
Specify the memory load method for MALT. Do not use ‘map’ with GPFS file systems, as it can be very slow. Options: ‘load’, ‘page’, ‘map’.
string
Specify to also produce gzipped SAM alignment files. Note that these include both aligned and unaligned reads and will result in very large file sizes.
boolean
Options for authentication of metagenomic screening performed by MALT.
Turn on MaltExtract for MALT aDNA characteristics authentication.
boolean
Path to a text file with taxa of interest (one taxon per row, NCBI taxonomy name format).
string
Path to directory containing NCBI resource files (ncbi.tre and ncbi.map; available: https://github.com/rhuebler/HOPS/).
string
Specify which MaltExtract filter to use. Options: ‘def_anc’, ‘ancient’, ‘default’, ‘crawl’, ‘scan’, ‘srna’, ‘assignment’.
string
Specify percent of top alignments to use.
number
0.01
Turn off destacking.
boolean
Turn off downsampling.
boolean
Turn off duplicate removal.
boolean
Turn on exporting alignments of hits in BLAST format.
boolean
Turn on export of MEGAN summary files.
boolean
Minimum percent identity an alignment must have to be reported. Recommended to set to the same value as the MALT percent identity parameter.
number
85
Turn on using top alignments per read after filtering.
boolean