eager: Parameters

Define where the pipeline should find input data, and additional metadata.

Either paths or URLs to FASTQ/BAM data (must be surrounded with quotes). For paired-end data, the path must use ‘{1,2}’ notation to specify read pairs. Alternatively, a path to a TSV file (ending .tsv) containing file paths and sequencing/sample metadata, which allows merging of multiple lanes/libraries/samples. Please see the documentation for a template.

required

type: string

default: null
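
As an illustration of the ‘{1,2}’ read-pair notation described above, here is a minimal Python sketch that expands such a path into its forward and reverse file names. The function name `expand_read_pair` and the example path are hypothetical; the pipeline itself resolves these patterns internally and may support richer glob syntax.

```python
def expand_read_pair(pattern: str) -> tuple[str, str]:
    """Expand a single '{1,2}' placeholder into forward and reverse file names."""
    if pattern.count("{1,2}") != 1:
        raise ValueError("expected exactly one '{1,2}' placeholder in the path")
    # Substitute the placeholder once for each mate of the pair.
    return pattern.replace("{1,2}", "1"), pattern.replace("{1,2}", "2")


# Hypothetical paired-end path, quoted as the description above requires.
forward, reverse = expand_read_pair("data/sample_R{1,2}.fastq.gz")
print(forward)
print(reverse)
```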

Specifies whether you have UDG treated libraries. Set to ‘half’ for partial UDG treatment, or ‘full’ for full UDG treatment. If not set, libraries are assumed to have no UDG treatment (‘none’). Not required for TSV input.

type: string

Specifies that libraries are single stranded. Always affects MALTExtract but will be ignored by pileupCaller with TSV input. Not required for TSV input.

type: boolean

Specifies that the input is single end reads. Not required for TSV input.

type: boolean

Specifies which Illumina sequencing chemistry was used. Used to inform whether to poly-G trim if turned on (see below). Not required for TSV input. Options: 2, 4.

type: integer

default: 4

Specifies that the input is in BAM format. Not required for TSV input.

type: boolean

Additional options regarding input data.

If the library is the result of SNP capture, path to a BED file containing the SNP positions on the reference genome.

type: string

Turns on conversion of an input BAM file into FASTQ format to allow re-preprocessing (e.g. AdapterRemoval etc.).

type: boolean

Specify locations of references and optionally, additional pre-made indices

Path or URL to a FASTA reference file (required if not iGenome reference). File suffixes can be: ‘.fa’, ‘.fn’, ‘.fna’, ‘.fasta’.

type: string
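
A quick way to check a reference path against the suffixes listed above, sketched in Python. The helper name `has_fasta_suffix` is hypothetical, and whether the pipeline accepts other suffixes (for example compressed files) is not stated here, so this check only covers the four listed endings.

```python
from pathlib import PurePosixPath

# Suffixes taken from the description above.
FASTA_SUFFIXES = {".fa", ".fn", ".fna", ".fasta"}


def has_fasta_suffix(path: str) -> bool:
    """Return True if the final suffix of the path is one of the listed FASTA endings."""
    return PurePosixPath(path).suffix in FASTA_SUFFIXES


print(has_fasta_suffix("reference/genome.fasta"))
print(has_fasta_suffix("reference/genome.txt"))
```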

Name of iGenomes reference (required if not FASTA reference).

type: string

Directory / URL base for iGenomes references.

hidden

type: string

default: s3://ngi-igenomes/igenomes/

Do not load the iGenomes reference config.

hidden

type: boolean

Path to directory containing pre-made BWA indices (i.e. everything before the endings ‘.amb’, ‘.ann’, ‘.bwt’; most likely the same path as --fasta). If not supplied, they will be made for you.

type: string

Path to directory containing pre-made Bowtie2 indices (i.e. everything before the endings, e.g. ‘.1.bt2’, ‘.2.bt2’, ‘.rev.1.bt2’; most likely the same value as --fasta). If not supplied, they will be made for you.

type: string

Path to samtools FASTA index (typically ending in ‘.fai’). If not supplied will be made for you.

type: string

Path to picard sequence dictionary file (typically ending in ‘.dict’). If not supplied will be made for you.

type: string

Specify to generate more recent ‘.csi’ BAM indices. If your reference genome is larger than 3.5GB, this is recommended due to more efficient data handling with the ‘.csi’ format over the older ‘.bai’.

type: boolean

If not already supplied by user, turns on saving of generated reference genome indices for later re-usage.

type: boolean

Specify where to put output files and optional saving of intermediate files

The output directory where the results will be saved.

type: string

default: ./results

Mode for publishing results in the output directory. Options: ‘symlink’, ‘rellink’, ‘link’, ‘copy’, ‘copyNoFollow’, ‘move’.

type: string

default: copy

Turn this on if you want to keep trimmed reads.

hidden

type: boolean

default: true

Turn this on if you want to keep intermediate alignment files (SAM, BAM, non-dedupped BAM)

hidden

type: boolean

Less common options for the pipeline, typically set in a config file.

Display help text.

hidden

type: boolean

Workflow name of run, for future reference.

hidden

type: string

Email address for completion summary.

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
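
The pattern above can be tried out directly. The following Python sketch compiles it unchanged and tests a few made-up addresses; the helper name `is_valid_email` is hypothetical. Note that the pattern, as written, rejects top-level domains longer than five letters and characters such as ‘+’ in the local part.

```python
import re

# The validation pattern exactly as listed above.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def is_valid_email(address: str) -> bool:
    """Return True if the address matches the listed pattern."""
    return EMAIL_PATTERN.match(address) is not None


for address in ("jane.doe@example.org", "jane@localhost", "jane@example.museum"):
    print(address, is_valid_email(address))
```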

Email address for completion summary, only when pipeline fails.

hidden

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Send plain-text email instead of HTML.

hidden

type: boolean

File size limit when attaching MultiQC reports to summary emails.

hidden

type: string

default: 25.MB

Do not use coloured log outputs.

hidden

type: boolean

Custom config file to supply to MultiQC.

hidden

type: string

Directory to keep pipeline Nextflow logs and reports.

hidden

type: string

default: ${params.outdir}/pipeline_info

Set the top limit for requested resources for any single job.

Maximum number of CPUs that can be requested for any single job.

hidden

type: integer

default: 16

Maximum amount of memory that can be requested for any single job.

hidden

type: string

default: 128.GB

Maximum amount of time that can be requested for any single job.

hidden

type: string

default: 240.h
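
To make the effect of these limits concrete, the Python sketch below parses limit strings of the form shown in the defaults above (‘128.GB’, ‘240.h’) and caps a request at the limit. It assumes that a request above the limit is reduced to the limit rather than rejected; `parse_limit` and `cap_request` are hypothetical helper names, not part of the pipeline.

```python
def parse_limit(text: str) -> tuple[int, str]:
    """Split a limit such as '128.GB' into its number and unit."""
    number, unit = text.rsplit(".", 1)
    return int(number), unit


def cap_request(requested: int, maximum: int) -> int:
    """Return the requested amount, reduced to the maximum if it exceeds it."""
    return min(requested, maximum)


# Defaults as listed above.
MAX_CPUS = 16
max_memory, memory_unit = parse_limit("128.GB")
max_time, time_unit = parse_limit("240.h")

print(cap_request(32, MAX_CPUS))            # a 32-CPU request is capped
print(cap_request(64, max_memory), memory_unit)  # a 64 GB request is left as is
```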

Parameters used to describe centralised config profiles. These generally should not be edited.

Git commit id for Institutional configs.

hidden

type: string

default: master

Base directory for Institutional configs.

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/configs/master

Institutional configs hostname.

hidden

type: string

Institutional config description.

hidden

type: string

Institutional config contact information.

hidden

type: string

Institutional config URL link.

hidden

type: string

The AWSBatch JobQueue that needs to be set when running on AWSBatch

type: string

The AWS Region for your AWS Batch job to run on

type: string

default: eu-west-1

Path to the AWS CLI tool

type: string

Skip any of the mentioned steps.

type: boolean

Processing of Illumina two-colour chemistry data.

Turn on poly-G removal on FASTQ files. Will only be performed on libraries sequenced on two-colour chemistry machines.

type: boolean

Specify the minimum poly-G length required for clipping to be performed.

type: integer

default: 10
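
The idea behind poly-G clipping can be sketched in a few lines of Python. This is a simplified illustration only: it assumes the poly-G run sits at the 3’ end of the read and contains no mismatches, whereas the tool used by the pipeline may be more tolerant. The function name `clip_poly_g` is hypothetical.

```python
def clip_poly_g(sequence: str, min_length: int = 10, colour_chemistry: int = 4) -> str:
    """Remove a trailing run of G bases when it is at least min_length long."""
    if colour_chemistry != 2:
        # Only two-colour chemistry data is clipped.
        return sequence
    stripped = sequence.rstrip("G")
    run_length = len(sequence) - len(stripped)
    return stripped if run_length >= min_length else sequence


read = "ACGTTAGCAT" + "G" * 12
print(clip_poly_g(read, colour_chemistry=2))  # trailing run of 12 Gs is removed
print(clip_poly_g(read, colour_chemistry=4))  # four-colour data is left untouched
```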

Options for adapter clipping and paired-end merging.

Specify adapter sequence to be clipped off (forward strand).

type: string

default: AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC

Specify adapter sequence to be clipped off (reverse strand).

type: string

default: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA

Specify read minimum length to be kept for downstream analysis.

type: integer

default: 30

Specify minimum base quality for trimming off bases.

type: integer

default: 20

Specify minimum adapter overlap required for clipping.

type: integer

default: 1

Skip merging of forward and reverse reads. Only applicable for paired-end libraries.

type: boolean

Skip adapter and quality trimming.

type: boolean

Skip quality-based trimming (n, score, window) of the 5’ end.

type: boolean

Only use merged reads downstream (un-merged reads and singletons are discarded).

type: boolean

Options for reference-genome mapping

Specify which mapper to use. Options: ‘bwaaln’, ‘bwamem’, ‘circularmapper’, ‘bowtie2’.

type: string

Specify the -n parameter for BWA aln, i.e. amount of allowed mismatches in the alignment.

type: number

default: 0.04

Specify the -k parameter for BWA aln, i.e. maximum edit distance allowed in a seed.

type: integer

default: 2

Specify the -l parameter for BWA aln i.e. the length of seeds to be used.

type: integer

default: 1024

Specify the number of bases to extend reference by (circularmapper only).

type: integer

default: 500

Specify the FASTA header of the target chromosome to extend (circularmapper only).

type: string

default: MT

Turn on to filter off-target reads (circularmapper only).

type: boolean

Specify the bowtie2 alignment mode. Options: ‘local’, ‘end-to-end’.

type: string

Specify the level of sensitivity for the bowtie2 alignment mode. Options: ‘no-preset’, ‘very-fast’, ‘fast’, ‘sensitive’, ‘very-sensitive’.

type: string

Specify the -N parameter for bowtie2 (mismatches in seed). This will override defaults from alignmode/sensitivity.

type: integer

default: 0

Specify the -L parameter for bowtie2 (length of seed substrings). This will override defaults from alignmode/sensitivity.

type: integer

default: 0

Specify number of bases to trim off from 5’ (left) end of read before alignment.

type: integer

default: 0

Specify number of bases to trim off from 3’ (right) end of read before alignment.

type: integer

default: 0

Options for production of host-read removed FASTQ files for privacy reasons.

Turn on per-library creation of pre-AdapterRemoval FASTQ files without reads that mapped to the reference (e.g. for public upload of privacy-sensitive non-host data).

type: boolean

Host removal mode. Either remove mapped reads completely from the FASTQ (remove), or just mask the sequence of mapped reads with N (replace).

type: string
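
The difference between the two host removal modes can be sketched as follows in Python. Reads are represented as hypothetical (ID, sequence, quality) tuples, and `strip_host_reads` is an illustrative name; keeping the quality string unchanged in ‘replace’ mode is an assumption of this sketch.

```python
def strip_host_reads(records, mapped_ids, mode="remove"):
    """Drop ('remove') or N-mask ('replace') reads whose IDs mapped to the reference."""
    if mode not in ("remove", "replace"):
        raise ValueError("mode must be 'remove' or 'replace'")
    cleaned = []
    for read_id, sequence, quality in records:
        if read_id not in mapped_ids:
            # Unmapped (non-host) reads are always kept as they are.
            cleaned.append((read_id, sequence, quality))
        elif mode == "replace":
            # Mapped reads are kept, but their sequence is masked with N.
            cleaned.append((read_id, "N" * len(sequence), quality))
    return cleaned


records = [("read1", "ACGT", "IIII"), ("read2", "GGCC", "IIII")]
print(strip_host_reads(records, {"read2"}, mode="remove"))
print(strip_host_reads(records, {"read2"}, mode="replace"))
```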

Options for quality filtering and how to deal with off-target unmapped reads.

Turn on filtering of mapping quality, read lengths, or unmapped reads of BAM files.

type: boolean

Minimum mapping quality for reads filter.

type: integer

default: 0

Specify minimum read length to be kept after mapping.

type: integer

default: 0

Defines whether to discard all unmapped reads, or to keep them in BAM and/or FASTQ format. Options: ‘discard’, ‘bam’, ‘fastq’, ‘both’.

type: string
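
The filtering options above combine as sketched below in Python: mapped reads must pass both the mapping quality and length thresholds, while unmapped reads are set aside so they can be discarded or written out separately. The `Read` class and `filter_reads` function are hypothetical stand-ins for real BAM records and the pipeline's own filtering step.

```python
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    mapq: int
    length: int
    mapped: bool


def filter_reads(reads, min_mapq=0, min_length=0):
    """Split reads into mapped reads passing both thresholds, and unmapped reads."""
    kept, unmapped = [], []
    for read in reads:
        if not read.mapped:
            unmapped.append(read)
        elif read.mapq >= min_mapq and read.length >= min_length:
            kept.append(read)
    return kept, unmapped


reads = [
    Read("a", 37, 45, True),
    Read("b", 10, 45, True),   # fails the mapping quality threshold
    Read("c", 37, 20, True),   # fails the length threshold
    Read("d", 0, 50, False),   # unmapped
]
kept, unmapped = filter_reads(reads, min_mapq=25, min_length=30)
print([r.name for r in kept], [r.name for r in unmapped])
```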

Options for removal of PCR amplicon duplicates that can artificially inflate coverage.

Deduplication method to use. Options: ‘markduplicates’, ‘dedup’.

type: string

Turn on treating all reads as merged reads.

type: boolean

Options for calculating library complexity (i.e. how many unique reads are present).

Specify the step size of Preseq.

type: integer

default: 1000

Options for calculating and filtering for characteristic ancient DNA damage patterns.

Specify length filter for DamageProfiler.

type: integer

default: 100

Specify number of bases of each read to consider for DamageProfiler calculations.

type: integer

default: 15

Specify the maximum misincorporation frequency that should be displayed on damage plot. Set to 0 to ‘autoscale’.

type: number

default: 0.3

Turn on PMDtools

type: boolean

Specify range of bases for PMDTools to scan for damage.

type: integer

default: 10

Specify PMDScore threshold for PMDTools.

type: integer

default: 3

Specify a path to reference mask for PMDTools.

type: string

Specify the maximum number of reads to consider for metrics generation.

type: integer

default: 10000

Options for getting reference annotation statistics (e.g. gene coverages)

Turn on calculation of the number of reads, and the depth and breadth of coverage, of features in the reference.

type: boolean

Path to GFF or BED file containing positions of features in reference file (--fasta). Path should be enclosed in quotes.

type: string

Options for trimming of aligned reads (e.g. to remove damage prior to genotyping).

Turn on BAM trimming. Will only run on non-UDG or half-UDG libraries.

type: boolean

Specify the number of bases to clip off reads from ‘left’ end of read for half-UDG libraries.

type: integer

default: 1

Specify the number of bases to clip off reads from ‘right’ end of read for half-UDG libraries.

type: integer

default: 1

Specify the number of bases to clip off reads from ‘left’ end of read for non-UDG libraries.

type: integer

default: 1

Specify the number of bases to clip off reads from ‘right’ end of read for non-UDG libraries.

type: integer

default: 1

Turn on using softclip instead of hard masking.

type: boolean
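
As a rough illustration of hard masking, the Python sketch below replaces the configured number of bases at each read end with N, using the per-library-type defaults listed above and leaving full-UDG libraries untouched. The names `mask_read_ends`, `CLIP_DEFAULTS` and `trim_for_udg` are hypothetical, and soft clipping is not shown.

```python
def mask_read_ends(sequence: str, left: int, right: int) -> str:
    """Replace `left` bases at the start and `right` bases at the end with N."""
    if left + right >= len(sequence):
        return "N" * len(sequence)
    return "N" * left + sequence[left:len(sequence) - right] + "N" * right


# (left, right) clip lengths per UDG treatment, using the defaults listed above.
CLIP_DEFAULTS = {"none": (1, 1), "half": (1, 1)}


def trim_for_udg(sequence: str, udg_type: str) -> str:
    """Mask read ends for non-UDG and half-UDG libraries only."""
    if udg_type == "full":
        return sequence
    left, right = CLIP_DEFAULTS[udg_type]
    return mask_read_ends(sequence, left, right)


print(trim_for_udg("ACGTACGT", "half"))
print(trim_for_udg("ACGTACGT", "full"))
```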

Options for variant calling.

Turn on genotyping of BAM files.

type: boolean

Specify which genotyper to use: GATK UnifiedGenotyper, GATK HaplotypeCaller, FreeBayes, pileupCaller, or ANGSD. Note: UnifiedGenotyper requires a user-supplied GATK 3.5 jar file. Options: ‘ug’, ‘hc’, ‘freebayes’, ‘pileupcaller’, ‘angsd’.

type: string

Specify which input BAM to use for genotyping. Options: ‘raw’, ‘trimmed’ or ‘pmd’.

type: string

default: raw

When specifying to use GATK UnifiedGenotyper, path to GATK 3.5 .jar.

type: string

Specify GATK phred-scaled confidence threshold.

type: integer

default: 30

Specify GATK organism ploidy.

type: integer

default: 2

Maximum depth coverage allowed for genotyping before down-sampling is turned on.

type: integer

default: 250

Specify VCF file for SNP annotation of output VCF files. Optional. Gzip not accepted.

type: string

Specify GATK HaplotypeCaller output mode. Options: ‘EMIT_VARIANTS_ONLY’, ‘EMIT_ALL_CONFIDENT_SITES’, ‘EMIT_ALL_ACTIVE_SITES’.

type: string

Specify HaplotypeCaller mode for emitting reference confidence calls. Options: ‘NONE’, ‘BP_RESOLUTION’, ‘GVCF’.

type: string

Specify GATK UnifiedGenotyper output mode. Options: ‘EMIT_VARIANTS_ONLY’, ‘EMIT_ALL_CONFIDENT_SITES’, ‘EMIT_ALL_SITES’.

type: string

Specify UnifiedGenotyper likelihood model. Options: ‘SNP’, ‘INDEL’, ‘BOTH’, ‘GENERALPLOIDYSNP’, ‘GENERALPLOIDYINDEL’.

type: string

Specify to keep the BAM output of re-alignment around variants from GATK UnifiedGenotyper.

type: string

Supply a default base quality if a read is missing a base quality score. Setting to -1 turns this off.

type: string

Specify minimum required supporting observations to consider a variant.

type: integer

default: 1

Specify to skip over regions of high depth by discarding alignments overlapping positions where total read depth is greater than specified in --freebayes_C.

type: integer

default: 0

Specify ploidy of sample in FreeBayes.

type: integer

default: 2

Specify path to SNP panel in bed format for pileupCaller.

type: string

Specify path to SNP panel in EIGENSTRAT format for pileupCaller.

type: string

Specify calling method to use. Options: ‘randomHaploid’, ‘randomDiploid’, ‘majorityCall’.

type: string

Specify the calling mode for transitions. Options: ‘AllSites’, ‘TransitionsMissing’, ‘SkipTransitions’.

type: string

Specify which ANGSD genotyping likelihood model to use. Options: ‘samtools’, ‘gatk’, ‘soapsnp’, ‘syk’.

type: string

Specify the output type for ANGSD genotyping likelihood results. Options: ‘text’, ‘binary’, ‘binary_three’, ‘beagle’.

type: string

Turn on creation of FASTA from ANGSD genotyping likelihood.

type: boolean

Specify which type of ‘base calling’ to use for ANGSD FASTA generation. Options: ‘random’, ‘common’.

type: string

Options for creation of a per-sample FASTA sequence useful for downstream analysis (e.g. multi sequence alignment)

Turns on ability to create a consensus sequence FASTA file based on a UnifiedGenotyper VCF file and the original reference (only considers SNPs).

type: boolean

Specify name of the output FASTA file containing the consensus sequence. Do not include .vcf in the file name.

type: string

Specify the header name of the consensus sequence entry within the FASTA file.

type: string

Minimum depth coverage required for a call to be included (else N will be called).

type: integer

default: 5

Minimum genotyping quality of a call to be called. Else N will be called.

type: integer

default: 30

Minimum fraction of reads supporting a call to be included. Else N will be called.

type: number

default: 0.8
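
The three thresholds above (depth, genotyping quality, supporting fraction) act together on each position. The Python sketch below shows one plausible reading of that logic using the listed defaults; `consensus_base` is a hypothetical name, the per-position base counts are made up, and treating the thresholds as inclusive is an assumption.

```python
def consensus_base(base_counts, quality, min_depth=5, min_quality=30, min_fraction=0.8):
    """Return the majority base if all thresholds are met, otherwise 'N'."""
    depth = sum(base_counts.values())
    if depth == 0 or depth < min_depth or quality < min_quality:
        return "N"
    # Pick the most frequently observed base at this position.
    base, count = max(base_counts.items(), key=lambda item: item[1])
    return base if count / depth >= min_fraction else "N"


print(consensus_base({"A": 9, "G": 1}, quality=40))  # well supported call
print(consensus_base({"A": 6, "G": 4}, quality=40))  # supporting fraction too low
print(consensus_base({"A": 3, "G": 1}, quality=40))  # depth too low
```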

Options for creation of a SNP table useful for downstream analysis (e.g. estimation of cross-mapping of different species and multi-sequence alignment)

Turn on MultiVCFAnalyzer. Note: This currently only supports diploid GATK UnifiedGenotyper input.

type: boolean

Turn on writing of allele frequencies in the SNP table.

type: boolean

Specify the minimum genotyping quality threshold for a SNP to be called.

type: integer

default: 30

Specify the minimum number of reads that must cover a position for it to be considered for base calling.

type: integer

default: 5

Specify the minimum allele frequency that a base requires to be considered a ‘homozygous’ call.

type: number

default: 0.9

Specify the minimum allele frequency that a base requires to be considered a ‘heterozygous’ call.

type: number

default: 0.9

Specify paths to additional pre-made VCF files to be included in the SNP table generation. Use wildcard(s) for multiple files.

type: string

Specify path to the reference genome annotations in ‘.gff’ format. Optional.

type: string

default: NA

Specify path to the positions to be excluded in ‘.gff’ format. Optional.

type: string

default: NA

Specify path to the output file from SNP effect analysis in ‘.txt’ format. Optional.

type: string

default: NA

Options for the calculation of ratio of reads to one chromosome/FASTA entry against all others.

Turn on mitochondrial to nuclear ratio calculation.

type: boolean

Specify the name of the reference FASTA entry corresponding to the mitochondrial genome (up to the first space).

type: string

default: MT
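
A minimal Python sketch of such a ratio, computed from per-entry read counts. The counts and the function name `mt_to_nuclear_ratio` are hypothetical, and this version uses raw read counts only; the tool used by the pipeline may normalise differently (for example by sequence length).

```python
def mt_to_nuclear_ratio(read_counts: dict, mt_header: str = "MT") -> float:
    """Ratio of reads on the mitochondrial entry to reads on all other entries."""
    mt_reads = read_counts.get(mt_header, 0)
    other_reads = sum(n for name, n in read_counts.items() if name != mt_header)
    if other_reads == 0:
        raise ValueError("no reads on any non-mitochondrial entry")
    return mt_reads / other_reads


# Hypothetical read counts per FASTA entry.
counts = {"1": 600, "2": 400, "MT": 50}
print(mt_to_nuclear_ratio(counts))
```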

Options for the calculation of biological sex of human individuals.

Turn on sex determination for human reference genomes.

type: boolean

Specify path to SNP panel in bed format for error bar calculation. Optional (see documentation).

type: string

Options for the estimation of contamination of human DNA.

Turn on nuclear contamination estimation for human reference genomes.

type: boolean

The name of the X chromosome in your bam/FASTA header. ‘X’ for hs37d5, ‘chrX’ for HG19.

type: string

default: X

Options for metagenomic screening of off-target reads.

Turn on metagenomic screening module for reference-unmapped reads.

type: boolean

Specify which classifier to use. Options: ‘malt’, ‘kraken’.

type: string

default: undefined

Specify path to classifier database directory. For Kraken2 this can also be a .tar.gz of the directory.

type: string

Specify the minimum number of reads, out of the sample total, that a taxon is required to have to be retained. Not compatible with --malt_min_support_mode ‘percent’.

type: integer

default: 1

Percent identity value threshold for MALT.

type: integer

default: 85

Specify which alignment mode to use for MALT. Options: ‘Unknown’, ‘BlastN’, ‘BlastP’, ‘BlastX’, ‘Classifier’.

type: string

Specify alignment method for MALT. Options: ‘Local’, ‘SemiGlobal’.

type: string

Specify the percent for LCA algorithm for MALT (see MEGAN6 CE manual).

type: integer

default: 1

Specify whether to use percent or raw number of reads for minimum support required for taxon to be retained for MALT. Options: ‘percent’, ‘reads’.

type: string

Specify the minimum percentage of reads, out of the sample total, that a taxon is required to have to be retained for MALT.

type: number

default: 0.01
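
The two minimum-support modes can be compared with a short Python sketch. It uses the defaults listed above (1 read, 0.01 percent); the function name `taxon_is_retained` is hypothetical, and treating the thresholds as inclusive is an assumption. The mode has no default here and must be passed explicitly.

```python
def taxon_is_retained(taxon_reads, total_reads, mode,
                      min_support_percent=0.01, min_support_reads=1):
    """Decide whether a taxon has enough supporting reads under the chosen mode."""
    if mode == "reads":
        return taxon_reads >= min_support_reads
    if mode == "percent":
        if total_reads <= 0:
            return False
        return 100 * taxon_reads / total_reads >= min_support_percent
    raise ValueError("mode must be 'percent' or 'reads'")


# 5 of 100,000 reads is 0.005 percent: below the percent threshold,
# but above the raw read-count threshold.
print(taxon_is_retained(5, 100_000, mode="percent"))
print(taxon_is_retained(5, 100_000, mode="reads"))
```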

Specify the maximum number of queries a read can have for MALT.

type: integer

default: 100

Specify the memory load method. Do not use ‘map’ with GPFS file systems for MALT, as it can be very slow. Options: ‘load’, ‘page’, ‘map’.

type: string

Specify to also produce SAM alignment files. These include both aligned and unaligned reads, and are gzipped. Note that this will result in very large file sizes.

type: boolean

Options for authentication of metagenomic screening performed by MALT.

Turn on MaltExtract for MALT aDNA characteristics authentication.

type: boolean

Path to a text file with taxa of interest (one taxon per row, NCBI taxonomy name format)

type: string

Path to directory containing NCBI resource files (ncbi.tre and ncbi.map; available at: https://github.com/rhuebler/HOPS/)

type: string

Specify which MaltExtract filter to use. Options: ‘def_anc’, ‘ancient’, ‘default’, ‘crawl’, ‘scan’, ‘srna’, ‘assignment’.

type: string

Specify percent of top alignments to use.

type: number

default: 0.01

Turn off destacking.

type: boolean

Turn off downsampling.

type: boolean

Turn off duplicate removal.

type: boolean

Turn on exporting alignments of hits in BLAST format.

type: boolean

Turn on export of MEGAN summary files.

type: boolean

Minimum percent identity that alignments must have to be reported. Recommended to set to the same value as the MALT parameter.

type: number

default: 85

Turn on using top alignments per read after filtering.

type: boolean

nf-core/eager