nf-core/eager
A fully reproducible and state-of-the-art ancient DNA analysis pipeline

adna, ancient-dna-analysis, ancientdna, genome, metagenomics, pathogen-genomics, population-genetics

These pages are for an old version of the pipeline (2.0.5). The latest stable release is 2.5.3.

This pipeline uses DSL1. It will not work with Nextflow versions after 22.10.6.
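
Because of this constraint, it can help to check and pin the Nextflow version before running this release. The sketch below is one way to compare a version string against the 22.10.6 cutoff stated above. `supports_dsl1` is a hypothetical helper name, and the comparison assumes a `sort` that supports `-V` (GNU coreutils and recent BSD `sort` do).

```shell
# Hypothetical helper (not part of nf-core/eager or Nextflow):
# succeeds when the given Nextflow version is 22.10.6 or older.
supports_dsl1() {
    version="$1"
    last_dsl1="22.10.6"
    # sort -V orders version strings; if the cutoff sorts last,
    # the given version is less than or equal to it.
    newest=$(printf '%s\n%s\n' "$version" "$last_dsl1" | sort -V | tail -n 1)
    [ "$newest" = "$last_dsl1" ]
}

if supports_dsl1 "22.10.6"; then
    echo "22.10.6 can run DSL1 pipelines"
fi

# NXF_VER is Nextflow's environment variable for selecting the launcher version,
# so a run of this release could be pinned like this (not executed here):
#   NXF_VER=22.10.6 nextflow run nf-core/eager -r 2.0.5 <other options>
```

Setting `NXF_VER` per command, rather than exporting it globally, leaves other pipelines on the same machine free to use a newer Nextflow.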

Launch version 2.0.5 https://github.com/nf-core/eager

Introduction

nf-core/eager is a bioinformatics best-practice analysis pipeline for NGS-based ancient DNA (aDNA) data analysis.

The pipeline uses Nextflow, a bioinformatics workflow tool. It pre-processes raw data from FASTQ inputs, aligns the reads, and performs extensive general NGS and aDNA-specific quality control on the results. It can be run with Docker or Singularity containers or a Conda environment, making installation straightforward and results highly reproducible.

Pipeline steps

By default the pipeline currently performs the following:

Create reference genome indices for mapping (bwa, samtools, and picard)
Sequencing quality control (FastQC)
Sequencing adapter removal and, for paired-end data, read merging (AdapterRemoval)
Read mapping to the reference (bwa aln, bwa mem or CircularMapper)
Post-mapping processing, statistics and conversion to BAM (samtools)
Ancient DNA C-to-T damage pattern visualisation (DamageProfiler)
PCR duplicate removal (DeDup or MarkDuplicates)
Post-mapping statistics and BAM quality control (Qualimap)
Library complexity estimation (preseq)
Overall pipeline statistics summaries (MultiQC)

Additional functionality contained by the pipeline currently includes:

Illumina two-coloured sequencer poly-G tail removal (fastp)
Automatic conversion of unmapped reads to FASTQ (samtools)
Damage removal/clipping for UDG+/UDG-half treatment protocols (BamUtil)
Damaged read extraction and assessment (PMDTools)

Quick Start

Install nextflow
Install one of docker, singularity or conda
Download the EAGER pipeline

nextflow pull nf-core/eager

Set up your job with default parameters

nextflow run nf-core/eager -profile <docker/singularity/conda> --reads '*_R{1,2}.fastq.gz' --fasta '<REFERENCE>.fasta'

See the overview of the run under <OUTPUT_DIR>/MultiQC/multiqc_report.html

Modifications to the default pipeline are easily made using various options as described in the documentation.
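
The Quick Start command can also be wrapped in a small script, sketched here under a few assumptions: `run_eager` and the `DRY_RUN` switch are hypothetical names, the file paths are placeholders, and `-r 2.0.5` uses Nextflow's revision option to pin the release these pages describe. The main point it illustrates is that the `--reads` pattern stays quoted so that Nextflow, not the shell, expands it.

```shell
# Hypothetical wrapper around the Quick Start command.
# Usage: run_eager <profile> <reads-pattern> <reference-fasta>
run_eager() {
    profile="$1"
    reads="$2"
    fasta="$3"
    # Rebuild the positional parameters as the full command line; the quoted
    # variables keep the reads pattern as a single, unexpanded argument.
    set -- nextflow run nf-core/eager -r 2.0.5 \
        -profile "$profile" --reads "$reads" --fasta "$fasta"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of launching it.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Dry run with placeholder paths: prints the command without starting Nextflow.
DRY_RUN=1 run_eager docker 'data/*_R{1,2}.fastq.gz' 'reference/genome.fasta'
```

The dry-run output omits the quotes around the reads pattern, so it is meant for inspection rather than copy-pasting. Calling `run_eager` without `DRY_RUN=1` launches the pipeline.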

Documentation

The nf-core/eager pipeline comes with documentation about the pipeline, found in the docs/ directory:

Installation
Pipeline configuration
- Local installation
- Adding your own system
Running the pipeline
Output and how to interpret the results
Troubleshooting

Credits

This pipeline was written by Alexander Peltzer (apeltzer), with major contributions from Stephen Clayton, and ideas and documentation from James Fellows Yates, Raphael Eisenhofer and Judith Neukamm. If you want to contribute, please open an issue and ask to be added to the project. We are happy to do so, and everyone is welcome to contribute here!

Tool References

EAGER v1, CircularMapper, DeDup Peltzer, A., Jäger, G., Herbig, A., Seitz, A., Kniep, C., Krause, J., & Nieselt, K. (2016). EAGER: efficient ancient genome reconstruction. Genome Biology, 17(1), 1–14. https://doi.org/10.1186/s13059-016-0918-z Download: https://github.com/apeltzer/EAGER-GUI and https://github.com/apeltzer/EAGER-CLI
FastQC download: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
AdapterRemoval v2 Schubert, M., Lindgreen, S., & Orlando, L. (2016). AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Research Notes, 9, 88. https://doi.org/10.1186/s13104-016-1900-2 Download: https://github.com/MikkelSchubert/adapterremoval
bwa Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25(14), 1754–1760. https://doi.org/10.1093/bioinformatics/btp324 Download: http://bio-bwa.sourceforge.net/bwa.shtml
SAMtools Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., … 1000 Genome Project Data Processing Subgroup. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16), 2078–2079. https://doi.org/10.1093/bioinformatics/btp352 Download: http://www.htslib.org/
DamageProfiler Judith Neukamm (Unpublished)
Qualimap Okonechnikov, K., Conesa, A., & García-Alcalde, F. (2016). Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics, 32(2), 292–294. https://doi.org/10.1093/bioinformatics/btv566 Download: http://qualimap.bioinfo.cipf.es/
preseq Daley, T., & Smith, A. D. (2013). Predicting the molecular complexity of sequencing libraries. Nature Methods, 10(4), 325–327. https://doi.org/10.1038/nmeth.2375 Download: http://smithlabresearch.org/software/preseq/
PMDTools Skoglund, P., Northoff, B. H., Shunkov, M. V., Derevianko, A. P., Pääbo, S., Krause, J., & Jakobsson, M. (2014). Separating endogenous ancient DNA from modern day contamination in a Siberian Neandertal. Proceedings of the National Academy of Sciences of the United States of America, 111(6), 2229–2234. https://doi.org/10.1073/pnas.1318934111 Download: https://github.com/pontussk/PMDtools
MultiQC Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32(19), 3047–3048. https://doi.org/10.1093/bioinformatics/btw354 Download: https://multiqc.info/
BamUtil Jun, G., Wing, M. K., Abecasis, G. R., & Kang, H. M. (2015). An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data. Genome Research, 25(6), 918–925. https://doi.org/10.1101/gr.176552.114 Download: https://genome.sph.umich.edu/wiki/BamUtil
fastp Chen, S., Zhou, Y., Chen, Y., & Gu, J. (2018). fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34(17), i884–i890. https://doi.org/10.1093/bioinformatics/bty560 Download: https://github.com/OpenGene/fastp