A fully reproducible pipeline for COPROlite and paleofeces host IDentification
CoproID helps you identify the “true maker” of Illumina-sequenced coprolites and paleofeces by checking both the microbiome composition and the endogenous DNA.
It combines the analysis of putative host ancient DNA with a machine learning prediction of the feces source based on microbiome taxonomic composition:
- (A) First, coproID performs a comparative mapping of all reads against two (or three) target genomes (genome1, genome2, and optionally genome3) and computes a host-DNA species ratio (NormalizedRatio).
- (B) Then coproID performs a metagenomic taxonomic profiling, and compares the obtained profiles to modern reference samples of the target species metagenomes. Using machine learning, coproID then estimates the host source from the metagenomic taxonomic composition (prop_microbiome).
- Finally, coproID combines A and B to predict the likely host of the metagenomic sample.
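As a rough illustration of how the two lines of evidence could be combined, the sketch below normalizes aligned base pairs by genome size, rescales them into a per-host ratio (standing in for NormalizedRatio), and multiplies that ratio by the microbiome-derived proportion (prop_microbiome). The class and function names, the example numbers, and the formulas themselves are assumptions made for illustration only; the computation the pipeline actually performs is described in the PeerJ article and may differ.

```python
from dataclasses import dataclass


@dataclass
class HostEvidence:
    name: str
    aligned_bp: int          # base pairs aligned to this host genome (step A)
    genome_size: int         # size of the host reference genome, in bp
    prop_microbiome: float   # microbiome-based source proportion (step B)


def normalized_ratios(hosts):
    """Assumed normalization: aligned bp per genome bp, rescaled to sum to 1."""
    density = {h.name: h.aligned_bp / h.genome_size for h in hosts}
    total = sum(density.values())
    if total == 0:
        return {name: 0.0 for name in density}
    return {name: d / total for name, d in density.items()}


def predict_host(hosts):
    """Combine A and B by multiplying both scores (an assumption, not the published formula)."""
    ratios = normalized_ratios(hosts)
    scores = {h.name: ratios[h.name] * h.prop_microbiome for h in hosts}
    best = max(scores, key=scores.get)
    return best, scores


# Made-up numbers, for illustration only.
hosts = [
    HostEvidence("Homo_sapiens", aligned_bp=900_000,
                 genome_size=3_000_000_000, prop_microbiome=0.8),
    HostEvidence("Canis_familiaris", aligned_bp=240_000,
                 genome_size=2_400_000_000, prop_microbiome=0.2),
]
best, scores = predict_host(hosts)
print(best, scores)
```

Multiplying the two scores means a host is only favored when both the host DNA and the microbiome profile point to it, which is the intuition behind combining A and B.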
The coproID pipeline is built using Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a portable manner. It comes with Docker containers, which makes installation trivial and results highly reproducible.
A detailed description of coproID can be found in the article published in PeerJ.
Quick Start
i. Install nextflow
ii. Install either Docker or Singularity for full pipeline reproducibility (please only use Conda as a last resort; see docs)
iii. Download the pipeline and test it on a minimal dataset with a single command
Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your institute. If so, you can simply use -profile institute in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.
iv. Start running your own analysis!
This command runs coproID to estimate whether the test samples (--reads '*_R{1,2}.fastq.gz') come from a human (--genome1 'GRCh37' --name1 'Homo_sapiens') or a dog (--genome2 'CanFam3.1' --name2 'Canis_familiaris'), and specifies the path to the minikraken database (--krakendb 'path/to/minikraken_db').
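The invocation described here can be assembled programmatically, for instance when launching many runs from a script. The sketch below only builds and prints the argument list; it does not execute anything. The `build_coproid_command` helper is hypothetical. The `nextflow run nf-core/coproid` prefix and the `-profile docker` suffix are assumptions based on the usual way nf-core pipelines are launched, and `--name1` is written with a double dash to match `--name2`.

```python
import shlex


def build_coproid_command(reads, genome1, name1, genome2, name2,
                          krakendb, profile="docker"):
    """Return the argument list for a coproID run (hypothetical helper, nothing is executed)."""
    return [
        "nextflow", "run", "nf-core/coproid",   # assumed standard nf-core launch form
        "--reads", reads,
        "--genome1", genome1, "--name1", name1,
        "--genome2", genome2, "--name2", name2,
        "--krakendb", krakendb,
        "-profile", profile,                    # assumed profile; use your institute's if available
    ]


cmd = build_coproid_command(
    reads="*_R{1,2}.fastq.gz",
    genome1="GRCh37", name1="Homo_sapiens",
    genome2="CanFam3.1", name2="Canis_familiaris",
    krakendb="path/to/minikraken_db",
)
# shlex.join quotes arguments such as the read glob so the shell does not expand them.
print(shlex.join(cmd))
```

To launch the run from Python, the list can be passed to `subprocess.run(cmd, check=True)`, assuming `nextflow` is on the `PATH`.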
NB: The example above assumes access to iGenomes.
See usage docs for all of the available options when running the pipeline.
Documentation
The nf-core/coproid pipeline comes with documentation about the pipeline, found in the docs/ directory and at the following address: coproid.readthedocs.io
- Installation
- Pipeline configuration
- Running the pipeline
- Output and how to interpret the results
- Troubleshooting
Credits
nf-core/coproid was written by Maxime Borry.
Contributions and Support
If you would like to contribute to this pipeline, please see the contributing guidelines.
For further information or help, don’t hesitate to get in touch on Slack (you can join with this invite).
Citing
coproID has been published in PeerJ. The BibTeX citation is available below:
Contributors
Tool references
- AdapterRemoval v2 Schubert, M., Lindgreen, S., & Orlando, L. (2016). AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Research Notes, 9, 88. https://doi.org/10.1186/s13104-016-1900-2
- FastQC https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
- Bowtie2 Langmead, B., & Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nature Methods, 9(4), 357. https://dx.doi.org/10.1038%2Fnmeth.1923
- Samtools Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., … 1000 Genome Project Data Processing Subgroup. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16), 2078–2079. https://doi.org/10.1093/bioinformatics/btp352
- Kraken2 Wood, D. E., Lu, J., & Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. BioRxiv, 762302. https://doi.org/10.1101/762302
- PMDTools Skoglund, P., Northoff, B. H., Shunkov, M. V., Derevianko, A. P., Pääbo, S., Krause, J., & Jakobsson, M. (2014). Separating endogenous ancient DNA from modern day contamination in a Siberian Neandertal. Proceedings of the National Academy of Sciences of the United States of America, 111(6), 2229–2234. https://doi.org/10.1073/pnas.1318934111
- DamageProfiler Judith Neukamm (Unpublished): 10.5281/zenodo.1064062
- Sourcepredict Borry, M. (2019). Sourcepredict: Prediction of metagenomic sample sources using dimension reduction followed by machine learning classification. The Journal of Open Source Software. https://doi.org/10.21105/joss.01540
- MultiQC Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32(19), 3047–3048. https://doi.org/10.1093/bioinformatics/btw354