nf-core/coproid

A fully reproducible pipeline for COPROlite and paleofeces host IDentification


CoproID helps you identify the “true maker” of Illumina-sequenced coprolites/paleofeces by checking the microbiome composition and the endogenous DNA.

It combines the analysis of putative host ancient DNA with a machine learning prediction of the feces source based on microbiome taxonomic composition:

  • (A) First, coproID performs a comparative mapping of all reads against two (or three) target genomes (genome1, genome2, and optionally genome3) and computes a host-DNA species ratio (NormalizedRatio).
  • (B) Then, coproID performs metagenomic taxonomic profiling and compares the obtained profiles to modern reference metagenomes of the target species. Using machine learning, coproID then estimates the host source from the metagenomic taxonomic composition (prop_microbiome).
  • Finally, coproID combines A and B to predict the likely host of the metagenomic sample.

The coproID pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with Docker containers, making installation trivial and results highly reproducible.

A detailed description of coproID can be found in the article published in PeerJ.

Quick Start

i. Install nextflow

ii. Install either Docker or Singularity for full pipeline reproducibility (please only use Conda as a last resort; see docs)

iii. Download the pipeline and test it on a minimal dataset with a single command

nextflow run nf-core/coproid -profile test,<docker/singularity/conda/institute>

Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use -profile institute in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.
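Steps i to iii can be checked with a small shell sketch before launching anything. The helper names pick_profile and build_test_command are hypothetical (they are not part of Nextflow or nf-core/coproid). The preference order Docker, then Singularity, then Conda follows the advice in step ii. The script only prints the test command; it does not run it.

```shell
#!/bin/sh
# Sketch only: pick_profile and build_test_command are hypothetical helper
# names, not part of nf-core/coproid or Nextflow.

# Print the first engine found on PATH, preferring containers over Conda.
pick_profile() {
    for _tool in docker singularity conda; do
        if command -v "$_tool" >/dev/null 2>&1; then
            echo "$_tool"
            return 0
        fi
    done
    return 1
}

# Print (do not run) the test command for the given profile.
build_test_command() {
    echo "nextflow run nf-core/coproid -profile test,$1"
}

# Step i: warn if Nextflow itself is missing.
if ! command -v nextflow >/dev/null 2>&1; then
    echo "nextflow not found on PATH; see step i" >&2
fi

# Steps ii and iii: choose a profile and show the matching test command.
if profile="$(pick_profile)"; then
    build_test_command "$profile"
else
    echo "No Docker, Singularity, or Conda found on PATH; see step ii" >&2
fi
```

If your institute has a profile in nf-core/configs, pass its name to build_test_command instead of the detected engine.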

iv. Start running your own analysis!

nextflow run nf-core/coproid --genome1 'GRCh37' --genome2 'CanFam3.1' --name1 'Homo_sapiens' --name2 'Canis_familiaris' --reads '*_R{1,2}.fastq.gz' --krakendb 'path/to/minikraken_db' -profile docker

This command runs coproID to estimate whether the test samples (--reads '*_R{1,2}.fastq.gz') come from a human (--genome1 'GRCh37' --name1 'Homo_sapiens') or a dog (--genome2 'CanFam3.1' --name2 'Canis_familiaris'), and specifies the path to the minikraken database (--krakendb 'path/to/minikraken_db').

NB: The example above assumes access to iGenomes.
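When the same analysis is repeated with other species or read sets, it can help to keep the parameters in variables and assemble the command from them. The following is a minimal sketch: the variable names and the build_run_command helper are illustrative, and the values are those of the human-versus-dog example. It uses the nf-core/coproid pipeline name from the test command in step iii and only the options shown in this README. It prints the command rather than executing it.

```shell
#!/bin/sh
# Sketch only: variable names and build_run_command are illustrative.
genome1='GRCh37'
name1='Homo_sapiens'
genome2='CanFam3.1'
name2='Canis_familiaris'
reads='*_R{1,2}.fastq.gz'          # kept quoted so the shell does not expand the glob
krakendb='path/to/minikraken_db'   # placeholder path, as in the example
profile='docker'

# Print the full command with each value single-quoted.
# Assumes no value contains a single quote.
build_run_command() {
    printf "nextflow run nf-core/coproid --genome1 '%s' --genome2 '%s' --name1 '%s' --name2 '%s' --reads '%s' --krakendb '%s' -profile %s\n" \
        "$genome1" "$genome2" "$name1" "$name2" "$reads" "$krakendb" "$profile"
}

# Dry run: show what would be executed.
build_run_command
```

Once the printed command looks right, one way to launch it is eval "$(build_run_command)"; copying the printed line into the terminal works just as well.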

See usage docs for all of the available options when running the pipeline.

Documentation

The nf-core/coproid pipeline comes with documentation about the pipeline, found in the docs/ directory and at the following address: coproid.readthedocs.io

  1. Installation
  2. Pipeline configuration
  3. Running the pipeline
  4. Output and how to interpret the results
  5. Troubleshooting

Credits

nf-core/coproid was written by Maxime Borry.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don’t hesitate to get in touch on Slack (you can join with this invite).

Citing

coproID has been published in PeerJ. The BibTeX citation is available below:

@article{borry_coproid_2020,
 title = {{CoproID} predicts the source of coprolites and paleofeces using microbiome composition and host {DNA} content},
 volume = {8},
 issn = {2167-8359},
 url = {https://peerj.com/articles/9001},
 doi = {10.7717/peerj.9001},
 language = {en},
 urldate = {2020-04-20},
 journal = {PeerJ},
 author = {Borry, Maxime and Cordova, Bryan and Perri, Angela and Wibowo, Marsha and Honap, Tanvi Prasad and Ko, Jada and Yu, Jie and Britton, Kate and Girdland-Flink, Linus and Power, Robert C. and Stuijts, Ingelise and Salazar-García, Domingo C. and Hofman, Courtney and Hagan, Richard and Kagoné, Thérèse Samdapawindé and Meda, Nicolas and Carabin, Helene and Jacobson, David and Reinhard, Karl and Lewis, Cecil and Kostic, Aleksandar and Jeong, Choongwon and Herbig, Alexander and Hübner, Alexander and Warinner, Christina},
 month = apr,
 year = {2020},
 note = {Publisher: PeerJ Inc.},
 pages = {e9001}
}

Contributors

James A. Fellows Yates

Tool references