nf-core/deepmutscan
Edit

nf-core/deepmutscan is a reproducible, scalable, and community-curated pipeline for analyzing deep mutational scanning (DMS) data using shotgun DNA sequencing.

community-workflowdeep-mutational-scanninggenotype-phenotypeshotgun-sequencing

This is the development version of the pipeline.

Launch development version https://github.com/nf-core/deepmutscan

Introduction

nf-core/deepmutscan is a workflow designed for the analysis of deep mutational scanning (DMS) data. DMS enables researchers to experimentally measure the fitness effects of thousands of genes or gene variants simultaneously, helping to classify disease causing mutants in human and animal populations, to learn the fundamental rules of protein architecture, small-molecule binding, mRNA splicing, viral evolution and many other quantifiable phenotypes.

While DNA synthesis and sequencing technologies have advanced substantially, long open reading frame (ORF) targets still present a major challenge for DMS studies. Shotgun DNA sequencing can be used to greatly speed up the inference of long ORF mutant fitness landscapes, theoretically at no expense in accuracy. We have designed the nf-core/deepmutscan pipeline to unlock the power of shotgun sequencing based DMS studies on long ORFs, to simplify and standardise the complex bioinformatics steps involved in data processing of such experiments – from read alignment to QC reporting and fitness landscape inferences.

nf-core/deepmutscan workflow

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!

On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the nf-core website.

Major features

End-to-end analyses of various DMS data
Modular, three-stage workflow: alignment → QC → error-aware fitness estimation
Integration with popular statistical fitness estimation tools like DiMSum, Enrich2, rosace and mutscan
Support of multiple mutagenesis strategies, e.g. by nicking with degenerate NNK and NNS codons
Containerisation via Docker, Singularity and Apptainer
Scalability across HPC and Cloud systems
Monitoring of CPU, memory, and CO₂ usage

For more details on the pipeline and on potential future expansions, please consider reading our usage description.

Step-by-step pipeline summary

The pipeline processes deep mutational scanning (DMS) sequencing data in several stages:

Alignment of reads to the reference open reading frame (ORF) (BWA-mem)
Filtering of wildtype and erroneous reads (samtools view)
Read merging for base error reduction (vsearch merge, BWA-mem)
Mutation counting (GATK AnalyzeSaturationMutagenesis)
DMS library quality control
Data summarisation across samples
Single nucleotide variant error correction (in development)
Fitness estimation

Usage

Note

If you are new to Nextflow and nf-core, please refer to this page on how to set-up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

First, prepare a samplesheet with your input/output data in which each row represents a pair of fastq files (paired end). This should look as follows:

samplesheet.csv

sample,type,replicate,file1,file2
ORF1,input,1,/reads/forward1.fastq.gz,/reads/reverse1.fastq.gz
ORF1,input,2,/reads/forward2.fastq.gz,/reads/reverse2.fastq.gz
ORF1,output,1,/reads/forward3.fastq.gz,/reads/reverse3.fastq.gz
ORF1,output,2,/reads/forward4.fastq.gz,/reads/reverse4.fastq.gz

Secondly, specify the gene or gene region of interest using a reference FASTA file via --fasta. Provide the exact codon coordinates using --reading_frame.

Now, you can run the pipeline using:

example_run.sh

nextflow run nf-core/deepmutscan \
   -profile <docker/singularity/.../institute> \
   --input ./samplesheet.csv \
   --fasta ./ref.fa \
   --reading_frame 1-300 \
   --outdir ./results

Pipeline output

To see the results of an example test run with a full size dataset refer to the results tab on the nf-core website pipeline page.

For more details about the output files and reports, please refer to the output documentation.

Contributing

We welcome contributions from the community!

For technical challenges and feedback on the pipeline, please use our Github repository. Please open an issue or pull request to:

Report bugs or solve data incompatibilities when running nf-core/deepmutscan
Suggest the implementation of new modules for custom DMS workflows
Help improve this documentation

If you are interested in getting involved as a developer, please consider joining our interactive #deepmutscan Slack channel (via this invite).

Credits

nf-core/deepmutscan was originally written by Benjamin Wehnert and Max Stammnitz at the Centre for Genomic Regulation, Barcelona, with the generous support of an EMBO Long-term Postdoctoral Fellowship and a Marie Skłodowska-Curie grant by the European Union.

If you use nf-core/deepmutscan in your analyses, please cite:

📄 Wehnert et al., bioRxiv preprint (coming soon)

Please also cite the nf-core framework:

📄 Ewels et al., Nature Biotechnology, 2020 https://doi.org/10.1038/s41587-020-0439-x

For further information or help, don’t hesitate to get in touch on the Slack #deepmutscan channel (you can join with this invite).

Scientific contact

For scientific discussions around the use of this pipeline (e.g. on experimental design or sequencing data requirements), please feel free to get in touch with us directly:

Benjamin Wehnert — wehnertbenjamin@gmail.com
Maximilian Stammnitz — maximilian.stammnitz@crg.eu

run with

See the docs on how to configure the Seqera Platform CLI.

subscribers

stars

open issues

open PRs

last release

not yet released

last update

25 days ago

included modules

included subworkflows

utils_nextflow_pipeline utils_nfcore_pipeline utils_nfschema_plugin

contributors

get help

Ask a question on Slack Open an issue on GitHub

nf-core/deepmutscan Edit

Introduction

Major features

Step-by-step pipeline summary

Usage

Pipeline output

Contributing

Credits

Scientific contact

run with

subscribers

stars

open issues

open PRs

last release

last update

included modules

included subworkflows

contributors

get help

nf-core/deepmutscan
Edit