Introduction

nf-core/mnaseseq is a bioinformatics analysis pipeline for DNA sequencing data obtained via micrococcal nuclease digestion (MNase-seq).

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with Docker containers, making installation trivial and results highly reproducible.

Pipeline summary

  1. Raw read QC (FastQC)
  2. Adapter trimming (Trim Galore!)
  3. Alignment (BWA)
  4. Mark duplicates (Picard)
  5. Merge alignments from multiple libraries of the same sample (Picard)
    1. Re-mark duplicates (Picard)
    2. Filtering to remove:
      • reads mapping to blacklisted regions (SAMtools, BEDTools)
      • reads that are marked as duplicates (SAMtools)
      • reads that aren't marked as primary alignments (SAMtools)
      • reads that are unmapped (SAMtools)
      • reads that map to multiple locations (SAMtools)
      • reads containing more than 4 mismatches (BAMTools)
      • reads that are soft-clipped (BAMTools)
      • reads that have an insert size outside a specified range (BAMTools; paired-end only)
      • reads that map to different chromosomes (Pysam; paired-end only)
      • reads that aren't in FR orientation (Pysam; paired-end only)
      • reads where only one read of the pair fails the above criteria (Pysam; paired-end only)
    3. Alignment-level QC and estimation of library complexity (Picard, Preseq)
    4. Create normalised bigWig files scaled to 1 million mapped reads (BEDTools, bedGraphToBigWig)
    5. Calculate genome-wide coverage assessment (deepTools)
    6. Call nucleosome positions and generate smoothed, normalised coverage bigWig files that can be used to generate occupancy profile plots between samples across features of interest (DANPOS2)
    7. Generate gene-body meta-profile from DANPOS2 smoothed bigWig files (deepTools)
  6. Create an IGV session file containing bigWig tracks for data visualisation (IGV)
  7. Present QC for raw read and alignment results (MultiQC)
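Some of the filtering and normalisation steps above can be sketched in shell. This is a minimal illustration under stated assumptions, not the pipeline's own code: the helper names `filter_mask`, `scale_factor` and `pe_pair_filter` are hypothetical, the flag bits come from the SAM specification, using `-q 1` (dropping MAPQ 0) as the multi-mapping filter is an assumption based on how BWA reports reads with several equally good hits, and the insert-size bounds passed to `pe_pair_filter` are placeholders rather than the pipeline's defaults.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating a few steps from the pipeline summary.
# They are NOT taken from nf-core/mnaseseq.

# Combined SAM FLAG mask for reads that are removed:
#   0x4   segment unmapped
#   0x100 secondary (not primary) alignment
#   0x400 PCR or optical duplicate
filter_mask() {
    echo $(( 0x4 | 0x100 | 0x400 ))
}

# Scale factor that normalises coverage to 1 million mapped reads.
# $1 = number of mapped reads (e.g. taken from `samtools flagstat`).
scale_factor() {
    awk -v n="$1" 'BEGIN { printf "%.6f\n", 1000000 / n }'
}

# Keep paired-end SAM records whose mate maps to the same chromosome
# (RNEXT is "=") and whose absolute insert size (TLEN, column 9) lies
# within [min, max]. Header lines are passed through unchanged.
# $1 = min insert size, $2 = max insert size; SAM text on stdin.
pe_pair_filter() {
    awk -v min="$1" -v max="$2" 'BEGIN { FS = OFS = "\t" }
        /^@/ { print; next }
        {
            tlen = ($9 < 0) ? -$9 : $9
            if ($7 == "=" && tlen >= min && tlen <= max) print
        }'
}

# Example wiring (assumes samtools, bedtools and bedGraphToBigWig are
# installed; all file names are placeholders):
#   samtools view -b -F "$(filter_mask)" -q 1 sample.bam > sample.filt.bam
#   bedtools genomecov -ibam sample.filt.bam -bg -scale "$(scale_factor 2500000)" \
#       | sort -k1,1 -k2,2n > sample.bedGraph
#   bedGraphToBigWig sample.bedGraph chrom.sizes sample.bigWig
```

For example, with 2,500,000 mapped reads the scale factor is 1,000,000 / 2,500,000 = 0.4, so every coverage value is multiplied by 0.4. Combining the flag bits into a single mask keeps the `samtools view` call to one `-F` argument.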

Quick Start

i. Install Nextflow

ii. Install one of Docker, Singularity or Conda

iii. Download the pipeline and test it on a minimal dataset with a single command

nextflow run nf-core/mnaseseq -profile test,<docker/singularity/conda>

iv. Start running your own analysis!

nextflow run nf-core/mnaseseq -profile <docker/singularity/conda> --design design.csv --genome GRCh37

See usage docs for all of the available options when running the pipeline.
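To prepare the `--design` file used in the command in step iv, something like the following could serve as a starting point. It is a sketch under assumptions: the four-column layout (`group,replicate,fastq_1,fastq_2`) mirrors that of sibling nf-core pipelines such as nf-core/atacseq and should be checked against the usage docs, and all FastQ file names are hypothetical placeholders. The `nextflow run` line is left commented out so the snippet runs without Nextflow or Docker installed.

```bash
#!/usr/bin/env bash
# Write a minimal paired-end design file.
# ASSUMPTION: column layout borrowed from nf-core/atacseq; verify it in the
# usage docs. The FastQ paths below are hypothetical placeholders.
cat > design.csv <<'EOF'
group,replicate,fastq_1,fastq_2
control,1,control_R1_1.fastq.gz,control_R1_2.fastq.gz
control,2,control_R2_1.fastq.gz,control_R2_2.fastq.gz
treatment,1,treatment_R1_1.fastq.gz,treatment_R1_2.fastq.gz
treatment,2,treatment_R2_1.fastq.gz,treatment_R2_2.fastq.gz
EOF

# Launch the pipeline against GRCh37 using Docker (requires Nextflow and Docker):
# nextflow run nf-core/mnaseseq -profile docker --design design.csv --genome GRCh37
```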

Documentation

The nf-core/mnaseseq pipeline comes with documentation about the pipeline, found in the docs/ directory:

  1. Installation
  2. Pipeline configuration
  3. Running the pipeline
  4. Output and how to interpret the results
  5. Troubleshooting

Credits

The pipeline was originally written by The Bioinformatics & Biostatistics Group for use at The Francis Crick Institute, London.

The pipeline was developed by Harshil Patel.

Many thanks to others who have helped out along the way too, including (but not limited to): @crickbabs.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on Slack (you can join with this invite).

Citation

You can cite the nf-core pre-print as follows:

Ewels PA, Peltzer A, Fillinger S, Alneberg JA, Patel H, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. nf-core: Community curated bioinformatics pipelines. bioRxiv. 2019. p. 610741. doi: 10.1101/610741.