Introduction

nf-core/nanoseq is a bioinformatics analysis pipeline for Nanopore DNA/RNA sequencing data that can be used to perform basecalling, demultiplexing, QC, alignment, and downstream analysis.

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers, making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process, which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!

On release, automated continuous integration tests run the pipeline on a full-sized dataset obtained from the Singapore Nanopore Expression Consortium on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the nf-core website.

Pipeline Summary

  1. Demultiplexing (qcat; optional)
  2. Raw read cleaning (NanoLyse; optional)
  3. Raw read QC (NanoPlot, FastQC)
  4. Alignment (GraphMap2 or minimap2)
    • Both aligners are capable of performing unspliced and spliced alignment. Sensible defaults will be applied automatically based on a combination of the input data and user-specified parameters
    • Each sample can be mapped to its own reference genome if multiplexed in this way
    • Convert SAM to co-ordinate sorted BAM and obtain mapping metrics (samtools)
  5. Create bigWig (BEDTools, bedGraphToBigWig) and bigBed (BEDTools, bedToBigBed) coverage tracks for visualisation
  6. DNA-specific downstream analysis:
  7. RNA-specific downstream analysis:
    • Transcript reconstruction and quantification (bambu or StringTie2)
      • bambu performs both transcript reconstruction and quantification
      • When StringTie2 is chosen, each sample is processed individually and the results are combined, after which featureCounts is used for both gene and transcript quantification.
    • Differential expression analysis (DESeq2 and/or DEXSeq)
    • RNA modification detection (xpore and/or m6anet)
    • RNA fusion detection (JAFFAL)
  8. Present QC for raw read and alignment results (MultiQC)
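
The way optional steps and tool choices combine in the summary above can be sketched in a few lines of Python. This is only an illustration of the ordering described in the list, not the pipeline's implementation: the function `planned_steps` and all of its argument names are hypothetical and do not correspond to actual pipeline parameters.

```python
from typing import List


def planned_steps(protocol: str,
                  demultiplex: bool = False,
                  clean_reads: bool = False,
                  aligner: str = "minimap2",
                  quantifier: str = "bambu") -> List[str]:
    """Return the summary's stages, in order, for one hypothetical configuration.

    Argument names are illustrative assumptions, not nf-core/nanoseq parameters.
    """
    if protocol not in {"DNA", "RNA"}:
        raise ValueError("protocol must be 'DNA' or 'RNA'")
    if aligner not in {"GraphMap2", "minimap2"}:
        raise ValueError("aligner must be 'GraphMap2' or 'minimap2'")
    if quantifier not in {"bambu", "StringTie2"}:
        raise ValueError("quantifier must be 'bambu' or 'StringTie2'")

    steps: List[str] = []
    # Steps 1 and 2 are marked optional in the summary.
    if demultiplex:
        steps.append("Demultiplexing (qcat)")
    if clean_reads:
        steps.append("Raw read cleaning (NanoLyse)")
    # Steps 3 to 5 are listed without an "optional" marker.
    steps.append("Raw read QC (NanoPlot, FastQC)")
    steps.append(f"Alignment ({aligner})")
    steps.append("Sorted BAM and mapping metrics (samtools)")
    steps.append("Coverage tracks (bigWig, bigBed)")

    # The summary lists no sub-steps under the DNA heading, so none are added here.
    if protocol == "RNA":
        if quantifier == "bambu":
            # bambu covers reconstruction and quantification in one step.
            steps.append("Transcript reconstruction and quantification (bambu)")
        else:
            # StringTie2 reconstructs transcripts; featureCounts then quantifies.
            steps.append("Transcript reconstruction (StringTie2)")
            steps.append("Gene and transcript quantification (featureCounts)")
        steps.append("Differential expression analysis (DESeq2 and/or DEXSeq)")
        steps.append("RNA modification detection (xpore and/or m6anet)")
        steps.append("RNA fusion detection (JAFFAL)")

    steps.append("QC report (MultiQC)")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(planned_steps("RNA", quantifier="StringTie2"), 1):
        print(f"{number}. {step}")
```

The sketch treats the RNA downstream analyses as always present because the summary lists them without an "optional" marker; whether each can be switched off individually is not stated here.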

Functionality Overview

A graphical overview of suggested routes through the pipeline depending on the desired output can be seen below.

nf-core/nanoseq metro map

Usage

Note: If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.
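
As a rough illustration of the test run mentioned in the note, the following Python sketch assembles a Nextflow command line that uses the test profile. The helper `build_test_command` is hypothetical. Combining `test` with a container profile such as `docker`, and passing an output directory via `--outdir`, follow common nf-core conventions and are assumptions here; check the pipeline's usage documentation for the parameters your version requires.

```python
import shlex
from typing import List, Optional


def build_test_command(container_profile: str = "docker",
                       outdir: Optional[str] = None) -> List[str]:
    """Build an argument list for a test run of nf-core/nanoseq.

    Hypothetical helper: the profile combination and --outdir are assumptions
    based on general nf-core conventions, not taken from this document.
    """
    # Nextflow accepts several profiles as a comma-separated list.
    cmd = ["nextflow", "run", "nf-core/nanoseq",
           "-profile", f"test,{container_profile}"]
    if outdir is not None:
        cmd += ["--outdir", outdir]
    return cmd


if __name__ == "__main__":
    # Print the command for inspection rather than executing it.
    print(shlex.join(build_test_command(outdir="results")))
```

Printing the command instead of running it keeps the sketch free of side effects; the list can be passed to `subprocess.run` once Nextflow and a container engine are installed.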