The nf-core/crisprseq pipeline allows the analysis of CRISPR-edited DNA. It evaluates the quality of gene editing experiments using targeted next-generation sequencing (NGS) data.
You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 6 columns, and a header row as shown in the examples below.
Multiple runs of the same sample
Sample identifiers have to be the same when you have re-sequenced the same sample more than once, e.g. to increase sequencing depth. The pipeline will concatenate the raw reads before performing any downstream analysis. Below is an example for the same sample sequenced across 3 lanes (see section below for an explanation of samplesheet columns):
The pipeline will auto-detect whether a sample is single- or paired-end using the information provided in the samplesheet. The samplesheet can have as many columns as you desire; however, the first 6 columns must match those defined in the table below.
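As a rough illustration of the two behaviours described above (grouping repeated runs of a sample and detecting single- versus paired-end data), the Python sketch below reads a small samplesheet and groups its FastQ files by sample identifier. The column names (`sample`, `fastq_1`, `fastq_2`, `reference`, `protospacer`, `template`), the file names and the placeholder sequences are assumptions made for this example. The pipeline performs these steps internally, so this is not its actual implementation.

```python
import csv
import io
from collections import defaultdict

# Hypothetical samplesheet: column names, file names and sequences are
# placeholders chosen for illustration only.
SAMPLESHEET = """\
sample,fastq_1,fastq_2,reference,protospacer,template
sampleA,sampleA_L001_R1.fastq.gz,sampleA_L001_R2.fastq.gz,ACGTACGT,ACGT,
sampleA,sampleA_L002_R1.fastq.gz,sampleA_L002_R2.fastq.gz,ACGTACGT,ACGT,
sampleB,sampleB_L001_R1.fastq.gz,,ACGTACGT,ACGT,
"""


def group_runs(text):
    """Group FastQ files by sample and flag each sample as single- or paired-end."""
    runs = defaultdict(lambda: {"fastq_1": [], "fastq_2": []})
    for row in csv.DictReader(io.StringIO(text)):
        runs[row["sample"]]["fastq_1"].append(row["fastq_1"])
        # An empty second-read column is treated as single-end data.
        if row["fastq_2"]:
            runs[row["sample"]]["fastq_2"].append(row["fastq_2"])
    return {
        sample: {**files, "single_end": not files["fastq_2"]}
        for sample, files in runs.items()
    }


groups = group_runs(SAMPLESHEET)
for sample, info in groups.items():
    print(sample, "single-end" if info["single_end"] else "paired-end", info["fastq_1"])
```

Here `sampleA` appears in two rows, so both of its read 1 files end up in one group, which is the set of files that would be concatenated before downstream analysis.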
A final samplesheet file consisting of both single- and paired-end data may look something like the one below. This is for 3 samples, where
chr6 is single-end and has a template sequence (this is a reduced samplesheet; please refer to the pipeline example samplesheet to see the full version).
|Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (_).
|Full path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension “.fastq.gz” or “.fq.gz”.
|Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension “.fastq.gz” or “.fq.gz”. (Optional)
|Reference sequence of the target region.
|Sequence of the protospacer used for CRISPR editing. Must not include the PAM.
|Sequence of the template used in template-based editing experiments. (Optional)
An example samplesheet has been provided with the pipeline.
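Before launching a run, it can help to check a samplesheet against the rules in the table above. The following Python sketch is one possible pre-flight check and is not part of the pipeline. The function names `check_row` and `normalise_sample` and the dictionary keys (`sample`, `fastq_1`, `fastq_2`) are hypothetical and chosen for this example.

```python
# Extensions accepted for gzipped FastQ files, as stated in the table above.
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz")


def normalise_sample(name):
    """Mimic the documented conversion of spaces to underscores."""
    return name.replace(" ", "_")


def check_row(row):
    """Return a list of problems found in one samplesheet row (a dict)."""
    problems = []
    if not row.get("sample"):
        problems.append("sample name is missing")
    if not row.get("fastq_1", "").endswith(FASTQ_SUFFIXES):
        problems.append("fastq_1 must end in .fastq.gz or .fq.gz")
    # The second read file is optional, so it is only checked when present.
    fastq_2 = row.get("fastq_2", "")
    if fastq_2 and not fastq_2.endswith(FASTQ_SUFFIXES):
        problems.append("fastq_2 must end in .fastq.gz or .fq.gz")
    return problems


good = {"sample": "sampleA", "fastq_1": "a_R1.fastq.gz", "fastq_2": "a_R2.fq.gz"}
bad = {"sample": "sampleB", "fastq_1": "b_R1.fastq", "fastq_2": ""}

print(check_row(good))
print(check_row(bad))
print(normalise_sample("my sample 1"))
```

The check is deliberately limited to what the table states. It does not verify that the files exist or that the protospacer excludes the PAM.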
Optional pipeline steps
Trimming of overrepresented sequences
To trim the overrepresented sequences found with FastQC from the reads, use the parameter --overrepresented.
Such sequences are not trimmed by default.
When --overrepresented is set, Cutadapt trims the overrepresented sequences from the input FASTQ files.
If the provided samples were sequenced using unique molecular identifiers (UMIs), use the parameter
--umi_clustering to run the clustering steps.
- Extract UMI sequences (Python script)
- Cluster UMI sequences (
- Obtain the most abundant UMI sequence for each cluster (
- Obtain a consensus for each cluster (
- Polish consensus sequence (
- Repeat a second round of consensus + polishing (
- Obtain the final consensus of each cluster (Medaka)
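To make one of the steps above more concrete, the Python sketch below picks the most abundant UMI sequence in each cluster. It only illustrates the idea behind that step and assumes clusters are already available as lists of UMI strings. The pipeline relies on dedicated tools for clustering, consensus and polishing, and the cluster names and sequences here are invented for the example.

```python
from collections import Counter


def most_abundant_umi(cluster):
    """Return the UMI sequence seen most often in one cluster."""
    counts = Counter(cluster)
    # most_common(1) returns a one-element list of (sequence, count) pairs.
    return counts.most_common(1)[0][0]


# Hypothetical clusters of UMI sequences.
clusters = {
    "cluster_1": ["ACGTAC", "ACGTAC", "ACGTAT"],
    "cluster_2": ["TTGCAA", "TTGCAG", "TTGCAG"],
}

representatives = {name: most_abundant_umi(umis) for name, umis in clusters.items()}
print(representatives)
```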
Other input parameters
If you want to provide the same reference for every sample, you can select a genome with
--genome or provide a reference FASTA file with
Using either of these two parameters will override any reference sequence provided through the input samplesheet.
Please refer to the nf-core website for general usage docs and guidelines regarding reference genomes.
If you want to provide the same protospacer sequence for every sample, you can provide the sequence with the parameter --protospacer.
Using this parameter will override any protospacer sequence provided through the input samplesheet.
A protospacer is required: provide it either through the samplesheet or with the parameter --protospacer.
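The precedence rule for the protospacer can be summarised in a few lines of Python. This is a minimal sketch of the documented behaviour, not the pipeline's code. The function name `resolve_protospacer` and its arguments are hypothetical.

```python
def resolve_protospacer(samplesheet_value, parameter_value=None):
    """Pick the protospacer for one sample.

    A value given through the --protospacer parameter wins over the
    samplesheet entry; if neither is set, the sample cannot be analysed.
    """
    if parameter_value:
        return parameter_value
    if samplesheet_value:
        return samplesheet_value
    raise ValueError(
        "a protospacer is required, either in the samplesheet or via --protospacer"
    )


# Placeholder sequences for illustration only.
print(resolve_protospacer("ACGTACGTACGTACGTACGT"))
print(resolve_protospacer("ACGTACGTACGTACGTACGT", "TTTTCCCCGGGGAAAATTTT"))
```

The same precedence applies to the reference sequence when a genome or reference FASTA file is supplied on the command line.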
By default, the pipeline uses minimap2 (
--aligner minimap2) to map the sequenced FASTQ reads to the reference.
You can also select a different alignment tool with the parameter
--aligner. Possible options are
The default alignment with
minimap2 uses adapted parameters which were seen to improve the alignment and reduce potential sequencing or alignment errors.
The default parameters are:
- A match score of 29
- A mismatch penalty of 17
- A gap open penalty of 25
- A gap extension penalty of 2
Please refer to the original CRISPR-Analytics publication to see the benchmarking of such parameters.
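For orientation, the four scoring values listed above correspond to the minimap2 command-line options `-A` (matching score), `-B` (mismatch penalty), `-O` (gap open penalty) and `-E` (gap extension penalty). The Python sketch below assembles such a command line. The exact argument string the pipeline passes to minimap2 is not shown in this document, so treat this as an approximation. The file names are placeholders.

```python
import shlex

# Scoring values taken from the list of defaults above.
SCORING = {"-A": 29, "-B": 17, "-O": 25, "-E": 2}


def minimap2_command(reference, reads, scoring=SCORING):
    """Build a minimap2 argument list with the given scoring options."""
    command = ["minimap2"]
    for flag, value in scoring.items():
        command += [flag, str(value)]
    # minimap2 takes the reference first, then the reads.
    command += [reference, reads]
    return command


command = minimap2_command("reference.fasta", "reads.fastq.gz")
print(shlex.join(command))
# prints: minimap2 -A 29 -B 17 -O 25 -E 2 reference.fasta reads.fastq.gz
```

Passing a different `scoring` dictionary shows how a customised set of penalties would change the command.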
To customise these parameters, you can override the arguments given to
minimap2 by creating a configuration file and providing it to your Nextflow run with
Running the pipeline
The typical command for running the pipeline is as follows:
This will launch the pipeline with the
docker configuration profile. See below for more information about profiles.
Note that the pipeline will create the following files in your working directory: