nf-core/chipseq
ChIP-seq peak-calling, QC and differential analysis pipeline.
Version 1.0.0 (the latest stable release is 2.1.0).
Introduction
nf-core/chipseq is a bioinformatics analysis pipeline used for Chromatin ImmunopreciPitation sequencing (ChIP-seq) data.
The pipeline is built using Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a portable manner. It comes with Docker containers, making installation trivial and results highly reproducible.
Pipeline summary
- Raw read QC (FastQC)
- Adapter trimming (Trim Galore!)
- Alignment (BWA)
- Mark duplicates (picard)
- Merge alignments from multiple libraries of the same sample (picard)
  - Re-mark duplicates (picard)
  - Filtering to remove:
    - reads mapping to blacklisted regions (SAMtools, BEDTools)
    - reads that are marked as duplicates (SAMtools)
    - reads that aren't marked as primary alignments (SAMtools)
    - reads that are unmapped (SAMtools)
    - reads that map to multiple locations (SAMtools)
    - reads containing > 4 mismatches (BAMTools)
    - reads that have an insert size > 2kb (BAMTools; paired-end only)
    - reads that map to different chromosomes (Pysam; paired-end only)
    - reads that aren't in FR orientation (Pysam; paired-end only)
    - reads where only one read of the pair fails the above criteria (Pysam; paired-end only)
  - Alignment-level QC and estimation of library complexity (picard, Preseq)
  - Create normalised bigWig files scaled to 1 million mapped reads (BEDTools, bedGraphToBigWig)
  - Generate gene-body meta-profile from bigWig files (deepTools)
  - Calculate genome-wide IP enrichment relative to control (deepTools)
  - Calculate strand cross-correlation peak and ChIP-seq quality measures including NSC and RSC (phantompeakqualtools)
  - Call broad/narrow peaks (MACS2)
  - Annotate peaks relative to gene features (HOMER)
  - Create consensus peakset across all samples and create tabular file to aid in the filtering of the data (BEDTools)
  - Count reads in consensus peaks (featureCounts)
  - Differential binding analysis, PCA and clustering (R, DESeq2)
- Create IGV session file containing bigWig tracks, peaks and differential sites for data visualisation (IGV)
- Present QC for raw read, alignment, peak-calling and differential binding results (MultiQC, R)
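The filtering criteria and the bigWig scaling listed in the summary can be illustrated with a small Python sketch. This is not the pipeline's code (the pipeline delegates these steps to SAMtools, BAMTools, Pysam and BEDTools); the `Read` class, its field names and the helper functions are hypothetical and exist only to make the logic explicit. The sketch also assumes that the scale factor is simply 1 million divided by the number of mapped reads.

```python
from dataclasses import dataclass

MAX_MISMATCHES = 4       # reads with more than 4 mismatches are removed
MAX_INSERT_SIZE = 2000   # paired-end reads with insert size > 2 kb are removed


@dataclass
class Read:
    """Hypothetical, simplified stand-in for one aligned read."""
    in_blacklist: bool = False
    is_duplicate: bool = False
    is_primary: bool = True
    is_mapped: bool = True
    is_multimapped: bool = False
    mismatches: int = 0
    # The fields below only matter for paired-end data.
    insert_size: int = 0
    same_chromosome: bool = True
    fr_orientation: bool = True


def passes_read_filters(read: Read, paired_end: bool) -> bool:
    """Return True if a single read survives the criteria in the summary."""
    if read.in_blacklist or read.is_duplicate:
        return False
    if not read.is_primary or not read.is_mapped or read.is_multimapped:
        return False
    if read.mismatches > MAX_MISMATCHES:
        return False
    if paired_end:
        if abs(read.insert_size) > MAX_INSERT_SIZE:
            return False
        if not read.same_chromosome or not read.fr_orientation:
            return False
    return True


def keep_pair(read1: Read, read2: Read) -> bool:
    """Keep a pair only if both mates pass; one failing mate removes both."""
    return passes_read_filters(read1, True) and passes_read_filters(read2, True)


def scale_factor(mapped_reads: int) -> float:
    """Factor that scales coverage to 1 million mapped reads (assumed formula)."""
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    return 1_000_000 / mapped_reads
```

For example, under these assumptions a library with 2 million mapped reads has its coverage multiplied by 0.5, and a pair in which one mate is flagged as a duplicate is discarded as a whole.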
Documentation
The nf-core/chipseq pipeline comes with documentation about the pipeline, found in the docs/ directory:
- Installation
- Pipeline configuration
- Running the pipeline
- Output and how to interpret the results
- Troubleshooting
Credits
These scripts were originally written by Chuan Wang (@chuan-wang) and Phil Ewels (@ewels) for use at the National Genomics Infrastructure at SciLifeLab in Stockholm, Sweden. The pipeline has since been re-implemented by Harshil Patel (@drpatelh) from The Bioinformatics & Biostatistics Group at The Francis Crick Institute, London.
Many thanks to others who have helped out along the way too, including (but not limited to): @apeltzer, @bc2zb, @drejom, @KevinMenden, @pditommaso.
Citation
You can cite the nf-core pre-print as follows:
Ewels PA, Peltzer A, Fillinger S, Alneberg JA, Patel H, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. nf-core: Community curated bioinformatics pipelines. bioRxiv. 2019. p. 610741. doi: 10.1101/610741.