Introduction

nf-core/pacsomatic is a bioinformatics best-practice pipeline for somatic variant analysis using PacBio HiFi sequencing data from matched tumor-normal samples.

The pipeline performs comprehensive somatic analysis including:

  • Variant calling: SNVs, indels, structural variants (SVs), and copy number variants (CNVs)
  • Variant annotation: Functional annotation and mutational signatures
  • Methylation analysis: CpG methylation calling and differential methylation region (DMR) detection
  • Tumor characterization: Clonality, purity, ploidy, and homologous recombination deficiency (HRD) analysis

The pipeline is built using Nextflow, a workflow manager that runs tasks across multiple compute infrastructures in a portable, reproducible manner. It follows the nf-core community’s best practices and manages dependencies with Docker or Singularity containers, or with Conda environments.

nf-core/pacsomatic workflow overview

Pipeline Overview

The pipeline performs the following steps:

1. Alignment and Quality Control

2. Variant Calling

3. Variant Annotation and Filtering

4. Methylation Analysis

5. Tumor Characterization

  • Homologous recombination deficiency estimation (CHORD)
  • Tumor purity and ploidy analysis (AMBER, COBALT, PURPLE)

Usage

Note

If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

First, prepare a samplesheet with your input data that looks as follows:

Minimal samplesheet format (samplesheet.csv):

patient,sample,status,bam
ID1,S1_tumor,1,/path/to/ID1_S1_tumor.bam
ID1,S1_normal,0,/path/to/ID1_S1_normal.bam
ID2,S2_tumor,1,/path/to/ID2_S2_tumor.bam
ID2,S2_normal,0,/path/to/ID2_S2_normal.bam

Extended samplesheet with PBI index files:

patient,sample,status,bam,pbi
ID1,S1_tumor,1,/path/to/ID1_S1_tumor.bam,/path/to/ID1_S1_tumor.bam.pbi
ID1,S1_normal,0,/path/to/ID1_S1_normal.bam,/path/to/ID1_S1_normal.bam.pbi

Column descriptions:

  • patient: Unique patient identifier (samples with the same ID are treated as matched pairs)
  • sample: Unique sample identifier
  • status: Sample type (1 = tumor, 0 = normal)
  • bam: Full path to unaligned BAM file
  • pbi: (Optional) Full path to PacBio index (.pbi) file
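
Before launching a run, a quick consistency check of the samplesheet can catch pairing mistakes early. The following is a minimal sketch and is not part of the pipeline: the function name check_samplesheet is hypothetical, and it assumes a plain comma-separated file with the header shown above and no quoted fields. It checks that status is 0 or 1 and that every patient has at least one tumor and one normal row.

```shell
# Hypothetical helper (not shipped with nf-core/pacsomatic).
# Assumes a simple CSV: no quoted fields, no embedded commas.
check_samplesheet() {
    awk -F',' '
        { sub(/\r$/, "") }          # tolerate Windows line endings
        NR == 1 {
            # Locate the required columns by name from the header row.
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("patient" in col) || !("sample" in col) ||
                !("status" in col) || !("bam" in col)) {
                print "ERROR: header must contain patient,sample,status,bam"
                bad = 1
                exit
            }
            pc = col["patient"]
            sc = col["status"]
            next
        }
        NF == 0 { next }            # skip blank lines
        {
            p = $pc
            s = $sc
            seen[p] = 1
            if (s == "1")      tumor[p]++
            else if (s == "0") normal[p]++
            else {
                print "ERROR: line " NR ": status must be 0 or 1, got " s
                bad = 1
            }
        }
        END {
            # Every patient needs both sides of the matched pair.
            for (p in seen) {
                if (!(p in tumor))  { print "ERROR: patient " p " has no tumor sample";  bad = 1 }
                if (!(p in normal)) { print "ERROR: patient " p " has no normal sample"; bad = 1 }
            }
            if (!bad) print "OK"
            exit bad + 0
        }
    ' "$1"
}
```

Calling check_samplesheet samplesheet.csv prints OK and returns 0 when the file passes; otherwise it prints one line per problem and returns a non-zero status. It does not check that the BAM or PBI paths exist.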

Now, you can run the pipeline using:

nextflow run nf-core/pacsomatic \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR> \
   --genome GRCh38

Warning

Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
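
As an illustration of the -params-file route, the run shown earlier could be expressed with a YAML parameters file. This is a sketch under stated assumptions: the file name params.yaml is arbitrary, the output directory is a placeholder, and the three keys simply mirror the --input, --outdir and --genome options used above.

```shell
# Write the parameters from the command-line example into a YAML file.
# The file name and the output directory are placeholders.
cat > params.yaml <<'EOF'
input: "samplesheet.csv"
outdir: "results"
genome: "GRCh38"
EOF

# Launch with the parameters file instead of individual --flags
# (left as a comment because it needs a working Nextflow setup):
#   nextflow run nf-core/pacsomatic -profile docker -params-file params.yaml
```

Keeping parameters in a file like this makes a run easier to repeat and to share than a long command line.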

For more details and further functionality, please refer to the usage documentation and the parameter documentation.

Pipeline Output

Results are organized into functionally grouped subdirectories:

results/
├── alignment/                 # Aligned BAMs and QC metrics
│   ├── pbmm2/                 # Aligned BAM files
│   └── qc/                    # Alignment quality control
├── germline_snv/              # Germline variants and phasing
│   ├── clair3/                # Germline SNV/indel calls
│   └── hiphase/               # Phased germline variants
├── somatic_snv/               # Somatic SNV/indel analysis
│   ├── deepsomatic/           # Somatic variant calls
│   ├── vep_annot/             # VEP annotations
│   └── hiphase_somatic/       # Phased somatic variants
├── somatic_sv/                # Structural variant analysis
│   ├── severus/               # SV calls
│   ├── svpack/                # Filtered SVs
│   └── annotsv_annot/         # SV annotations
├── somatic_cnv/               # Copy number variants
│   └── cnvkit/                # CNVkit results
├── methylation/               # Methylation analysis
│   ├── pb_cpg_tools/          # CpG methylation scores
│   ├── dss_dmr/               # Differential methylation regions
│   └── dmr_annot/             # DMR annotations
├── tumor_clonality/           # Tumor purity and ploidy
│   ├── amber/                 # BAF analysis
│   ├── cobalt/                # Read depth ratios
│   └── purple/                # Purity/ploidy estimation
├── signature_analysis/        # Mutational signatures and HRD
│   ├── mutationalpattern/     # Mutation signatures
│   └── chord/                 # HRD estimation
├── pipeline_info/             # Pipeline execution reports
└── multiqc/                   # Aggregated QC report

For detailed descriptions of output files, see the output documentation.

To view example results from a full-size test dataset, visit the results page on the nf-core website.

Credits

nf-core/pacsomatic was originally written by Wenchao Zhang (@wzhang42) and Haidong Yi (@haidyi).

We thank the following people for their extensive assistance in the development of this pipeline:

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don’t hesitate to get in touch on the Slack #pacsomatic channel (you can join with this invite).

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.