Introduction
This document describes the output produced by the nf-core/pacsomatic pipeline. Results will be stored in the directory specified during pipeline execution with the parameter --outdir <OUTDIR>.
After the pipeline execution finishes, the output directory will contain several organized subdirectories, described below. All paths mentioned here are relative to the top-level output directory.
Pipeline overview
The pipeline is built using Nextflow and processes data using the following steps:
- PBMM2 - Align samples to the reference genome
- Alignment QC - Calculate QC metrics for each alignment
  - BAM_COVERAGE - Generate coverage tracks for each alignment
  - MOSDEPTH - Calculate BAM depth for each alignment
  - BAM_SORT_STATS_SAMTOOLS - Run SAMtools sort/index/stats/flagstat on each alignment
- Methylation Detection and Annotation - Calculate CpG methylation for each alignment, then detect and annotate differential methylation regions (DMRs) for each tumor-normal pair
  - Clair3 - Germline SNV/indel calling for all samples
  - HiPhase - Phase VCF and BAM files for normal samples
  - HiPhase Somatic - Phase VCF and BAM files for tumor samples
  - PBCPGTOOLS_ALIGNEDBAMTOCPGSCORES - Generate site methylation probabilities from mapped and phased BAM files
  - DSS_DMR - Use DSS (Dispersion Shrinkage for Sequencing) to detect DMRs
  - DMR_ANNOT - Annotate the detected DMRs
- Somatic CNV Calling - Call somatic copy number variants (CNVs)
  - CNVkit - Use CNVkit to infer and visualize somatic copy number variants
- Somatic SNV INDEL Calling - Call somatic SNVs and indels
  - DeepSomatic - Use DeepSomatic to call somatic SNVs and indels
  - VEP - Use VEP to annotate somatic SNVs
  - Mutational Pattern - Use MutationalPatterns for mutational signature analysis
- Somatic SV Calling - Call somatic structural variants (SVs)
- Homologous Recombination Deficiency Estimation - Use the called SNVs and SVs to estimate homologous recombination deficiency (HRD)
  - CHORD - Use CHORD for HRD estimation
- Tumor purity and ploidy estimation - Estimate tumor purity and ploidy
  - AMBER - Use hmftools-AMBER to analyze the tumor-normal BAM pair and generate the tumor B-allele frequency (BAF)
  - COBALT - Use hmftools-COBALT to analyze the tumor-normal BAM pair and determine the read depth ratio of the tumor against the reference
  - PURPLE - Use hmftools-PURPLE to combine the BAF from AMBER and the read depth ratio from COBALT to estimate tumor purity and ploidy
- MultiQC - Aggregate report describing results and QC from the whole pipeline
- Pipeline information - Report metrics generated during the workflow execution
Output directory Structure
<OUTDIR>/
├── alignment/                # Aligned BAMs and QC metrics
│   ├── pbmm2/                # Aligned and sorted BAM files
│   └── qc/                   # Alignment quality control
│       ├── samtools/         # SAMtools statistics
│       ├── bamcoverage/      # Coverage tracks (bigWig files)
│       └── mosdepth/         # Depth coverage analysis
├── genome/                   # Reference genome files
├── germline_snv/             # Germline variant calling and phasing
│   ├── clair3/               # Germline SNV/indel calls
│   └── hiphase/              # Phased germline variants and BAMs
├── somatic_snv/              # Somatic SNV/indel analysis
│   ├── deepsomatic/          # Somatic variant calls
│   ├── vep_annot/            # VEP annotations
│   └── hiphase_somatic/      # Phased somatic variants
├── somatic_sv/               # Somatic structural variant analysis
│   ├── severus/              # SV calls from Severus
│   ├── svpack/               # Filtered and annotated SVs
│   └── annotsv_annot/        # AnnotSV annotations
├── somatic_cnv/              # Somatic copy number variants
│   └── cnvkit/               # CNVkit results
├── methylation/              # Methylation analysis
│   ├── pb_cpg_tools/         # CpG methylation scores
│   ├── dss_dmr/              # Differential methylation regions
│   └── dmr_annot/            # DMR annotations
├── tumor_clonality/          # Tumor purity and ploidy
│   ├── amber/                # BAF analysis
│   ├── cobalt/               # Read depth ratios
│   └── purple/               # Purity and ploidy estimation
├── signature_analysis/       # Mutational signatures and HRD
│   ├── mutationalpattern/    # Mutation signatures
│   └── chord/                # HRD estimation
├── multiqc/                  # Aggregated QC report
└── pipeline_info/            # Pipeline execution reports
PBMM2
Output files
alignment/pbmm2/<sample_id>/
- <sample_id>.aligned.bam: Aligned and sorted BAM file
- <sample_id>.aligned.bam.bai: BAM index file
PBMM2 aligns PacBio HiFi reads to the reference genome using minimap2.
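A downstream script may want to confirm that both alignment files exist before it continues. The following is a minimal sketch of such a check, built only from the directory layout described above. The function names `pbmm2_outputs` and `missing_pbmm2_outputs` are hypothetical helpers, not part of the pipeline.

```python
from pathlib import Path


def pbmm2_outputs(outdir, sample_id):
    """Return the BAM and index paths documented for one sample."""
    sample_dir = Path(outdir) / "alignment" / "pbmm2" / sample_id
    return [
        sample_dir / f"{sample_id}.aligned.bam",
        sample_dir / f"{sample_id}.aligned.bam.bai",
    ]


def missing_pbmm2_outputs(outdir, sample_id):
    """Return the documented paths that are not present on disk."""
    return [p for p in pbmm2_outputs(outdir, sample_id) if not p.is_file()]
```

An empty list from `missing_pbmm2_outputs("<OUTDIR>", "<sample_id>")` would indicate that both files are in place.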
Alignment QC
BAM_COVERAGE
Output files
alignment/qc/bamcoverage/<sample_id>/
- <sample_id>.bigWig: Coverage track in bigWig format
bamCoverage generates coverage tracks for visualization in genome browsers.
MOSDEPTH
Output files
alignment/qc/mosdepth/<sample_id>/
- <sample_id>.mosdepth.global.dist.txt: Global cumulative distribution
- <sample_id>.mosdepth.summary.txt: Depth summary statistics
- <sample_id>.per-base.bed.gz: Per-base coverage
- <sample_id>.per-base.bed.gz.csi: Index file
mosdepth calculates depth coverage statistics for each alignment.
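The summary file can be read with a few lines of Python. This sketch assumes the usual mosdepth summary layout: a tab-separated table with a header containing `chrom` and `mean` columns, and a final `total` row. The function name and the example values are made up for illustration; check the header of your own file before relying on it.

```python
import csv
import io


def mean_depth_by_chrom(summary_text):
    """Map each row label in a mosdepth summary to its mean depth."""
    reader = csv.DictReader(io.StringIO(summary_text), delimiter="\t")
    return {row["chrom"]: float(row["mean"]) for row in reader}


# Made-up example content in the assumed mosdepth summary layout.
example_summary = (
    "chrom\tlength\tbases\tmean\tmin\tmax\n"
    "chr1\t1000\t30000\t30.00\t0\t55\n"
    "chr2\t500\t10000\t20.00\t0\t41\n"
    "total\t1500\t40000\t26.67\t0\t55\n"
)
depths = mean_depth_by_chrom(example_summary)
```

For a real run, the text would come from `<sample_id>.mosdepth.summary.txt`, and `depths["total"]` would give the genome-wide mean depth.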
BAM_SORT_STATS_SAMTOOLS
Output files
alignment/qc/samtools/<sample_id>/
- <sample_id>.flagstat: Alignment flag statistics
- <sample_id>.idxstats: Index statistics
- <sample_id>.stats: Detailed alignment statistics
SAMtools generates comprehensive alignment quality metrics.
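As a rough illustration of how the flagstat file might be consumed, the sketch below pulls the QC-passed counts out of the text and derives a mapped fraction. It assumes the default text layout of `samtools flagstat`, where each line reads `<passed> + <failed> <label>`, optionally followed by a parenthesised note. The function name and the example numbers are made up.

```python
import re

# "<QC-passed> + <QC-failed> <label>" with an optional trailing "(...)" note.
FLAGSTAT_LINE = re.compile(r"^(\d+) \+ (\d+) (.+?)(?: \(.*)?$")


def flagstat_counts(text):
    """Return {label: QC-passed count}; the first line seen for a label wins."""
    counts = {}
    for line in text.splitlines():
        match = FLAGSTAT_LINE.match(line.strip())
        if match:
            counts.setdefault(match.group(3), int(match.group(1)))
    return counts


# Made-up example in the assumed flagstat layout.
example_flagstat = (
    "1000 + 0 in total (QC-passed reads + QC-failed reads)\n"
    "0 + 0 secondary\n"
    "50 + 0 supplementary\n"
    "950 + 0 mapped (95.00% : N/A)\n"
)
counts = flagstat_counts(example_flagstat)
mapped_fraction = counts["mapped"] / counts["in total"]
```

Keeping the first occurrence of each label avoids overwriting a count when two lines differ only in their parenthesised note.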
Methylation Detection and Annotation
Clair3
Output files
germline_snv/clair3/<sample_id>/
- <sample_id>_pileup.vcf.gz: Germline variant calls
- <sample_id>_pileup.vcf.gz.tbi: Tabix index
Clair3 performs germline SNV/indel calling for each sample.
HiPhase
Output files
germline_snv/hiphase/<sample_id>/
- <sample_id>.phased.bam: Phased BAM file
- <sample_id>.phased.bam.bai: BAM index
- <sample_id>.phased.vcf.gz: Phased variants
- <sample_id>.stats.csv: Phasing statistics
HiPhase phases germline variants and reads for normal samples.
HiPhase Somatic
Output files
somatic_snv/hiphase_somatic/<sample_id>/
- <sample_id>.phased.bam: Phased tumor BAM
- <sample_id>.phased.bam.bai: BAM index
- <sample_id>.germline_phased.vcf.gz: Phased germline variants
- <sample_id>.somatic_phased.vcf.gz: Phased somatic variants
- <sample_id>.stats.csv: Phasing statistics
HiPhase phases both germline and somatic variants in tumor samples.
PBCPGTOOLS_ALIGNEDBAMTOCPGSCORES
This analysis is performed separately for tumor and normal samples.
Output files
methylation/pb_cpg_tools/normal/<sample_id>/
methylation/pb_cpg_tools/tumor/<sample_id>/
- <sample_id>.combined.bed.gz: Combined CpG scores (all reads)
- <sample_id>.combined.bed.gz.tbi: Index file
- <sample_id>.combined.bw: Coverage track (all reads)
- <sample_id>.hap1.bed.gz: CpG scores for haplotype 1
- <sample_id>.hap1.bed.gz.tbi: Index file
- <sample_id>.hap1.bw: Coverage track for haplotype 1
- <sample_id>.hap2.bed.gz: CpG scores for haplotype 2
- <sample_id>.hap2.bed.gz.tbi: Index file
- <sample_id>.hap2.bw: Coverage track for haplotype 2
pb-CpG-tools generates CpG methylation scores from phased BAM files.
DSS_DMR
Output files
methylation/dss_dmr/<patient_id>/
- <tumor_vs_normal>.dmr.tsv: Differential methylation regions
DSS detects differential methylation regions (DMRs) between tumor and normal samples.
DMR_ANNOT
Output files
methylation/dmr_annot/<patient_id>/
- <patient_id>_dmr_annotation_summary.tsv.gz: DMR annotation summary
- <patient_id>_hg38_genes_promoters_dmrs.tsv.gz: DMRs in gene promoters
- <patient_id>_hg38_genes_1to5kb_dmrs.tsv.gz: DMRs 1-5kb from genes
- <patient_id>_hg38_genes_5UTRs_dmrs.tsv.gz: DMRs in 5’UTRs
- <patient_id>_hg38_genes_exons_dmrs.tsv.gz: DMRs in exons
- <patient_id>_hg38_genes_introns_dmrs.tsv.gz: DMRs in introns
- <patient_id>_hg38_genes_3UTRs_dmrs.tsv.gz: DMRs in 3’UTRs
annotatr annotates DMRs with genomic features.
Somatic CNV Calling
CNVkit
Output files
somatic_cnv/cnvkit/batch/<patient_id>/
- <tumor_id>.cnr: Copy number ratios
- <tumor_id>.cns: Copy number segments
- <tumor_id>-diagram.pdf: Chromosome diagram
- <tumor_id>-scatter.png: Scatter plot
somatic_cnv/cnvkit/call/<patient_id>/
- <tumor_id>.call.cns: Called copy number segments
CNVkit infers and visualizes somatic copy number variants from tumor-normal pairs.
Somatic SNV INDEL Calling
DeepSomatic
Output files
somatic_snv/deepsomatic/<patient_id>/
- <tumor_vs_normal>.g.vcf.gz: GVCF with all sites
- <tumor_vs_normal>.g.vcf.gz.tbi: Tabix index
- <tumor_vs_normal>.vcf.gz: Somatic variants
- <tumor_vs_normal>.vcf.gz.tbi: Tabix index
DeepSomatic calls somatic SNVs and indels using deep learning.
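A common first look at the somatic VCF is to count the records that passed filtering. The sketch below does this with the Python standard library, relying only on the VCF convention that the seventh tab-separated column is FILTER. The function names are hypothetical, and the example records, including the non-PASS filter label, are made-up placeholders rather than real DeepSomatic output.

```python
import gzip


def count_pass_records(lines):
    """Count VCF data lines whose FILTER column (7th field) is PASS."""
    n_pass = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 6 and fields[6] == "PASS":
            n_pass += 1
    return n_pass


def count_pass_in_vcf_gz(path):
    """Same count, reading a gzip- or bgzip-compressed VCF from disk."""
    with gzip.open(path, "rt") as handle:
        return count_pass_records(handle)


# Made-up records for illustration only.
example_vcf_lines = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tT\t40\tPASS\t.\n",
    "chr1\t200\t.\tG\tC\t5\tPLACEHOLDER_FILTER\t.\n",
    "chr2\t300\t.\tC\tCT\t35\tPASS\t.\n",
]
n_pass = count_pass_records(example_vcf_lines)
```

On a real run, `count_pass_in_vcf_gz` would be pointed at `<tumor_vs_normal>.vcf.gz` in the directory listed above.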
VEP
Output files
somatic_snv/vep_annot/
- <tumor_vs_normal>.vep.anno.vcf.gz: VEP annotated variants
- <tumor_vs_normal>.vep.anno.vcf.gz.tbi: Tabix index
- <tumor_vs_normal>.vep.anno.vcf.gz_summary.html: Annotation summary
VEP provides functional annotation for somatic variants.
Mutational Pattern
Output files
signature_analysis/mutationalpattern/<patient_id>/
- <tumor_vs_normal>.mutation_profile.pdf: Mutation profile plots
- <tumor_vs_normal>.mut_sigs_bootstrapped.tsv: Bootstrapped signatures
- <tumor_vs_normal>.mut_sigs.tsv: Mutational signatures
- <tumor_vs_normal>.reconstructed_sigs.tsv: Reconstructed signatures
- <tumor_vs_normal>.type_occurences.tsv: Mutation type occurrences
MutationalPatterns identifies mutational signatures from somatic variants.
Somatic SV Calling
Severus
Output files
somatic_sv/severus/<patient_id>/orig/
- <patient_id>_severus_somatic_SVs/severus_somatic.vcf: Somatic SVs
- <patient_id>_severus_all_SVs/severus_all.vcf: All SVs (somatic + germline)
- severus.log: Run log
- read_qual.txt: Read quality metrics
- breakpoints_double.csv: Breakpoint details
somatic_sv/severus/<patient_id>/filtered/
- <patient_id>.severus_somatic.vcf.gz: Filtered somatic SVs (if svpack enabled)
- <patient_id>.severus_somatic.vcf.gz.tbi: Tabix index
Severus identifies somatic structural variants from tumor-normal pairs.
SV_Pack
Output files
somatic_sv/svpack/<patient_id>/
- SVPACK_FILTER.out.vcf: Filtered SVs
- SVPACK_MATCH.out.vcf: Matched against control panel
- SVPACK_CONSEQUENCE.out.vcf: Functional consequences
- SVPACK_TAGZYGOSITY.out.vcf: Zygosity annotations
svpack filters and annotates structural variants.
AnnotSV
Output files
somatic_sv/annotsv_annot/<patient_id>/
- <patient_id>.tsv: Comprehensive SV annotations
AnnotSV provides detailed annotations for structural variants.
Homologous Recombination Deficiency Estimation
CHORD
Output files
signature_analysis/chord/<patient_id>/
- <tumor_vs_normal>.chord.mutation_contexts.tsv: Mutation contexts
- <tumor_vs_normal>.chord.prediction.tsv: HRD predictions
CHORD estimates homologous recombination deficiency from mutational signatures.
Tumor Purity and Ploidy Estimation
AMBER
Output files
tumor_clonality/<patient_id>/amber/
- amber.version: Tool version
- <tumor_id>.amber.homozygousregion.tsv: Homozygous regions
- <tumor_id>.amber.baf.pcf: BAF piecewise constant fit
- <tumor_id>.amber.baf.tsv.gz: B-allele frequencies
- <tumor_id>.amber.contamination.tsv: Contamination estimates
- <tumor_id>.amber.contamination.vcf.gz: Contamination VCF
- <tumor_id>.amber.contamination.vcf.gz.tbi: Index file
- <tumor_id>.amber.qc: Quality control metrics
AMBER calculates B-allele frequencies for tumor purity estimation.
COBALT
Output files
tumor_clonality/<patient_id>/cobalt/
- cobalt.version: Tool version
- <tumor_id>.cobalt.gc.median.tsv: GC-corrected median ratios
- <tumor_id>.cobalt.ratio.median.tsv: Read depth ratios
- <tumor_id>.cobalt.ratio.pcf: Piecewise constant fit
- <tumor_id>.cobalt.chr.len.tsv: Chromosome lengths
COBALT calculates read depth ratios between tumor and normal.
PURPLE
Output files
tumor_clonality/<patient_id>/purple/
- purple.version: Tool version
- <tumor_id>.purple.cnv.gene.tsv: Gene-level CNV calls
- <tumor_id>.purple.cnv.somatic.tsv: Somatic CNV segments
- <tumor_id>.purple.driver.catalog.germline.tsv: Germline drivers
- <tumor_id>.purple.driver.catalog.somatic.tsv: Somatic drivers
- <tumor_id>.purple.germline.deletion.tsv: Germline deletions
- <tumor_id>.purple.purity.range.tsv: Purity solution range
- <tumor_id>.purple.purity.tsv: Final purity and ploidy estimates
- <tumor_id>.purple.qc: Quality control metrics
- <tumor_id>.purple.segment.tsv: Copy number segments
- <tumor_id>.purple.somatic.clonality.tsv: Variant clonality
- plot/<tumor_id>.purity.range.png: Purity range plot
- plot/<tumor_id>.segment.png: Copy number plot
- plot/<tumor_id>.somatic_data.tsv: Plot data
PURPLE estimates tumor purity and ploidy by combining AMBER and COBALT results.
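The final estimates can be pulled out of `<tumor_id>.purple.purity.tsv` with a short script. The sketch below assumes that the file is tab-separated, has a header row that includes `purity` and `ploidy` columns, and holds the estimates in its first data row. That layout is an assumption and should be checked against your own file. The function name and the example values are made up, and the real file contains more columns than the reduced example shown here.

```python
import csv
import io


def read_purity_ploidy(tsv_text):
    """Return (purity, ploidy) from the first data row of the table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    row = next(reader)
    return float(row["purity"]), float(row["ploidy"])


# Made-up, reduced example; only the two assumed columns are shown.
example_purity_tsv = "purity\tploidy\n0.62\t2.10\n"
purity, ploidy = read_purity_ploidy(example_purity_tsv)
```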
MultiQC
Output files
multiqc/
- multiqc_report.html: A standalone HTML file that can be viewed in your web browser.
- multiqc_data/: Directory containing parsed statistics from the different tools used in the pipeline.
- multiqc_plots/: Directory containing static images from the report in various formats.
MultiQC is a visualization tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory.
Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQC. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see http://multiqc.info.
Pipeline information
Output files
pipeline_info/
- Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
- Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.yml. The pipeline_report* files will only be present if the --email / --email_on_fail parameters are used when running the pipeline.
- Reformatted samplesheet files used as input to the pipeline: samplesheet.valid.csv.
- Parameters used by the pipeline run: params.json.
Nextflow provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage.
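When troubleshooting, a quick tally of task outcomes from the trace file can show where a run went wrong. The sketch below assumes the default Nextflow trace layout: a tab-separated table with a header row that includes a `status` column. The function name is hypothetical, and the example rows are made up and reduced to three columns.

```python
import csv
import io
from collections import Counter


def count_task_status(trace_text):
    """Tally trace rows by the value of their status column."""
    reader = csv.DictReader(io.StringIO(trace_text), delimiter="\t")
    return Counter(row["status"] for row in reader)


# Made-up, reduced example; a real trace has many more columns.
example_trace = (
    "task_id\tname\tstatus\n"
    "1\tSTEP_A (sample1)\tCOMPLETED\n"
    "2\tSTEP_B (sample1)\tCOMPLETED\n"
    "3\tSTEP_C (sample1)\tFAILED\n"
)
status_counts = count_task_status(example_trace)
```

Any non-zero count for a failure status would point to tasks worth inspecting in execution_report.html.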