Introduction

This documentation describes the output of nf-core/circrna for the test dataset, which runs all 3 modules in the workflow: circRNA discovery, miRNA prediction and differential expression analysis of circular RNAs in RNA-Seq data.

A full run of the workflow will produce the following directory output structure:

|-- results/
        |-- circrna_discovery
        |-- differential_expression
        |-- mirna_prediction
        |-- multiqc
        |-- pipeline_info
        |-- quality_control
        |-- references

Pipeline Overview

The pipeline is built using Nextflow and processes data using the following steps:

Quality Control

FastQC

Output files
  • fastqc/
    • *_fastqc.html: FastQC report containing quality metrics.
    • *_fastqc.zip: Zip archive containing the FastQC report, tab-delimited data file and plot images.

NB: The FastQC plots in this directory are generated relative to the raw, input reads. They may contain adapter sequence and regions of low quality. To see how your reads look after adapter and quality trimming please refer to the FastQC reports in the trimgalore/fastqc/ directory.

FastQC gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the FastQC help pages.

MultiQC - FastQC sequence counts plot

MultiQC - FastQC mean quality scores plot

MultiQC - FastQC adapter content plot

TrimGalore

Output files
  • trimgalore/
    • *.fq.gz: If --save_trimmed is specified, FastQ files after adapter trimming will be placed in this directory.
    • *_trimming_report.txt: Log file generated by Trim Galore!.
  • trimgalore/fastqc/
    • *_fastqc.html: FastQC report containing quality metrics for read 1 (and read2 if paired-end) after adapter trimming.
    • *_fastqc.zip: Zip archive containing the FastQC report, tab-delimited data file and plot images.

Trim Galore! is a wrapper tool around Cutadapt and FastQC to perform quality and adapter trimming on FastQ files. By default, Trim Galore! will automatically detect and trim the appropriate adapter sequence.

NB: Trim Galore! will only run using multiple cores if you are able to use more than 5 and 6 CPUs for single- and paired-end data, respectively. The total cores available to Trim Galore! will also be capped at 4 (7 and 8 CPUs in total for single- and paired-end data, respectively) because there is no run-time benefit beyond this point. See release notes and discussion whilst adding this logic to the nf-core/atacseq pipeline.

MultiQC - cutadapt trimmed sequence length plot

DESeq2

Output files
  • quality_control/DESeq2_QC
    • circRNA/
      • DESeq2_condition_PCA.pdf: PCA plot of PC1 vs. PC2 displaying the highest amount of variation within the response variable condition.
      • DESeq2_dispersion.pdf: Plot of re-fitted genes + gene outliers after shrinkage estimation performed by gene-wide maximum likelihood estimates (red curve) & maximum a posteriori estimates of dispersion.
      • DESeq2_sample_dendogram.pdf: Dendrogram displaying sample distances using [pvclust](https://cran.r-project.org/web/packages/pvclust/index.html).
      • DESeq2_sample_heatmap.pdf: Heatmap displaying Manhattan distance between samples.

    • RNA-Seq/
      • DESeq2_condition_PCA.pdf: PCA plot of PC1 vs. PC2 displaying the highest amount of variation within the response variable condition.
      • DESeq2_dispersion.pdf: Plot of re-fitted genes + gene outliers after shrinkage estimation performed by gene-wide maximum likelihood estimates (red curve) & maximum a posteriori estimates of dispersion.
      • DESeq2_sample_dendogram.pdf: Dendrogram displaying sample distances using [pvclust](https://cran.r-project.org/web/packages/pvclust/index.html).
      • DESeq2_sample_heatmap.pdf: Heatmap displaying Manhattan distance between samples.

nf-core/circrna outputs quality control plots of normalised log2 expression data from DESeq2 to assess heterogeneity in the experiment samples. These plots can be useful to assess sample-sample similarity and to identify potential batch effects within the experiment. Plots are generated for both circRNAs and RNA-Seq data when the differential expression analysis module has been selected by the user (see --module documentation).

Note

The FastQC plots displayed in the MultiQC report show untrimmed reads. They may contain adapter sequence and potentially regions with low quality.

MultiQC

Output files
  • quality_control/MultiQC/
    • Raw_Reads_MultiQC.html: Summary reports of unprocessed RNA-Seq reads.
    • Trimmed_Reads_MultiQC.html: Summary reports of processed RNA-Seq reads.

MultiQC is a visualization tool that generates a single HTML report summarising all samples in your project. nf-core outputs HTML reports for sequencing read quality control.

Genome Index Files

Output files
  • reference_genome
    • BowtieIndex/: Directory containing Bowtie indices.
    • Bowtie2Index/: Directory containing Bowtie2 indices.
    • BWAIndex/: Directory containing BWA indices.
    • Hisat2Index/: Directory containing HISAT2 indices.
    • SAMtoolsIndex: Directory containing SAMtools index file.
    • STARIndex: Directory containing STAR indices.
    • SegemehlIndex: Directory containing Segemehl index file.
  • pipeline_info/
    • Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
    • Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.yml. The pipeline_report* files will only be present if the --email / --email_on_fail parameters are used when running the pipeline.
    • Reformatted samplesheet files used as input to the pipeline: samplesheet.valid.csv.
    • Parameters used by the pipeline run: params.json.

nf-core/circrna will save genome indices when --save_reference true is set. This is highly encouraged to reduce runtimes on redeployment of the workflow, as you can supply the saved indices to the aligner in question via the corresponding aligner flag (for example --star '/path/to/STAR/'; available: bowtie, bowtie2, bwa, hisat2, star, segemehl). Make sure to move the saved genome indices to a different location before doing so.

circRNA Quantification

Common Outputs

The workflow is designed to output three files per sample, per quantification tool in the circrna_discovery directory. Using the test dataset as an example, the directory structure for the fust_1 sample is:

|-- results/
      |-- circrna_discovery/
            |-- circexplorer2/
                  |-- fust_1/
                          |-- fust_1.bed
                          |-- fust_1.fasta
                          |-- fust_1.log
 
 
- `${sample_id}.bed`: A customised BED 12 file containing filtered, annotated circRNAs. Columns: chromosome, start, end, name, read counts, strand, start, end, RGB, number of exon blocks, size of exon blocks, start positions (within sequence) of exon blocks, parent gene ID(s), parent transcript ID(s), mature spliced length.
- `${sample_id}.log`: Log file detailing the decisions made by `annotate_outputs.sh` when annotating each circRNA.
- `${sample_id}.fasta`: Mature spliced sequence of the circRNA in FASTA format. Includes the splice junction site (±20 bp).
 
Sample `.log`, `.bed` and `.fasta` entries are given below for a circRNA called by `CIRCexplorer2`:
 
```console
[nf-core/circrna]: Starting analysis for: chrI:1140805-1147588:-
[nf-core/circrna]: chrI:1140805-1147588:- overlaps features in GTF file
[nf-core/circrna]: Inspecting Genes...
[nf-core/circrna]: Overlapping Gene IDs: Y48G8AL.10
[nf-core/circrna]: Converting to BED12
[nf-core/circrna]: Attempting to fit circRNA to gene exon boundaries
[nf-core/circrna]: chrI:1140805-1147588:- fits gene exons, is a circRNA
[nf-core/circrna]: cleaning up intermediate files
chrI	1140805	1147588	chrI:1140805-1147588:-	2	-	1140805	1147588	0	5	229,214,191,141,499	0,1282,2546,4912,6284	circRNA	Y48G8AL.10	NM_001306296,NM_001306297,NM_001306298,NM_001306299	1274
>chrI:1087252-1088602:-
CATGAAGTCTCGAGATCTCGTTTATAAGCACCAATATCCACGTTCAGCATTATTGATTGataaaattaatttataaattcgaaaataaaatttaaatttttCTTTAGAAATTATCGATTTATCGACTTCCACGTAATTCCACACCACGCTAAAATTCCATATCAATCTCGCGTTGTTTGGCTTCTCGTTGGGTGTCCGCCGCGTGGGAGTAGTATCTGCAAAAAAAAATTTGAGAATAAAAAATGTAAAATTGtttttcctattttctattgccgaaatttgagatttccggcaaatcggcaaattgccggaattgaaatttgcggcaaatcggcaaactgccgcaattgaaatttcgggtaaatcggcaaatttccggcaaatcggcatattgccggaatttaaatttccggcaaggcggccaatcggaaaattggcaaattgccgcaattgaaatttgcggcaaatcggcaattgtcgactattttcgacaacttctcgctttgcacttttttgtacatttcagattttttttcaatttcaatcggcaaaaacatttccggcaaatcggtaaattgccagaattgaaatttccggcaaatcggcaaattgccggaattgaaatttcccgcaaatcggcaaatttctttaattgaaatttccggcaaatcggtaaattgccggaatttaaatttccggcaactcggcaaactgccccaattgaaatttccggtaaatcggtaaaatgccgaaatttaaatttccggcaaggtggcaaatcggaaaattggcaaattgccggaattcaaatatccggcaaatcggcaagttgctggaattgaaatttccggcaaggcggcaaatttccggcaaatcggcaattGTCTTATattttcgacaacttctcgttttgcacttttttttgtacatttcaggttttttttcaatttcaatcggcaaaaacatttccggcaaatctgatatccggcaaacggcaaatcggcaatttgccgaaaataaaaaattcaagcaactcggcaaaccggcaaattTTATAGAGCACATTTGACCCACCTATTGAGAATAAACAATTGCGAGATAAAAATCTTGATGTAAATTCCGGCGAATGCGATCAAAATTGCTTTTCGATCTGAAAAAAATCCAATTTTGCTCAGCCAATAAATGGACGGAGCTAAAAACAAGGCGCTACTCACGAGAAATCCACTCATACGGGTCTTCTGTCACATTTTCCTGCTCGGATTTCGATTTTGGCGTATCTTCGGTCGGATTTCCGTGGTAATCGGACAACCAGGCAATCACTACAATTATTGCGCAAATGAATCGGGCAAC
```

The workflow’s manual annotation process is designed to mimic the annotation performed by CIRCexplorer2, standardising the annotation process across all circRNA quantification tools.
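
As a minimal sketch of how the customised BED file might be read downstream, the Python snippet below parses the example record shown above and checks that its exon block sizes sum to the reported mature spliced length. Note that the example record carries 16 tab-separated fields, one more than the 15 columns listed; the label `type` used here for the extra field (`circRNA`) is an assumption, as are the function and key names.

```python
# Field labels follow the column description above. The label "type" for
# field 13 is an assumption: the example record has 16 fields, not 15.
BED_FIELDS = [
    "chrom", "start", "end", "name", "read_count", "strand",
    "thick_start", "thick_end", "rgb", "block_count",
    "block_sizes", "block_starts", "type",
    "gene_ids", "transcript_ids", "mature_length",
]

INT_FIELDS = ("start", "end", "read_count", "thick_start", "thick_end",
              "block_count", "mature_length")


def parse_circrna_bed_line(line):
    """Parse one tab-delimited record into a dict with typed values."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(BED_FIELDS):
        raise ValueError(f"expected {len(BED_FIELDS)} fields, got {len(values)}")
    record = dict(zip(BED_FIELDS, values))
    for key in INT_FIELDS:
        record[key] = int(record[key])
    for key in ("block_sizes", "block_starts"):
        record[key] = [int(x) for x in record[key].split(",") if x]
    record["transcript_ids"] = record["transcript_ids"].split(",")
    return record


# The example record from the CIRCexplorer2 output shown above.
example = "\t".join([
    "chrI", "1140805", "1147588", "chrI:1140805-1147588:-", "2", "-",
    "1140805", "1147588", "0", "5", "229,214,191,141,499",
    "0,1282,2546,4912,6284", "circRNA", "Y48G8AL.10",
    "NM_001306296,NM_001306297,NM_001306298,NM_001306299", "1274",
])

record = parse_circrna_bed_line(example)
# The exon block sizes add up to the mature spliced length (1274).
assert sum(record["block_sizes"]) == record["mature_length"]
print(record["name"], record["read_count"], record["mature_length"])
```

The same check can serve as a quick sanity test when post-processing these files with custom scripts.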

Intermediate files generated by each quantification tool are described in depth below.

CIRCexplorer2

Output files
  • circrna_discovery/circexplorer2/intermediates/${sample_id}/

    • *.bed: Intermediate file generated by CIRCexplorer2 parse module, identifying STAR fusion junctions for downstream annotation.
    • *_circexplorer2_circs.bed: Filtered BED6 file containing circRNA counts used for count matrix generation.
    • *.txt: Output files generated by CIRCexplorer2 annotate module, based on BED 12 format containing circRNA genomic location information, exon cassette composition and an additional 6 columns specifying circRNA annotations. Full descriptions of the 18 columns can be found in the CIRCexplorer2 documentation.
  • circrna_discovery/star

    • 1st_pass
      • *.Aligned.out.bam: Coordinate sorted bam file containing aligned reads and chimeric reads.
      • *.Chimeric.out.junction: Each line contains the details of chimerically aligned reads. Full descriptions of columns can be found in STAR documentation (section 5.4).
      • *.Log.final.out: Summary mapping statistics after mapping job is complete, useful for quality control. The statistics are calculated for each read (single- or paired-end) and then summed or averaged over all reads.
      • *.Log.out: Main log file with a lot of detailed information about the run. This file is most useful for troubleshooting and debugging.
      • *.Log.progress.out: Reports job progress statistics, such as the number of processed reads, % of mapped reads etc.
      • *.SJ.out.tab: High confidence collapsed splice junctions in tab-delimited form. Full description of columns can be found in STAR documentation (section 4.4).
    • 2nd_pass
      • *.Aligned.out.bam: Coordinate sorted bam file containing aligned reads and chimeric reads.
      • *.Chimeric.out.junction: Each line contains the details of chimerically aligned reads. Full descriptions of columns can be found in STAR documentation (section 5.4).
      • *.Log.final.out: Summary mapping statistics after mapping job is complete, useful for quality control. The statistics are calculated for each read (single- or paired-end) and then summed or averaged over all reads.
      • *.Log.out: Main log file with a lot of detailed information about the run. This file is most useful for troubleshooting and debugging.
      • *.Log.progress.out: Reports job progress statistics, such as the number of processed reads, % of mapped reads etc.
      • *.SJ.out.tab: High confidence collapsed splice junctions in tab-delimited form. Full description of columns can be found in STAR documentation (section 4.4).
    • sjdb
      • dataset.SJ.out.tab: Chromosome, start, end & strand coordinates of novel splice junctions for all samples aligned using STAR 1st pass.

CIRCexplorer2 uses *.Chimeric.out.junction files generated from STAR 2 pass mode to extract back-splice junction sites using the CIRCexplorer2 parse module. Following this, CIRCexplorer2 annotate performs re-alignment of reads to the back-splice junction sites to determine the precise positions of downstream donor and upstream acceptor splice sites. Back-splice junction sites are subsequently updated and annotated using the customised annotation text file.

circRNA finder

Output files
  • circrna_discovery/circrna_finder/intermediates/${sample_id}/

    • *.filteredJunctions.bed: A bed file with all circular junctions found by the pipeline. The score column indicates the number of reads spanning each junction.
    • *.s_filteredJunctions.bed: A bed file with those junctions in *.filteredJunctions.bed that are flanked by GT-AG splice sites. The score column indicates the number of reads spanning each junction.
    • *.s_filteredJunctions_fw.bed: A bed file with the same circular junctions as in *.s_filteredJunctions.bed, but here the score column gives the average number of forward spliced reads at both splice sites around each circular junction.
  • circrna_discovery/star

    • 1st_pass
      • *.Aligned.out.bam: Coordinate sorted bam file containing aligned reads and chimeric reads.
      • *.Chimeric.out.junction: Each line contains the details of chimerically aligned reads. Full descriptions of columns can be found in STAR documentation (section 5.4).
      • *.Log.final.out: Summary mapping statistics after mapping job is complete, useful for quality control. The statistics are calculated for each read (single- or paired-end) and then summed or averaged over all reads.
      • *.Log.out: Main log file with a lot of detailed information about the run. This file is most useful for troubleshooting and debugging.
      • *.Log.progress.out: Reports job progress statistics, such as the number of processed reads, % of mapped reads etc.
      • *.SJ.out.tab: High confidence collapsed splice junctions in tab-delimited form. Full description of columns can be found in STAR documentation (section 4.4).
    • 2nd_pass
      • *.Aligned.out.bam: Coordinate sorted bam file containing aligned reads and chimeric reads.
      • *.Chimeric.out.junction: Each line contains the details of chimerically aligned reads. Full descriptions of columns can be found in STAR documentation (section 5.4).
      • *.Chimeric.out.sam: Chimeric alignments in SAM format.
      • *.Log.final.out: Summary mapping statistics after mapping job is complete, useful for quality control. The statistics are calculated for each read (single- or paired-end) and then summed or averaged over all reads.
      • *.Log.out: Main log file with a lot of detailed information about the run. This file is most useful for troubleshooting and debugging.
      • *.Log.progress.out: Reports job progress statistics, such as the number of processed reads, % of mapped reads etc.
      • *.SJ.out.tab: High confidence collapsed splice junctions in tab-delimited form. Full description of columns can be found in STAR documentation (section 4.4).
    • sjdb
      • dataset.SJ.out.tab: Chromosome, start, end & strand coordinates of novel splice junctions for all samples aligned using STAR 1st pass.

circRNA finder uses *.Chimeric.out.sam, *.Chimeric.out.junction & *.SJ.out.tab from STAR 2nd pass files to identify circular RNAs in RNA-Seq data.

CIRIquant

Output files
  • circrna_discovery/ciriquant/intermediates/${sample_id}/
    • *.log: A CIRIerror.log file which should be empty, and a ${sample_id}.log file which contains the output log of CIRIquant.
    • *.bed: CIRI2 output file in BED 6 format.
    • *.gtf: Output file from CIRIquant in GTF format. Full description of the columns available in the CIRIquant documentation.
    • align/
      • *.sorted.{bam, bam.bai}: (Sorted and indexed) bam file from HISAT2 alignment of RNA-Seq reads.
    • circ/
      • *.ciri: CIRI2 output file.
      • *_denovo.sorted.{bam, bam.bai}: (Sorted and indexed) bam file from BWA alignment of candidate circular reads to the pseudo reference.
      • *_index.*.ht2: HISAT2 index files (.ht2) of the pseudo reference.
      • *_index.fa: Reference FASTA file of candidate circular reads.

CIRIquant operates by aligning RNA-Seq reads using HISAT2 and CIRI2 to identify putative circRNAs. Next, a pseudo reference index is generated using bwa index by concatenating the two full-length sequences of the putative back-splice junction regions. Candidate circular reads are re-aligned against this pseudo reference using bwa mem, and back-splice junction reads are determined if they can be linearly and completely aligned to the putative back-splice junction regions.

DCC

Output files
  • circrna_discovery/DCC/intermediates/${sample_id}/

    • *CircCoordinates: Circular RNA annotations in BED format. A full description of the columns is available in the DCC documentation.
    • *CircRNACount: A table containing read counts for circRNAs detected.
    • mate1/: Output directory of STAR 2nd pass alignment for R1.
    • mate2/: Output directory of STAR 2nd pass alignment for R2.

DCC identifies back-splice junction sites from *Chimeric.out.junction, *SJ.out.tab & *Aligned.sortedByCoord.out.bam files generated by STAR 2 pass mode, mapping the paired-end reads both jointly and separately. STAR does not output read pairs that contain more than one chimeric junction, so DCC takes this more granular approach to fully characterise back-splice junctions in reads.

DCC then performs a series of filtering steps on candidate circular reads:

  1. Mapping of mates must be consistent with a circular RNA template, i.e. align to the back-splice junction.
  2. Filtering by a minimum number of junction reads per replicate (nf-core/circrna has set this parameter to -Nr 1 1, allowing all reads).
  3. Circular reads are not allowed to span more than one gene.
  4. Circular reads aligning to the mitochondrial genome are removed.
  5. Circular reads that lack a canonical (GT/AG) splicing signal at the circRNA junction borders are removed.

Find circ

Output files
  • circrna_discovery/find_circ/intermediates/${sample_id}/
    • *_anchors.qfa.gz: 20mer anchors extracted from unmapped reads.
    • *_unmapped.bam: RNA-Seq reads that did not map to the reference genome.
    • *.sites.bed: Output from find_circ, first six columns are in standard BED format. A description of the remaining columns is available in the find_circ documentation.
    • *.sites.log: Summary statistics of candidate circular reads in the sample.
    • *.sites.reads: Tab delimited file containing circRNA ID & sequence.

find_circ uses the Bowtie2 short read mapper to align RNA-Seq reads to the genome. Reads that align fully and contiguously are discarded. Unmapped reads are converted to 20mers and aligned independently to find unique anchor positions within spliced exons; anchors that align in reverse orientation indicate circular RNA junctions. Anchor alignments are extended and must meet the following criteria:

  1. Breakpoints flanked by GT/AG splice sites.
  2. Unambiguous breakpoint detection.
  3. Maximum 2 mismatches in extension procedure.
  4. Breakpoint cannot reside more than 2nt inside a 20mer anchor.
  5. 2 reads must support the junction.

MapSplice

Output files
  • circrna_discovery/mapsplice/intermediates/${sample_id}/
    • alignments.bam: Bam file containing aligned reads and fusion alignments.
    • deletions.txt: Report of deletions.
    • Fusion output files:
      • fusions_raw.txt: raw fusion junctions without filtering
      • fusion_candidates.txt: filtered fusion junctions
      • fusions_well_annotated.txt: annotated fusion junction candidates (align to annotation file provided)
      • fusions_not_well_annotated.txt: fusions that do not align with supplied annotations
    • circular_RNAs.txt: circular RNAs reported.
    • insertions.txt: Report of Insertions.
    • junctions.txt: Reported splice junctions.
    • stats.txt: Read alignment, Junction statistics.

MapSplice first splits reads into segments and maps them to the reference genome using Bowtie. MapSplice then attempts to fix unmapped segments as gapped alignments, with each gap corresponding to a splice junction. Finally, a remapping step is used to identify back-spliced alignments that are in the presence of small exons.

Segemehl

Output files
  • circrna_discovery/segemehl/intermediates/${sample_id}/
    • *.bam: Aligned reads in BAM format
    • *.mult.bed: This bed file contains all splice events of a read. The start and end positions indicate the nucleotide after the first split (i.e. the beginning of the first intron) and the nucleotide before the last split (i.e. the end of the last intron), respectively. The name and score are equivalent to those in the *.sngl.bed file described below. Fields 7 & 8 (thickStart and thickEnd) should be identical to fields 2 & 3. Field 9 holds the color information for the item in RGB encoding (itemRGB). Field 10 (blockCount) indicates the number of splits represented by the BED item. Field 11 is a comma separated list of the intron sizes (blockSizes). Field 12 is the comma separated list of intron starts (blockStarts).
    • *.sngl.bed: The bed file contains all single splice events predicted in the split read alignments.
    • *.trns.bed: The custom text file contains all single split alignments predicted to be in trans, i.e. split alignments that are located on different chromosomes and/or different strands.

Segemehl implements split read alignment mode for reads that failed the attempt of collinear alignment. The algorithm will consider circular alignments. Circular splits are output to ${sample_id}.sngl.bed and parsed using customised scripts to produce counts representative of Segemehl quantification.

Count Matrix

Output files
  • circrna_discovery/
    • count_matrix.txt: Raw circRNA read counts for all samples in matrix format.

nf-core/circrna produces a counts matrix of circRNA read counts for each sample. circRNAs with BSJ reads < --bsj_reads <int> have been removed during the quantification step, with a further filtering step included depending on the number of quantification tools selected. If the user has selected more than one circRNA quantification tool, nf-core/circrna will demand that a circRNA be called by at least two quantification tools or else it is removed. This approach is recommended to reduce the number of false positives.
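
The two filtering rules described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline's own code: the function name, the input layout (`{tool: {circRNA ID: BSJ read count}}`), the default threshold and the example calls are all hypothetical. How counts from different tools are combined into the final matrix is not covered here.

```python
def filter_circrnas(calls_by_tool, bsj_reads=2, min_tools=2):
    """Return the sorted circRNA IDs that pass both filters.

    calls_by_tool: {tool_name: {circ_id: bsj_read_count}} (hypothetical layout).
    bsj_reads: circRNAs with fewer BSJ reads than this are dropped per tool.
    min_tools: minimum number of supporting tools when more than one tool ran.
    """
    support = {}
    for tool, calls in calls_by_tool.items():
        for circ_id, reads in calls.items():
            if reads >= bsj_reads:  # remove circRNAs with BSJ reads < bsj_reads
                support.setdefault(circ_id, set()).add(tool)
    # The consensus rule only applies when more than one tool was selected.
    required = min_tools if len(calls_by_tool) > 1 else 1
    return sorted(c for c, tools in support.items() if len(tools) >= required)


# Hypothetical calls from two quantification tools.
calls = {
    "circexplorer2": {"chrI:100-200:+": 5, "chrI:300-400:-": 1},
    "segemehl": {"chrI:100-200:+": 3, "chrII:50-90:+": 4},
}
kept = filter_circrnas(calls, bsj_reads=2)
print(kept)
```

With these inputs only `chrI:100-200:+` is retained: `chrI:300-400:-` falls below the read threshold, and `chrII:50-90:+` is supported by a single tool.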

miRNA Prediction

miRanda

Output files
  • mirna_prediction/miRanda/${sample_id}/
    • *.miRanda.txt: Raw outputs from miRanda.

miRanda performs miRNA target prediction of a genomic sequence against a miRNA database in 2 phases:

  1. First a dynamic programming local alignment is carried out between the query miRNA sequence and the reference sequence. This alignment procedure scores based on sequence complementarity and not on sequence identity.
  2. Secondly, the algorithm takes high-scoring alignments detected from phase 1 and estimates the thermodynamic stability of RNA duplexes based on these alignments. This second phase of the method utilises folding routines from the RNAlib library, part of the ViennaRNA package.

TargetScan

Output files
  • mirna_prediction/TargetScan/${sample_id}/
    • *.targetscan.txt: Raw outputs from TargetScan.

TargetScan predicts biological targets of miRNAs by searching for the presence of conserved 8mer, 7mer, and 6mer sites within the circRNA mature sequence that match the seed region of each miRNA.
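
To illustrate the seed-matching idea, the Python sketch below scans a sequence for sites complementary to a miRNA seed and labels each one. It is not TargetScan's implementation: the site classes used (6mer, 7mer-m8, 7mer-A1, 8mer) follow the commonly used TargetScan nomenclature, which the text above does not spell out, conservation is ignored, and the function names and example sequences are assumptions made for this example.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Reverse complement of an RNA string (A, C, G, U only)."""
    return rna.translate(COMPLEMENT)[::-1]


def find_seed_sites(mirna, target):
    """Return (position, site_type) for every seed match in target.

    position is the 0-based start of the 6-nt core match, i.e. the stretch
    that pairs with miRNA nucleotides 2-7.
    """
    mirna = mirna.upper().replace("T", "U")
    target = target.upper().replace("T", "U")
    core = reverse_complement(mirna[1:7])  # pairs with miRNA nucleotides 2-7
    m8 = reverse_complement(mirna[7])      # pairs with miRNA nucleotide 8
    sites = []
    pos = target.find(core)
    while pos != -1:
        has_m8 = pos > 0 and target[pos - 1] == m8
        has_a1 = target[pos + 6:pos + 7] == "A"  # A opposite miRNA position 1
        if has_m8 and has_a1:
            site_type = "8mer"
        elif has_m8:
            site_type = "7mer-m8"
        elif has_a1:
            site_type = "7mer-A1"
        else:
            site_type = "6mer"
        sites.append((pos, site_type))
        pos = target.find(core, pos + 1)
    return sites


# Example miRNA and target sequences chosen for illustration only.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
print(find_seed_sites(mirna, "GGCUACCUCAGG"))
print(find_seed_sites(mirna, "AAUACCUCGG"))
```

The first target contains a full 8mer site, while the second matches only the 6-nt core.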

miRNA targets

Output files
  • mirna_prediction/${sample_id}/
    • *_miRNA_targets.txt: Filtered target miRNAs of circRNAs called by quantification tools. Columns: miRNA, Score, Energy_KcalMol, Start, End, Site_type.

nf-core/circrna performs miRNA target filtering on miRanda and TargetScan predictions:

  1. miRNA must be called by both miRanda and TargetScan.
  2. If a site within the circRNA mature sequence shares duplicate miRNA IDs overlapping the same coordinates, the miRNA with the highest score is kept.
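
A minimal Python sketch of these two filtering rules is given below. It is not the pipeline's script: the function name and the example hits are hypothetical, the dictionary keys simply reuse the column names of *_miRNA_targets.txt, and "overlapping the same coordinates" is simplified here to identical Start and End values.

```python
def filter_mirna_targets(miranda_hits, targetscan_mirnas):
    """Keep miRanda hits supported by TargetScan, one per miRNA and site.

    miranda_hits: list of dicts with keys miRNA, Score, Energy_KcalMol,
                  Start and End (hypothetical layout).
    targetscan_mirnas: set of miRNA IDs that TargetScan also predicted.
    """
    best = {}
    for hit in miranda_hits:
        # Rule 1: the miRNA must be called by both tools.
        if hit["miRNA"] not in targetscan_mirnas:
            continue
        # Rule 2: for duplicate miRNA IDs at the same coordinates,
        # keep the hit with the highest score.
        key = (hit["miRNA"], hit["Start"], hit["End"])
        if key not in best or hit["Score"] > best[key]["Score"]:
            best[key] = hit
    return sorted(best.values(), key=lambda h: (h["Start"], h["miRNA"]))


# Hypothetical predictions for one circRNA.
miranda_hits = [
    {"miRNA": "miR-1", "Score": 150.0, "Energy_KcalMol": -18.2, "Start": 10, "End": 31},
    {"miRNA": "miR-1", "Score": 162.0, "Energy_KcalMol": -20.1, "Start": 10, "End": 31},
    {"miRNA": "miR-2", "Score": 170.0, "Energy_KcalMol": -22.5, "Start": 40, "End": 60},
]
targets = filter_mirna_targets(miranda_hits, targetscan_mirnas={"miR-1"})
for hit in targets:
    print(hit["miRNA"], hit["Score"], hit["Start"], hit["End"])
```

Here `miR-2` is dropped because only miRanda predicted it, and the lower-scoring duplicate `miR-1` hit at the same site is discarded.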

Differential Expression Analysis

nf-core/circrna will perform differential expression analysis by contrasting every level within the condition column, i.e. the response variable.

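| samples       | condition |
| ------------- | --------- |
| control_rep1  | control   |
| control_rep2  | control   |
| control_rep3  | control   |
| lung_rep1     | lung      |
| lung_rep2     | lung      |
| lung_rep3     | lung      |
| melanoma_rep1 | melanoma  |
| melanoma_rep2 | melanoma  |
| melanoma_rep3 | melanoma  |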

The above experimental design will produce the DESeq2 design formula ~ condition and loop through the nested factors within condition producing outputs for control_vs_lung, control_vs_melanoma, lung_vs_control, lung_vs_melanoma, melanoma_vs_control and melanoma_vs_lung, capturing every possible contrast.

N.B.: In the phenotype file the response variable must be called condition; this name is hard-coded in the automated differential expression analysis R script.
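
The contrasts listed above are every ordered pair of distinct condition levels. The pipeline performs this step in its R script; the Python sketch below only illustrates the enumeration, and the function name is an assumption.

```python
from itertools import permutations


def build_contrasts(conditions):
    """Return every ordered pair of distinct condition levels as 'a_vs_b'."""
    levels = sorted(set(conditions))
    return [f"{a}_vs_{b}" for a, b in permutations(levels, 2)]


# Condition column from the experimental design above (3 replicates each).
conditions = ["control"] * 3 + ["lung"] * 3 + ["melanoma"] * 3
contrasts = build_contrasts(conditions)
print(contrasts)
# ['control_vs_lung', 'control_vs_melanoma', 'lung_vs_control',
#  'lung_vs_melanoma', 'melanoma_vs_control', 'melanoma_vs_lung']
```

For n condition levels this yields n × (n − 1) contrasts, so three levels give the 6 results folders mentioned in this section.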

circRNA

Output files
  • differential_expression/circRNA/
    • DESeq2_log2_transformed_counts.txt: log2(Normalised counts + 1)
    • DESeq2_normalized_counts.txt: Normalised circRNA counts.
    • control_vs_lung/
      • DESeq2_{control_vs_lung}_Adj_pvalue_distribution.pdf: Histogram of Adj pvalues from results(dds) displaying the distribution of circRNAs that reject the null hypothesis (padj <= 0.05).
      • DESeq2_{control_vs_lung}_down_regulated_differential_expression.txt: DESeq2 results() output filtered to include down regulated circRNAs (fold change <= -1, pvalue <= 0.05) in condition with respect to control.
      • DESeq2_{control_vs_lung}_fold_change_distribution.pdf: Histogram of fold-change values for differentially expressed circRNAs.
      • DESeq2_{control_vs_lung}_heatmap.pdf: Heatmap of all differentially expressed circRNAs.
      • DESeq2_{control_vs_lung}_MA_plot.pdf: Plot of the relationship between intensity and difference between the contrast made by DESeq2.
      • DESeq2_{control_vs_lung}_pvalue_distribution.pdf: Histogram of pvalues from results(dds) displaying the distribution of circRNAs that reject the null hypothesis (pvalue <= 0.05).
      • DESeq2_{control_vs_lung}_up_regulated_differential_expression.txt: DESeq2 results() output filtered to include up regulated circRNAs (fold change >= 1, pvalue <= 0.05) in condition with respect to control.
      • DESeq2_{control_vs_lung}_volcano_plot.pdf: Volcano plot of differentially expressed circRNAs from DESeq2 results() using [EnhancedVolcano](https://www.bioconductor.org/packages/release/bioc/vignettes/EnhancedVolcano/inst/doc/EnhancedVolcano.html).

The output files listed above are for control_vs_lung, one of 6 DESeq2 results folders returned by the experimental design given above.

Note: The test dataset produces sparsely populated plots due to aggressive subsampling.

Boxplots

Output files
  • differential_expression/boxplots/
    • control_vs_lung
      • *boxplot.pdf: Boxplot of differentially expressed circRNAs in control_vs_lung.
    • control_vs_melanoma
      • *boxplot.pdf: Boxplot of differentially expressed circRNAs in control_vs_melanoma.

nf-core/circrna will produce boxplots of differentially expressed circRNAs (normalised expression) between all contrasts available in condition.

Note: The output files give examples for control_vs_lung and control_vs_melanoma.

RNA-Seq

Output files
  • differential_expression/RNA-Seq/
    • DESeq2_log2_transformed_counts.txt: log2(Normalised counts + 1)
    • DESeq2_normalized_counts.txt: Normalised RNA-Seq counts.
    • control_vs_lung/
      • DESeq2_{control_vs_lung}_Adj_pvalue_distribution.pdf: Histogram of Adj pvalues from results(dds) displaying the distribution of genes that reject the null hypothesis (padj <= 0.05).
      • DESeq2_{control_vs_lung}_down_regulated_differential_expression.txt: DESeq2 results() output filtered to include down regulated genes (fold change <= -1, pvalue <= 0.05) in condition with respect to control.
      • DESeq2_{control_vs_lung}_fold_change_distribution.pdf: Histogram of fold-change values for differentially expressed genes.
      • DESeq2_{control_vs_lung}_heatmap.pdf: Heatmap of all differentially expressed genes.
      • DESeq2_{control_vs_lung}_MA_plot.pdf: Plot of the relationship between intensity and difference between the contrast made by DESeq2.
      • DESeq2_{control_vs_lung}_pvalue_distribution.pdf: Histogram of pvalues from results(dds) displaying the distribution of genes that reject the null hypothesis (pvalue <= 0.05).
      • DESeq2_{control_vs_lung}_up_regulated_differential_expression.txt: DESeq2 results() output filtered to include up regulated genes (fold change >= 1, pvalue <= 0.05) in condition with respect to control.
      • DESeq2_{control_vs_lung}_volcano_plot.pdf: Volcano plot of differentially expressed genes from DESeq2 results() using [EnhancedVolcano](https://www.bioconductor.org/packages/release/bioc/vignettes/EnhancedVolcano/inst/doc/EnhancedVolcano.html).

The output files listed above are for control_vs_lung, one of 6 DESeq2 results folders returned by the experimental design given above.

Note: The test dataset produces sparsely populated plots due to aggressive subsampling.