Introduction
This documentation describes the output of nf-core/circrna for the test dataset, which runs all three modules of the workflow: circRNA discovery, miRNA prediction and differential expression analysis of circular RNAs in RNA-Seq data.
A full run of the workflow will produce the following directory output structure:
|-- results/
    |-- circrna_discovery
    |-- differential_expression
    |-- mirna_prediction
    |-- multiqc
    |-- pipeline_info
    |-- quality_control
    |-- references
Pipeline Overview
The pipeline is built using Nextflow and processes data using the following steps:
- Raw read QC (FastQC)
- Adapter trimming (Trim Galore!)
- MultiQC report (MultiQC)
- circRNA quantification
    - CIRIquant
    - STAR 2-Pass mode
        - CIRCexplorer2
        - circRNA finder
        - DCC
    - find_circ
    - MapSplice
    - Segemehl
- circRNA annotation
- Export mature spliced sequence as FASTA file
- Annotate parent gene and underlying transcripts
- circRNA count matrix
- miRNA target prediction
    - miRanda
    - TargetScan
    - Filter results: miRNAs must be called by both tools
- Differential expression analysis
    - DESeq2
- Circular-linear ratio tests (CircTest)
Quality Control
FastQC
Output files
- fastqc/
    - *_fastqc.html: FastQC report containing quality metrics.
    - *_fastqc.zip: Zip archive containing the FastQC report, tab-delimited data file and plot images.
NB: The FastQC plots in this directory are generated from the raw input reads. They may contain adapter sequence and regions of low quality. To see how your reads look after adapter and quality trimming, refer to the FastQC reports in the trimgalore/fastqc/ directory.
FastQC gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the FastQC help pages.
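As an illustration of what each *_fastqc.zip archive holds, the sketch below reads the per-module PASS/WARN/FAIL verdicts from the summary.txt file that FastQC writes inside the archive. It assumes the standard FastQC archive layout (a single <sample>_fastqc/ folder); the function name fastqc_summary is hypothetical and not part of nf-core/circrna.

```python
import zipfile


def fastqc_summary(zip_path):
    """Return {module name: PASS/WARN/FAIL} from a *_fastqc.zip archive."""
    with zipfile.ZipFile(zip_path) as zf:
        # The archive holds one <sample>_fastqc/ folder that contains summary.txt
        member = next(n for n in zf.namelist() if n.endswith("/summary.txt"))
        text = zf.read(member).decode("utf-8")
    summary = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Tab-separated columns: status, module name, input file name
        status, module, _ = line.split("\t")
        summary[module] = status
    return summary
```

Looping this over fastqc/*_fastqc.zip and printing the modules marked FAIL gives a quick text-only overview before opening the HTML reports.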
TrimGalore
Output files
- trimgalore/
    - *.fq.gz: If --save_trimmed is specified, FastQ files after adapter trimming will be placed in this directory.
    - *_trimming_report.txt: Log file generated by Trim Galore!.
- trimgalore/fastqc/
    - *_fastqc.html: FastQC report containing quality metrics for read 1 (and read 2 if paired-end) after adapter trimming.
    - *_fastqc.zip: Zip archive containing the FastQC report, tab-delimited data file and plot images.
Trim Galore! is a wrapper tool around Cutadapt and FastQC to perform quality and adapter trimming on FastQ files. By default, Trim Galore! will automatically detect and trim the appropriate adapter sequence.
NB: Trim Galore! will only run using multiple cores if you are able to use at least 5 and 6 CPUs for single- and paired-end data, respectively. The total number of cores available to Trim Galore! will also be capped at 4 (7 and 8 CPUs in total for single- and paired-end data, respectively) because there is no further run-time benefit beyond that. See the release notes and the discussion whilst adding this logic to the nf-core/atacseq pipeline.
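The CPU arithmetic in the note above can be reproduced with a small function. This is a sketch reconstructed from the numbers quoted in the note (3 CPUs held back for single-end and 4 for paired-end data, a floor of 1 core and a cap of 4); trimgalore_cores is a hypothetical name, not the pipeline's own code.

```python
def trimgalore_cores(cpus: int, single_end: bool) -> int:
    """Cores handed to Trim Galore! for a task with `cpus` CPUs (assumed rule)."""
    # Assumption: CPUs are held back for the other processes Trim Galore! spawns,
    # which reproduces the 7 / 8 CPU totals quoted in the note
    reserved = 3 if single_end else 4
    # Never fewer than 1 core, never more than the cap of 4
    return max(1, min(cpus - reserved, 4))


# Multiple cores start at 5 CPUs (single-end) and 6 CPUs (paired-end)
print([trimgalore_cores(n, single_end=True) for n in range(4, 9)])
print([trimgalore_cores(n, single_end=False) for n in range(4, 9)])
```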
DESeq2
Output files
- quality_control/DESeq2_QC/
    - circRNA/
        - DESeq2_condition_PCA.pdf: PCA plot of PC1 vs. PC2 displaying the highest amount of variation within the response variable condition.
    - RNA-Seq/
        - DESeq2_condition_PCA.pdf: PCA plot of PC1 vs. PC2 displaying the highest amount of variation within the response variable condition.
nf-core/circrna outputs quality control plots of normalised log2 expression data from DESeq2 to assess heterogeneity among the experimental samples. These plots are useful for assessing sample-to-sample similarity and for identifying potential batch effects within the experiment. Plots are generated for both circRNA and RNA-Seq data when the differential expression analysis module has been selected by the user (see the --module documentation).
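To make the PCA plots concrete, here is a minimal NumPy sketch of how PC1 and PC2 sample scores can be computed from a matrix of log2-transformed normalised counts. It assumes a features x samples array such as the values in DESeq2_log2_transformed_counts.txt; DESeq2's own plotPCA() operates on a DESeqTransform object and by default uses only the 500 most variable features, so its coordinates will differ from this simplified version. The function name is hypothetical.

```python
import numpy as np


def pca_pc1_pc2(log2_expr):
    """PC1/PC2 sample scores from a features x samples matrix of log2 values.

    Returns (scores, explained): scores is samples x 2, explained is the
    fraction of total variance captured by PC1 and PC2.
    """
    X = np.asarray(log2_expr, dtype=float).T   # samples x features
    Xc = X - X.mean(axis=0)                     # centre every feature
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :2] * S[:2]                   # sample coordinates on PC1, PC2
    explained = S[:2] ** 2 / np.sum(S ** 2)     # variance share of each PC
    return scores, explained
```

Samples from the same condition should cluster along PC1 when the condition effect dominates; samples that separate by processing date or batch instead point to a batch effect.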
MultiQC
Output files
- quality_control/MultiQC/
    - Raw_Reads_MultiQC.html: Summary reports of unprocessed RNA-Seq reads.
    - Trimmed_Reads_MultiQC.html: Summary reports of processed RNA-Seq reads.
MultiQC is a visualization tool that generates a single HTML report summarising all samples in your project. nf-core/circrna outputs HTML reports for sequencing read quality control.
Genome Index Files
Output files
- reference_genome/
    - BowtieIndex/: Directory containing Bowtie indices.
    - Bowtie2Index/: Directory containing Bowtie2 indices.
    - BWAIndex/: Directory containing BWA indices.
    - Hisat2Index/: Directory containing HISAT2 indices.
    - SAMtoolsIndex/: Directory containing the SAMtools index file.
    - STARIndex/: Directory containing STAR indices.
    - SegemehlIndex/: Directory containing the Segemehl index file.
nf-core/circrna will save genome indices when --save_reference true is set. This is highly encouraged to reduce runtimes when the workflow is redeployed, as you can supply the saved indices to the aligner in question via its flag (for example --star '/path/to/STAR/'; available: bowtie, bowtie2, bwa, hisat2, star, segemehl). Make sure to move the saved genome indices to a different location before doing so.
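When reusing saved indices, the command-line arguments can be assembled from whichever index directories are present. The sketch below assumes that each aligner flag accepts the path to the matching saved directory, as in the --star example above, and that flag names follow the aligner names listed there; the mapping and the index_flags helper are hypothetical, and SAMtoolsIndex is skipped because no flag is listed for it.

```python
from pathlib import Path

# Hypothetical mapping from saved index directory names to aligner flags
INDEX_FLAGS = {
    "BowtieIndex": "--bowtie",
    "Bowtie2Index": "--bowtie2",
    "BWAIndex": "--bwa",
    "Hisat2Index": "--hisat2",
    "STARIndex": "--star",
    "SegemehlIndex": "--segemehl",
}


def index_flags(saved_dir):
    """Return ['--flag', '/abs/path', ...] for every saved index directory found."""
    args = []
    for dirname, flag in INDEX_FLAGS.items():
        path = Path(saved_dir) / dirname
        if path.is_dir():
            args += [flag, str(path.resolve())]
    return args
```

The returned list can be appended to your usual nextflow run command line after the indices have been moved out of the results directory.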
circRNA Quantification
Common Outputs
The workflow is designed to output three files per sample, per quantification tool, in the circrna_discovery directory. Using the test dataset as an example, the directory structure for the fust_1 sample is:
|-- results/
    |-- circrna_discovery/
        |-- circexplorer2/
            |-- fust_1/
                |-- fust_1.bed
                |-- fust_1.fasta
                |-- fust_1.log
- `${sample_id}.bed`: A customised BED 12 file containing filtered, annotated circRNAs. Columns: chromosome, start, end, name, read counts, strand, start, end, RGB, number of exon blocks, size of exon blocks, start positions (within sequence) of exon blocks, a type label, parent gene ID(s), parent transcript ID(s), mature spliced length.
- `${sample_id}.log`: Log detailing the decisions made by `annotate_outputs.sh` when annotating each circRNA.
- `${sample_id}.fasta`: Mature spliced sequence of the circRNA in FASTA format, including the splice junction site (±20 bp).
Sample outputs for the corresponding `.log`, `.bed` and `.fasta` entries are given below for a circRNA called by `CIRCexplorer2`:
```console
[nf-core/circrna]: Starting analysis for: chrI:1140805-1147588:-
[nf-core/circrna]: chrI:1140805-1147588:- overlaps features in GTF file
[nf-core/circrna]: Inspecting Genes...
[nf-core/circrna]: Overlapping Gene IDs: Y48G8AL.10
[nf-core/circrna]: Converting to BED12
[nf-core/circrna]: Attempting to fit circRNA to gene exon boundaries
[nf-core/circrna]: chrI:1140805-1147588:- fits gene exons, is a circRNA
[nf-core/circrna]: cleaning up intermediate files
```
chrI 1140805 1147588 chrI:1140805-1147588:- 2 - 1140805 1147588 0 5 229,214,191,141,499 0,1282,2546,4912,6284 circRNA Y48G8AL.10 NM_001306296,NM_001306297,NM_001306298,NM_001306299 1274
>chrI:1087252-1088602:-
CATGAAGTCTCGAGATCTCGTTTATAAGCACCAATATCCACGTTCAGCATTATTGATTGataaaattaatttataaattcgaaaataaaatttaaatttttCTTTAGAAATTATCGATTTATCGACTTCCACGTAATTCCACACCACGCTAAAATTCCATATCAATCTCGCGTTGTTTGGCTTCTCGTTGGGTGTCCGCCGCGTGGGAGTAGTATCTGCAAAAAAAAATTTGAGAATAAAAAATGTAAAATTGtttttcctattttctattgccgaaatttgagatttccggcaaatcggcaaattgccggaattgaaatttgcggcaaatcggcaaactgccgcaattgaaatttcgggtaaatcggcaaatttccggcaaatcggcatattgccggaatttaaatttccggcaaggcggccaatcggaaaattggcaaattgccgcaattgaaatttgcggcaaatcggcaattgtcgactattttcgacaacttctcgctttgcacttttttgtacatttcagattttttttcaatttcaatcggcaaaaacatttccggcaaatcggtaaattgccagaattgaaatttccggcaaatcggcaaattgccggaattgaaatttcccgcaaatcggcaaatttctttaattgaaatttccggcaaatcggtaaattgccggaatttaaatttccggcaactcggcaaactgccccaattgaaatttccggtaaatcggtaaaatgccgaaatttaaatttccggcaaggtggcaaatcggaaaattggcaaattgccggaattcaaatatccggcaaatcggcaagttgctggaattgaaatttccggcaaggcggcaaatttccggcaaatcggcaattGTCTTATattttcgacaacttctcgttttgcacttttttttgtacatttcaggttttttttcaatttcaatcggcaaaaacatttccggcaaatctgatatccggcaaacggcaaatcggcaatttgccgaaaataaaaaattcaagcaactcggcaaaccggcaaattTTATAGAGCACATTTGACCCACCTATTGAGAATAAACAATTGCGAGATAAAAATCTTGATGTAAATTCCGGCGAATGCGATCAAAATTGCTTTTCGATCTGAAAAAAATCCAATTTTGCTCAGCCAATAAATGGACGGAGCTAAAAACAAGGCGCTACTCACGAGAAATCCACTCATACGGGTCTTCTGTCACATTTTCCTGCTCGGATTTCGATTTTGGCGTATCTTCGGTCGGATTTCCGTGGTAATCGGACAACCAGGCAATCACTACAATTATTGCGCAAATGAATCGGGCAAC
The workflow's manual annotation process is designed to mimic the annotation performed by CIRCexplorer2, standardising the annotation process across all circRNA quantification tools.
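Because the customised BED 12 layout extends the standard format, a small parser helps when inspecting these files programmatically. This sketch follows the sample BED row shown above, which has 16 whitespace-separated fields including a type label (circRNA) between the exon block starts and the parent gene ID(s); the field names are hypothetical. It also checks two properties that hold for that row: the last exon block ends at the circRNA end coordinate (1140805 + 6284 + 499 = 1147588), and the block sizes sum to the mature spliced length (229 + 214 + 191 + 141 + 499 = 1274).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CircBed:
    chrom: str
    start: int
    end: int
    name: str
    reads: int
    strand: str
    block_sizes: List[int]
    block_starts: List[int]
    circ_type: str
    genes: List[str]
    transcripts: List[str]
    mature_length: int


def parse_circ_bed(line):
    """Parse one row of ${sample_id}.bed (layout assumed from the sample row)."""
    f = line.split()
    return CircBed(
        chrom=f[0], start=int(f[1]), end=int(f[2]), name=f[3],
        reads=int(f[4]), strand=f[5],          # fields 7-10: thick start/end, RGB, block count
        block_sizes=[int(x) for x in f[10].split(",") if x],
        block_starts=[int(x) for x in f[11].split(",") if x],
        circ_type=f[12], genes=f[13].split(","), transcripts=f[14].split(","),
        mature_length=int(f[15]),
    )


def is_consistent(rec):
    """Last exon block must end at `end`; exon blocks must sum to the mature length."""
    last_block_end = rec.start + rec.block_starts[-1] + rec.block_sizes[-1]
    return last_block_end == rec.end and sum(rec.block_sizes) == rec.mature_length
```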
Intermediate files generated by each quantification tool are described in depth below.
CIRCexplorer2
Output files
- circrna_discovery/circexplorer2/intermediates/${sample_id}/
    - *.bed: Intermediate file generated by the CIRCexplorer2 parse module, identifying STAR fusion junctions for downstream annotation.
    - *_circexplorer2_circs.bed: Filtered BED6 file containing circRNA counts used for count matrix generation.
    - *.txt: Output files generated by the CIRCexplorer2 annotate module, based on BED 12 format containing circRNA genomic location information, exon cassette composition and an additional 6 columns specifying circRNA annotations. Full descriptions of the 18 columns can be found in the CIRCexplorer2 documentation.
- circrna_discovery/star/
    - 1st_pass/
        - *.Aligned.out.bam: Coordinate sorted BAM file containing aligned reads and chimeric reads.
        - *.Chimeric.out.junction: Each line contains the details of chimerically aligned reads. Full descriptions of columns can be found in the STAR documentation (section 5.4).
        - *.Log.final.out: Summary mapping statistics after the mapping job is complete, useful for quality control. The statistics are calculated for each read (single- or paired-end) and then summed or averaged over all reads.
        - *.Log.out: Main log file with a lot of detailed information about the run. This file is most useful for troubleshooting and debugging.
        - *.Log.progress.out: Reports job progress statistics, such as the number of processed reads, % of mapped reads etc.
        - *.SJ.out.tab: High confidence collapsed splice junctions in tab-delimited form. Full description of columns can be found in the STAR documentation (section 4.4).
    - 2nd_pass/
        - *.Aligned.out.bam: Coordinate sorted BAM file containing aligned reads and chimeric reads.
        - *.Chimeric.out.junction: Each line contains the details of chimerically aligned reads. Full descriptions of columns can be found in the STAR documentation (section 5.4).
        - *.Log.final.out: Summary mapping statistics after the mapping job is complete, useful for quality control. The statistics are calculated for each read (single- or paired-end) and then summed or averaged over all reads.
        - *.Log.out: Main log file with a lot of detailed information about the run. This file is most useful for troubleshooting and debugging.
        - *.Log.progress.out: Reports job progress statistics, such as the number of processed reads, % of mapped reads etc.
        - *.SJ.out.tab: High confidence collapsed splice junctions in tab-delimited form. Full description of columns can be found in the STAR documentation (section 4.4).
    - sjdb/
        - dataset.SJ.out.tab: Chromosome, start, end & strand coordinates of novel splice junctions for all samples aligned using STAR 1st pass.
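The sjdb/dataset.SJ.out.tab file pools novel junctions from every sample's 1st pass. A minimal sketch of that pooling step is shown below, using only STAR's documented SJ.out.tab columns (1 to 4: chromosome, first intron base, last intron base, strand code; 6: annotated flag). The exact filters nf-core/circrna applies are not described here, so keeping every unannotated junction is an assumption, and novel_junctions is a hypothetical name.

```python
def novel_junctions(sj_tables):
    """Union of unannotated junctions across samples as (chrom, start, end, strand).

    sj_tables: iterable of SJ.out.tab contents, one string per sample.
    Strand is kept as STAR's code: 0 = undefined, 1 = +, 2 = -.
    """
    junctions = set()
    for text in sj_tables:
        for line in text.splitlines():
            if not line.strip():
                continue
            f = line.split("\t")
            # Column 6 is 0 for junctions absent from the annotation (novel)
            if f[5] == "0":
                junctions.add((f[0], int(f[1]), int(f[2]), f[3]))
    return sorted(junctions)
```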
CIRCexplorer2 uses *.Chimeric.out.junction files generated from STAR 2 pass mode to extract back-splice junction sites using the CIRCexplorer2 parse module. Following this, CIRCexplorer2 annotate performs re-alignment of reads to the back-splice junction sites to determine the precise positions of downstream donor and upstream acceptor splice sites. Back-splice junction sites are subsequently updated and annotated using the customised annotation text file.
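To illustrate what extracting back-splice junctions from *.Chimeric.out.junction involves, the sketch below counts reads whose chimeric segments join a downstream donor to an upstream acceptor on the same chromosome and strand. It relies only on the first seven STAR columns (donor chromosome, intron base and strand; acceptor chromosome, intron base and strand; junction type) and converts the breakpoints to 1-based circRNA coordinates. The max_span limit and the function name are hypothetical, and this is not CIRCexplorer2's actual implementation.

```python
from collections import Counter


def backsplice_counts(lines, max_span=100_000):
    """Count back-splice junctions as {(chrom, start, end, strand): reads}."""
    counts = Counter()
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 7 or not f[1].isdigit():
            continue  # skip blank, comment or header lines
        d_chr, d_pos, d_str, a_chr, a_pos, a_str, jtype = f[:7]
        d_pos, a_pos = int(d_pos), int(a_pos)
        if jtype == "-1" or d_chr != a_chr or d_str != a_str:
            continue  # mate-encompassing, inter-chromosomal or strand-switching
        # Back-splice: the acceptor lies upstream of the donor in transcript order
        if d_str == "+" and a_pos < d_pos:
            start, end = a_pos + 1, d_pos - 1
        elif d_str == "-" and d_pos < a_pos:
            start, end = d_pos + 1, a_pos - 1
        else:
            continue  # co-linear chimera, not a back-splice
        if end - start + 1 <= max_span:
            counts[(d_chr, start, end, d_str)] += 1
    return counts
```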
circRNA finder
Output files
- circrna_discovery/circrna_finder/intermediates/${sample_id}/
    - *.filteredJunctions.bed: A BED file with all circular junctions found by the pipeline. The score column indicates the number of reads spanning each junction.
    - *.s_filteredJunctions.bed: A BED file with those junctions in *.filteredJunctions.bed that are flanked by GT-AG splice sites. The score column indicates the number of reads spanning each junction.
    - *.s_filteredJunctions_fw.bed: A BED file with the same circular junctions as in *.s_filteredJunctions.bed, but here the score column gives the average number of forward spliced reads at both splice sites around each circular junction.
- circrna_discovery/star/
    - 1st_pass/
        - *.Aligned.out.bam: Coordinate sorted BAM file containing aligned reads and chimeric reads.
        - *.Chimeric.out.junction: Each line contains the details of chimerically aligned reads. Full descriptions of columns can be found in the STAR documentation (section 5.4).
        - *.Log.final.out: Summary mapping statistics after the mapping job is complete, useful for quality control. The statistics are calculated for each read (single- or paired-end) and then summed or averaged over all reads.
        - *.Log.out: Main log file with a lot of detailed information about the run. This file is most useful for troubleshooting and debugging.
        - *.Log.progress.out: Reports job progress statistics, such as the number of processed reads, % of mapped reads etc.
        - *.SJ.out.tab: High confidence collapsed splice junctions in tab-delimited form. Full description of columns can be found in the STAR documentation (section 4.4).
    - 2nd_pass/
        - *.Aligned.out.bam: Coordinate sorted BAM file containing aligned reads and chimeric reads.
        - *.Chimeric.out.junction: Each line contains the details of chimerically aligned reads. Full descriptions of columns can be found in the STAR documentation (section 5.4).
        - *.Chimeric.out.sam: Chimeric alignments in SAM format.
        - *.Log.final.out: Summary mapping statistics after the mapping job is complete, useful for quality control. The statistics are calculated for each read (single- or paired-end) and then summed or averaged over all reads.
        - *.Log.out: Main log file with a lot of detailed information about the run. This file is most useful for troubleshooting and debugging.
        - *.Log.progress.out: Reports job progress statistics, such as the number of processed reads, % of mapped reads etc.
        - *.SJ.out.tab: High confidence collapsed splice junctions in tab-delimited form. Full description of columns can be found in the STAR documentation (section 4.4).
    - sjdb/
        - dataset.SJ.out.tab: Chromosome, start, end & strand coordinates of novel splice junctions for all samples aligned using STAR 1st pass.
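The *.Log.final.out files are the quickest way to compare mapping quality across samples. A minimal parser is sketched below; it assumes the usual "metric | value" layout of this file, and parse_star_final_log is a hypothetical helper name.

```python
def parse_star_final_log(text):
    """Parse STAR's *.Log.final.out into {metric: value}."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # blank lines and section headers such as "UNIQUE READS:"
        key, value = (part.strip() for part in line.split("|", 1))
        if value.endswith("%"):
            stats[key] = float(value.rstrip("%"))  # percentages as plain floats
            continue
        try:
            stats[key] = int(value)
        except ValueError:
            try:
                stats[key] = float(value)
            except ValueError:
                stats[key] = value  # timestamps stay as text
    return stats
```

For example, comparing stats["Uniquely mapped reads %"] across samples quickly highlights a library that mapped poorly before it reaches the circRNA callers.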
circRNA finder uses the *.Chimeric.out.sam, *.Chimeric.out.junction & *.SJ.out.tab files from STAR 2nd pass to identify circular RNAs in RNA-Seq data.
CIRIquant
Output files
- circrna_discovery/ciriquant/intermediates/${sample_id}/
    - *.log: A CIRIerror.log file, which should be empty, and a ${sample_id}.log file, which contains the output log of CIRIquant.
    - *.bed: CIRI2 output file in BED 6 format.
    - *.gtf: Output file from CIRIquant in GTF format. A full description of the columns is available in the CIRIquant documentation.
    - align/
        - *.sorted.{bam, bam.bai}: (Sorted and indexed) BAM file from HISAT2 alignment of RNA-Seq reads.
    - circ/
        - *.ciri: CIRI2 output file.
        - *_denovo.sorted.{bam, bam.bai}: (Sorted and indexed) BAM file from the re-alignment of candidate circular reads to the pseudo reference.
        - *_index.*.ht2: HISAT2 index files of the pseudo reference.
        - *_index.fa: Reference FASTA file of candidate circular reads.
CIRIquant operates by aligning RNA-Seq reads using HISAT2 and CIRI2 to identify putative circRNAs. Next, a pseudo reference is generated by concatenating the two full-length sequences of the putative back-splice junction regions and indexed (the *_index.*.ht2 files), and candidate circular reads are re-aligned against this pseudo reference. Back-splice junction reads are retained if they can be linearly and completely aligned to the putative back-splice junction regions.
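The idea behind the pseudo reference can be shown with exact string matching: joining two copies of a circular sequence turns the back-splice junction into an ordinary linear stretch, so a junction-spanning read matches the pseudo reference but not the linear sequence. This toy sketch uses the full circRNA sequence and substring search in place of real alignment, so it only illustrates the principle; both function names are hypothetical.

```python
def pseudo_reference(circ_seq: str) -> str:
    # Two copies back to back: the 3' end now runs straight into the 5' start
    return circ_seq + circ_seq


def spans_bsj(read: str, circ_seq: str) -> bool:
    """True if the read matches the pseudo reference linearly but not the linear sequence."""
    return read in pseudo_reference(circ_seq) and read not in circ_seq


circ = "AAAACCCCGGGGTTTT"
print(spans_bsj("GGTTTTAAAACC", circ))  # crosses the TTTT|AAAA junction
print(spans_bsj("CCCCGGGG", circ))      # lies within the linear sequence
```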
DCC
Output files
- circrna_discovery/DCC/intermediates/${sample_id}/
    - *CircCoordinates: Circular RNA annotations in BED format. A full description of the columns is available in the DCC documentation.
    - *CircRNACount: A table containing read counts for the circRNAs detected.
    - mate1/: Output directory of STAR 2nd pass alignment for R1.
    - mate2/: Output directory of STAR 2nd pass alignment for R2.
DCC identifies back-splice junction sites from the *Chimeric.out.junction, *SJ.out.tab & *Aligned.sortedByCoord.out.bam files generated by STAR 2 pass mode, mapping the paired-end reads both jointly and separately. STAR does not output read pairs that contain more than one chimeric junction, so DCC takes this more granular approach to fully characterise back-splice junctions in reads.
DCC then performs a series of filtering steps on candidate circular reads:
- Mapping of mates must be consistent with a circular RNA template, i.e. align to the back-splice junction.
- Filtering by a minimum number of junction reads per replicate (nf-core/circrna has set this parameter to -Nr 1 1, allowing all reads).
- Circular reads are not allowed to span more than one gene.
- Circular reads aligning to the mitochondrial genome are removed.
- Circular reads that lack a canonical (GT/AG) splicing signal at the circRNA junction borders are removed.
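The read-level filters listed above can be expressed as one predicate over candidate circRNAs. This is a sketch, not DCC's code: the Candidate fields are hypothetical, the mate-consistency check is omitted because it needs the alignments themselves, mitochondrial contig names are assumed to be chrM or MT, and the splice motif is assumed to be given in transcript orientation. The defaults min_reads=1 and min_replicates=1 are meant to mirror the -Nr 1 1 setting (minimum reads, then number of replicates).

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Candidate:
    chrom: str
    start: int
    end: int
    reads_per_replicate: List[int]  # junction reads seen in each replicate
    genes: Set[str]                 # genes overlapped by the circRNA
    motif: str                      # splice signal at the junction borders


def passes_filters(c, min_reads=1, min_replicates=1, mito=("chrM", "MT")):
    # -Nr style: at least min_reads junction reads in at least min_replicates replicates
    supported = sum(1 for r in c.reads_per_replicate if r >= min_reads)
    if supported < min_replicates:
        return False
    if len(c.genes) > 1:   # may not span more than one gene
        return False
    if c.chrom in mito:    # drop mitochondrial junctions
        return False
    return c.motif == "GT/AG"  # require the canonical splicing signal
```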
Find circ
Output files
- circrna_discovery/find_circ/intermediates/${sample_id}/
    - *_anchors.qfa.gz: 20mer anchors extracted from unmapped reads.
    - *_unmapped.bam: RNA-Seq reads that did not map to the reference genome.
    - *.sites.bed: Output from find_circ; the first six columns are in standard BED format. A description of the remaining columns is available in the find_circ documentation.
    - *.sites.log: Summary statistics of candidate circular reads in the sample.
    - *.sites.reads: Tab-delimited file containing circRNA ID & sequence.
find_circ uses the Bowtie2 short read mapper to align RNA-Seq reads to the genome. Reads that align fully and contiguously are discarded. 20mer anchors are extracted from the ends of the unmapped reads and aligned independently to find unique anchor positions within spliced exons; anchors that align in reverse orientation indicate circular RNA junctions. Anchor alignments are extended and must meet the following criteria:
- Breakpoints flanked by GT/AG splice sites.
- Unambiguous breakpoint detection.
- Maximum 2 mismatches in the extension procedure.
- The breakpoint cannot reside more than 2 nt inside a 20mer anchor.
- At least 2 reads must support the junction.
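The anchor logic can be illustrated on a toy forward-strand genome, with exact unique matching standing in for Bowtie2. In the sketch below a read is treated as circular when its tail anchor maps upstream of its head anchor; the extension step, mismatch allowance, GT/AG check and read-support threshold listed above are omitted, and all names are hypothetical.

```python
def anchors(read, size=20):
    """Head and tail anchors: the first and last `size` nt of an unmapped read."""
    return read[:size], read[-size:]


def unique_hit(seq, genome):
    """Position of seq in genome if it occurs exactly once, else None."""
    pos = genome.find(seq)
    return pos if pos != -1 and pos == genome.rfind(seq) else None


def head_to_tail(read, genome, size=20):
    """Anchor-bracketed (start, end) interval if anchors map in reversed order, else None.

    Forward strand only; 0-based, end-exclusive. For a read made of exactly two
    anchors this interval is the circRNA itself; longer reads need extension.
    """
    head, tail = anchors(read, size)
    pos_head, pos_tail = unique_hit(head, genome), unique_hit(tail, genome)
    if pos_head is None or pos_tail is None:
        return None  # unmapped or ambiguous anchor
    if pos_tail < pos_head:
        return pos_tail, pos_head + size
    return None  # anchors in genomic order: a linear splice, not a back-splice
```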
MapSplice
Output files
- circrna_discovery/mapsplice/intermediates/${sample_id}/
    - alignments.bam: BAM file containing aligned reads and fusion alignments.
    - deletions.txt: Report of deletions.
    - Fusion output files:
        - fusions_raw.txt: Raw fusion junctions without filtering.
        - fusion_candidates.txt: Filtered fusion junctions.
        - fusions_well_annotated.txt: Annotated fusion junction candidates (align to the annotation file provided).
        - fusions_not_well_annotated.txt: Fusions that do not align with the supplied annotations.
    - circular_RNAs.txt: Circular RNAs reported.
    - insertions.txt: Report of insertions.
    - junctions.txt: Reported splice junctions.
    - stats.txt: Read alignment and junction statistics.
MapSplice first splits reads into segments and maps them to the reference genome using Bowtie. MapSplice then attempts to fix unmapped segments as gapped alignments, with each gap corresponding to a splice junction. Finally, a remapping step is used to identify back-spliced alignments in the presence of small exons.
Segemehl
Output files
- circrna_discovery/segemehl/intermediates/${sample_id}/
    - *.bam: Aligned reads in BAM format.
    - *.mult.bed: Contains all splice events of a read. The start and end positions indicate the nucleotide after the first split (i.e. the beginning of the first intron) and the nucleotide before the last split (i.e. the end of the last intron), respectively. The name and score are equivalent to those in the *.sngl.bed file described below. Fields 7 & 8 (thickStart and thickEnd) should be identical to fields 2 & 3. Field 9 holds the color information for the item in RGB encoding (itemRGB). Field 10 (blockCount) indicates the number of splits represented by the BED item. Field 11 is a comma-separated list of the intron sizes (blockSizes). Field 12 is the comma-separated list of intron starts (blockStarts).
    - *.sngl.bed: Contains all single splice events predicted in the split read alignments.
    - *.trns.bed: Custom text file containing all single split alignments predicted to be in trans, i.e. split alignments that are located on different chromosomes and/or different strands.
Segemehl implements a split read alignment mode for reads that fail collinear alignment, and the algorithm considers circular alignments. Circular splits are output to ${sample_id}.sngl.bed and parsed using customised scripts to produce counts representative of Segemehl quantification.
Count Matrix
Output files
- circrna_discovery/
    - count_matrix.txt: Raw circRNA read counts for all samples in matrix format.
nf-core/circrna produces a count matrix of circRNA read counts for each sample. circRNAs supported by fewer than --bsj_reads <int> back-splice junction (BSJ) reads are removed during the quantification step, and a further filtering step is applied depending on the number of quantification tools selected: if the user has selected more than one circRNA quantification tool, nf-core/circrna requires that a circRNA be called by at least two quantification tools, otherwise it is removed. This approach is recommended to reduce the number of false positives.
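The two filtering rules above can be sketched as follows. Two details are assumptions not stated in this documentation: the tool-agreement check is applied across all samples, and when several tools report counts for the same circRNA in a sample, their integer mean is used. The function name and input layout are hypothetical.

```python
from collections import Counter


def count_matrix(calls, bsj_reads=1):
    """Build {circ_id: {sample: count}} from {tool: {sample: {circ_id: bsj_count}}}."""
    tools = list(calls)
    # Step 1: drop calls supported by fewer than bsj_reads back-splice reads
    kept = {
        tool: {s: {c: n for c, n in circs.items() if n >= bsj_reads}
               for s, circs in per_sample.items()}
        for tool, per_sample in calls.items()
    }
    # Step 2: count how many tools called each circRNA (in any sample)
    support = Counter()
    for per_sample in kept.values():
        support.update({c for circs in per_sample.values() for c in circs})
    min_tools = 2 if len(tools) > 1 else 1
    samples = sorted({s for per_sample in kept.values() for s in per_sample})
    matrix = {}
    for circ, n_tools in support.items():
        if n_tools < min_tools:
            continue  # not called by enough tools
        row = {}
        for s in samples:
            values = [kept[t].get(s, {}).get(circ) for t in tools]
            values = [v for v in values if v is not None]
            # Assumption: integer mean across the tools reporting this circRNA
            row[s] = sum(values) // len(values) if values else 0
        matrix[circ] = row
    return matrix
```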
miRNA Prediction
miRanda
Output files
- mirna_prediction/miRanda/${sample_id}/
    - *.miRanda.txt: Raw outputs from miRanda.
miRanda performs miRNA target prediction of a genomic sequence against a miRNA database in 2 phases:
- First, a dynamic programming local alignment is carried out between the query miRNA sequence and the reference sequence. This alignment procedure scores based on sequence complementarity and not on sequence identity.
- Secondly, the algorithm takes high-scoring alignments detected in phase 1 and estimates the thermodynamic stability of RNA duplexes based on these alignments. This second phase uses folding routines from the RNAlib library, part of the ViennaRNA package.
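Phase 1 can be sketched as a Smith-Waterman local alignment in which pairing, rather than identity, earns points: the miRNA is read 5' to 3' against the target read 3' to 5', so Watson-Crick pairs and G:U wobbles score positively. The scores below are illustrative assumptions, not miRanda's parameters (miRanda additionally uses affine gap penalties and up-weights the miRNA 5' region), and the thermodynamic phase 2 is not modelled.

```python
# Illustrative scores only; not miRanda's actual scoring parameters
PAIR_SCORE = {("A", "U"): 5, ("U", "A"): 5, ("G", "C"): 5, ("C", "G"): 5,
              ("G", "U"): 2, ("U", "G"): 2}  # G:U wobble scores less than Watson-Crick
MISMATCH = -3
GAP = -4  # linear gap penalty for simplicity


def complementarity_score(mirna, target):
    """Best local alignment score of miRNA against target, scoring base pairing."""
    m = mirna.upper().replace("T", "U")
    t = target.upper().replace("T", "U")[::-1]  # read target 3'->5' to pair antiparallel
    prev = [0] * (len(t) + 1)
    best = 0
    for i in range(1, len(m) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            pair = PAIR_SCORE.get((m[i - 1], t[j - 1]), MISMATCH)
            cur[j] = max(0, prev[j - 1] + pair, prev[j] + GAP, cur[j - 1] + GAP)
            best = max(best, cur[j])
        prev = cur
    return best
```

A perfectly complementary 8 nt site such as "UGAGGUAG" against "CUACCUCA" reaches the maximum of 8 x 5 = 40, whereas the identical sequence scores lower, reflecting the complementarity-not-identity rule.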
TargetScan
Output files
- mirna_prediction/TargetScan/${sample_id}/
    - *.targetscan.txt: Raw outputs from TargetScan.
TargetScan predicts biological targets of miRNAs by searching for the presence of conserved 8mer, 7mer, and 6mer sites within the circRNA mature sequence that match the seed region of each miRNA.
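The canonical site types can be sketched directly from their usual definitions: a 6mer site pairs with miRNA nucleotides 2 to 7, a 7mer-m8 site additionally pairs with nucleotide 8, a 7mer-A1 site is the 6mer followed by an A opposite miRNA position 1, and an 8mer combines both. This sketch ignores conservation, which TargetScan also takes into account; seed_sites is a hypothetical name.

```python
COMP = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(seq):
    return "".join(COMP[b] for b in reversed(seq))


def seed_sites(mirna, target):
    """Return [(start, site_type)] for canonical seed matches in target (5'->3')."""
    m = mirna.upper().replace("T", "U")
    t = target.upper().replace("T", "U")
    core = revcomp(m[1:7])   # pairs with miRNA positions 2-7
    m8 = COMP[m[7]]          # target base that pairs with miRNA position 8
    sites = []
    j = t.find(core)
    while j != -1:
        has_m8 = j > 0 and t[j - 1] == m8
        has_a1 = j + 6 < len(t) and t[j + 6] == "A"
        if has_m8 and has_a1:
            sites.append((j - 1, "8mer"))
        elif has_m8:
            sites.append((j - 1, "7mer-m8"))
        elif has_a1:
            sites.append((j, "7mer-A1"))
        else:
            sites.append((j, "6mer"))
        j = t.find(core, j + 1)
    return sites
```

For let-7a (UGAGGUAGUAGGUUGUAUAGUU) the 8mer site is CUACCUCA, so seed_sites("UGAGGUAGUAGGUUGUAUAGUU", "GGGCUACCUCAGGG") reports an 8mer starting at position 3.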
miRNA targets
Output files
- mirna_prediction/${sample_id}/
    - *_miRNA_targets.txt: Filtered target miRNAs of circRNAs called by quantification tools. Columns are self-explanatory: miRNA, Score, Energy_KcalMol, Start, End, Site_type.
nf-core/circrna performs miRNA target filtering on miRanda and TargetScan predictions:
- miRNAs must be called by both miRanda and TargetScan.
- If a site within the circRNA mature sequence has duplicate entries for the same miRNA ID at the same coordinates, the entry with the highest score is kept.
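The two filtering rules above can be sketched over rows shaped like the *_miRNA_targets.txt columns. How the pipeline decides that both tools called the same miRNA is not specified here; this sketch assumes a matching miRNA ID with overlapping coordinates, and consensus_targets is a hypothetical name.

```python
def overlaps(a, b):
    """True if two hits share at least one position (inclusive coordinates)."""
    return a["Start"] <= b["End"] and b["Start"] <= a["End"]


def consensus_targets(miranda_hits, targetscan_hits):
    """Keep miRanda hits confirmed by TargetScan, then the best of each duplicate."""
    both = [h for h in miranda_hits
            if any(h["miRNA"] == t["miRNA"] and overlaps(h, t) for t in targetscan_hits)]
    best = {}
    for h in both:
        key = (h["miRNA"], h["Start"], h["End"])  # same miRNA at the same coordinates
        if key not in best or h["Score"] > best[key]["Score"]:
            best[key] = h
    return sorted(best.values(), key=lambda h: (h["Start"], h["miRNA"]))
```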
Differential Expression Analysis
nf-core/circrna will perform differential expression analysis by contrasting every level of the condition column, i.e. the response variable.
| samples | condition |
|---|---|
| control_rep1 | control |
| control_rep2 | control |
| control_rep3 | control |
| lung_rep1 | lung |
| lung_rep2 | lung |
| lung_rep3 | lung |
| melanoma_rep1 | melanoma |
| melanoma_rep2 | melanoma |
| melanoma_rep3 | melanoma |
The above experimental design will produce the DESeq2 design formula ~ condition and loop through the levels of condition, producing outputs for control_vs_lung, control_vs_melanoma, lung_vs_control, lung_vs_melanoma, melanoma_vs_control and melanoma_vs_lung, capturing every possible contrast.
N.B: In the phenotype file, the response variable column must be named condition, as this name is hard-coded in the automated differential expression analysis R script.
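The six contrast names follow from taking every ordered pair of distinct condition levels, n x (n - 1) = 3 x 2 = 6 for the design above. A minimal sketch, assuming levels are sorted alphabetically (which matches the order listed above); contrasts is a hypothetical name and this is not the pipeline's R code.

```python
from itertools import permutations


def contrasts(conditions):
    """All ordered pairwise contrasts between the levels of `condition`."""
    levels = sorted(set(conditions))
    return [f"{a}_vs_{b}" for a, b in permutations(levels, 2)]


design = ["control"] * 3 + ["lung"] * 3 + ["melanoma"] * 3
print(contrasts(design))
```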
circRNA
Output files
- differential_expression/circRNA/
    - DESeq2_log2_transformed_counts.txt: log2(Normalised counts + 1).
    - DESeq2_normalized_counts.txt: Normalised circRNA counts.
    - control_vs_lung/
        - DESeq2_{control_vs_lung}_Adj_pvalue_distribution.pdf: Histogram of adjusted p-values from results(dds) displaying the distribution of circRNAs that reject the null hypothesis (padj <= 0.05).
Sample outputs from control_vs_lung are given below, one of the 6 DESeq2 results folders returned by the experimental design given above.
Note: The test dataset produces sparsely populated plots due to aggressive subsampling.
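The normalised and log2-transformed count files can be approximated with DESeq2's median-of-ratios size factors: each sample's factor is the median, over features with no zero counts, of its count divided by that feature's geometric mean across samples. The sketch below then applies log2(normalised count + 1); it assumes default size-factor estimation, and the helper names are hypothetical.

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factors; counts is a list of feature rows of per-sample counts."""
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) == 0:
            continue  # geometric mean is zero, so the feature is skipped
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(statistics.median(r)) for r in log_ratios]


def log2_normalised(counts):
    """log2(count / size factor + 1) for every feature and sample."""
    sf = size_factors(counts)
    return [[math.log2(c / s + 1) for c, s in zip(row, sf)] for row in counts]
```

If one sample was sequenced twice as deeply as another, its size factor comes out twice as large, and the normalised values of the two samples line up.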
Boxplots
Output files
- differential_expression/boxplots/
    - control_vs_lung/
        - *boxplot.pdf: Boxplot of differentially expressed circRNAs in control_vs_lung.
    - control_vs_melanoma/
        - *boxplot.pdf: Boxplot of differentially expressed circRNAs in control_vs_melanoma.
nf-core/circrna will produce boxplots of differentially expressed circRNAs (normalised expression) for all contrasts available in condition.
Note: The output files give examples for control_vs_lung and control_vs_melanoma.
RNA-Seq
Output files
- differential_expression/RNA-Seq/
    - DESeq2_log2_transformed_counts.txt: log2(Normalised counts + 1).
    - DESeq2_normalized_counts.txt: Normalised RNA-Seq counts.
    - control_vs_lung/
        - DESeq2_{control_vs_lung}_Adj_pvalue_distribution.pdf: Histogram of adjusted p-values from results(dds) displaying the distribution of genes that reject the null hypothesis (padj <= 0.05).
Sample outputs from control_vs_lung are given below, one of the 6 DESeq2 results folders returned by the experimental design given above.
Note: The test dataset produces sparsely populated plots due to aggressive subsampling.