# nf-core/stableexpression
This pipeline is dedicated to identifying the most stable genes within one or several expression datasets. This is particularly useful for selecting suitable RT-qPCR reference genes for a given species.
## Introduction
This document describes the output produced by the pipeline.
The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
## Main output files
### MultiQC
This report is located at `multiqc/multiqc_report.html` and can be opened in a browser.
**Output files**

- `multiqc/`
  - `multiqc_report.html`: MultiQC report file.
  - `multiqc_data/`: MultiQC data directory.
  - `multiqc_plots/`: plots created by MultiQC.
[MultiQC](http://multiqc.info) is a visualization tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report, and further statistics are available in the report data directory.
Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQC. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see http://multiqc.info.
### Dash Plotly app
- `dash_app/`: folder containing the Dash Plotly app
To launch the app, first create and activate the appropriate conda environment:

```bash
conda env create -n nf-core-stableexpression-dash -f <OUTDIR>/dash_app/spec-file.txt
conda activate nf-core-stableexpression-dash
```

then:

```bash
cd dash_app
python app.py
```

and open your browser at http://localhost:8080.
The app uses port 8080 by default. If that port is already in use, it tries 8081, then 8082, and so on. Check the logs to see which port was actually selected.
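This incremental fallback amounts to probing ports in increasing order until a free one is found. Below is a minimal stand-alone sketch of that logic using Python's standard `socket` module; it is an illustration of the behaviour, not the app's actual code:

```python
import socket

def find_free_port(start: int = 8080, max_tries: int = 20) -> int:
    """Return the first TCP port at or above `start` that can be bound."""
    for port in range(start, start + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
                return port  # bind succeeded, so the port is free
            except OSError:
                continue  # port already in use, try the next one
    raise RuntimeError(f"no free port in range {start}-{start + max_tries - 1}")

print(find_free_port())
```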
### Statistics and scoring
The gene statistics summary is also bundled with the Dash Plotly app.
**Output files**

- `dash_app/data/all_genes_summary.csv`: file containing all gene statistics and scores, ranked by stability score.
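The summary table can also be inspected outside the app with standard tooling. A minimal sketch using Python's built-in `csv` module on a toy excerpt; the column names (`gene_id`, `stability_score`) and the lower-score-is-more-stable convention are assumptions for illustration only:

```python
import csv
import io

# Toy excerpt mimicking all_genes_summary.csv; column names are
# assumptions, not the pipeline's documented schema.
toy_csv = """gene_id,stability_score
g1,0.42
g2,0.13
g3,0.27
"""

reader = csv.DictReader(io.StringIO(toy_csv))
# Assuming a lower stability score means a more stable gene.
ranked = sorted(reader, key=lambda row: float(row["stability_score"]))
print([row["gene_id"] for row in ranked])  # → ['g2', 'g3', 'g1']
```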
### Merged data
The file containing all normalised counts is bundled as a Parquet file with the Dash Plotly app.
**Output files**

- `dash_app/data/all_counts.imputed.parquet`: Parquet file containing all normalised and imputed gene counts.
- `idmapping/global_gene_metadata.csv`: table containing the complete set of gene metadata, obtained either via g:Profiler or via the custom file provided by the user.
- `idmapping/global_gene_id_mapping.csv`: table containing the complete set of gene ID mappings, obtained either via g:Profiler or via the custom file.
- `merged_datasets/whole_design.csv`: table containing the designs for all datasets and all samples included in the analysis.
## Other output files of interest (useful for debugging)
### Expression Atlas
**Output files**

- `public_data/expression_atlas/accessions/`: accessions found when querying Expression Atlas.
- `public_data/expression_atlas/datasets/`: count datasets (normalised: `*.normalised.csv`; raw: `*.raw.csv`) and experimental designs (`*.design.csv`) downloaded from Expression Atlas.
### GEO
**Output files**

- `public_data/geo/accessions/`: accessions found when querying GEO.
- `public_data/geo/datasets/`: count datasets (normalised: `*.normalised.csv`; raw: `*.raw.csv`) and experimental designs (`*.design.csv`) downloaded from GEO.
### IDMapping (g:Profiler)
**Output files**

- `renamed/`: count datasets with renamed and filtered gene IDs.
### Normalisation
**Output files**

- `normalised/`: newly normalised datasets
  - `tpm/`: normalised with TPM
  - `cpm/`: normalised with CPM
- `normalised/quantile_normalised/`: quantile-normalised datasets
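For reference, TPM and CPM are standard count normalisations: CPM scales raw counts to a library size of one million reads, while TPM additionally corrects for gene length before scaling. A minimal sketch with toy numbers (an illustration of the formulas, not the pipeline's implementation):

```python
# Toy raw counts for three genes in one sample, with gene lengths in bp.
counts = {"g1": 100, "g2": 300, "g3": 600}
lengths_bp = {"g1": 1000, "g2": 2000, "g3": 3000}

# CPM: counts scaled to a library size of one million reads.
total = sum(counts.values())
cpm = {g: c / total * 1e6 for g, c in counts.items()}

# TPM: counts first divided by gene length in kilobases (reads per
# kilobase), then scaled so the per-sample values sum to one million.
rpk = {g: counts[g] / (lengths_bp[g] / 1000) for g in counts}
rpk_total = sum(rpk.values())
tpm = {g: r / rpk_total * 1e6 for g, r in rpk.items()}

print(round(cpm["g1"]), round(tpm["g1"]))  # → 100000 222222
```

Note that TPM values always sum to one million within a sample, which makes them comparable across samples even when library compositions differ.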
### Genome annotation and gene length
**Output files**

- `gene_length/`
  - `gene_transcript_lengths.csv`: table containing gene transcript lengths
  - `*.gff*`: downloaded genome annotation
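Transcript lengths are typically derived from such an annotation by summing the exon spans of each transcript. A minimal stand-alone sketch over a toy GFF3 excerpt (an illustration of the idea, not the pipeline's code):

```python
# Toy GFF3 excerpt; real files come from the downloaded genome annotation.
gff = (
    "chr1\tsrc\texon\t100\t200\t.\t+\t.\tParent=tx1\n"
    "chr1\tsrc\texon\t300\t450\t.\t+\t.\tParent=tx1\n"
)

lengths = {}
for line in gff.splitlines():
    if line.startswith("#"):
        continue  # skip GFF comment/header lines
    fields = line.split("\t")
    if fields[2] != "exon":
        continue  # only exons contribute to transcript length
    start, end = int(fields[3]), int(fields[4])
    attrs = dict(kv.split("=", 1) for kv in fields[8].split(";"))
    # GFF coordinates are 1-based and end-inclusive.
    lengths[attrs["Parent"]] = lengths.get(attrs["Parent"], 0) + (end - start + 1)

print(lengths)  # → {'tx1': 252}
```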
### Pipeline information
**Output files**

- `pipeline_info/`
  - Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`.
  - Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.yml`. The `pipeline_report*` files will only be present if the `--email`/`--email_on_fail` parameters are used when running the pipeline.
  - Parameters used by the pipeline run: `params.json`.
Nextflow provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage.