Introduction

This document describes the output produced by the pipeline.

The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.

Main output files

MultiQC

This report is located at multiqc/multiqc_report.html and can be opened in a browser.

Output files
  • multiqc/
    • MultiQC report file: multiqc_report.html.
    • MultiQC data dir: multiqc_data.
    • Plots created by MultiQC: multiqc_plots.

MultiQC (http://multiqc.info) is a visualisation tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory.

Results generated by MultiQC collate pipeline QC from supported tools, e.g. FastQC. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see http://multiqc.info.

Dash Plotly app

dash_app/: folder containing the Dash Plotly app

To launch the app, you must first create and activate the appropriate conda environment:

conda env create -n nf-core-stableexpression-dash -f <OUTDIR>/dash_app/spec-file.txt
conda activate nf-core-stableexpression-dash

then:

cd dash_app
python app.py

and open your browser at http://localhost:8080

Note

The app will try to use port 8080 by default. If it is already in use, it will try 8081, 8082, and so on. Check the logs to see which port it is using.
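The fallback behaviour described above can be sketched with Python's standard socket module. This is an illustration of the idea only, not the app's actual code:

```python
import socket

def find_free_port(start: int = 8080, max_tries: int = 20) -> int:
    """Return the first port >= start that can be bound (sketch of the
    app's fallback behaviour; the real implementation may differ)."""
    for port in range(start, start + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
                return port  # bind succeeded, so the port is free
            except OSError:
                continue  # port already in use: try the next one
    raise RuntimeError("no free port found")

port = find_free_port()
print(f"app would listen on http://localhost:{port}")
```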

Statistics and scoring

The gene statistics summary is also bundled with the Dash Plotly app.

Output files
  • dash_app/data/all_genes_summary.csv: file containing all gene statistics and scores, ranked by stability score
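As a quick illustration, such a summary can be inspected with standard Python. The column names used here (gene_id, stability_score) and the rows are invented for the example, not the pipeline's documented headers:

```python
import csv
import io

# Synthetic stand-in for dash_app/data/all_genes_summary.csv; the real
# file's column names may differ.
sample = """gene_id,mean,cv,stability_score
GENE_A,120.4,0.08,0.95
GENE_B,310.7,0.35,0.42
GENE_C,98.1,0.12,0.88
"""

rows = list(csv.DictReader(io.StringIO(sample)))
# Order genes from most to least stable, mirroring the ranking in the file.
rows.sort(key=lambda r: float(r["stability_score"]), reverse=True)
for r in rows:
    print(r["gene_id"], r["stability_score"])
```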

Merged data

The file containing all normalised counts is bundled as a Parquet file with the Dash Plotly app.

Output files
  • dash_app/data/all_counts.imputed.parquet: parquet file containing all normalised + imputed gene counts
  • idmapping/global_gene_metadata.csv: table containing the complete set of gene metadata, obtained either via g:Profiler or via the custom file provided by the user
  • idmapping/global_gene_id_mapping.csv: table containing the complete set of gene ID mappings, obtained either via g:Profiler or via the custom file provided by the user
  • merged_datasets/whole_design.csv: table containing the designs for all datasets and samples included in the analysis
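A minimal sketch of how the two idmapping tables relate: original gene IDs map to a canonical ID, and the metadata table describes each canonical ID. All IDs, column names and values below are invented for illustration:

```python
# Toy illustration of the relationship between the ID mapping table and
# the gene metadata table. All values are invented.
id_mapping = {
    "AT1G01010.1": "AT1G01010",  # original ID -> canonical gene ID
    "AT1G01010.2": "AT1G01010",
}
metadata = {
    "AT1G01010": {"name": "NAC001", "description": "NAC domain-containing protein"},
}

canonical = id_mapping["AT1G01010.1"]
info = metadata[canonical]
print(canonical, info["name"])
```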

Other output files of interest (useful for debugging)

Expression Atlas

Output files
  • public_data/expression_atlas/accessions/: accessions found when querying Expression Atlas
  • public_data/expression_atlas/datasets/: count datasets (normalized: *.normalised.csv / raw: *.raw.csv) and experimental designs (*.design.csv) downloaded from Expression Atlas.

GEO

Output files
  • public_data/geo/accessions/: accessions found when querying GEO
  • public_data/geo/datasets/: count datasets (normalized: *.normalised.csv / raw: *.raw.csv) and experimental designs (*.design.csv) downloaded from GEO.
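Both the Expression Atlas and GEO dataset directories follow the same filename convention, so the downloaded files can be grouped programmatically. The filenames below are invented examples:

```python
# Group downloaded files by the suffix convention used in both
# public_data/*/datasets/ directories: *.raw.csv, *.normalised.csv,
# *.design.csv. Filenames are invented for illustration.
files = [
    "E-MTAB-1234.normalised.csv",
    "GSE5678.raw.csv",
    "GSE5678.design.csv",
]

groups = {"raw": [], "normalised": [], "design": []}
for f in files:
    kind = f.rsplit(".", 2)[-2]  # token between the accession and ".csv"
    groups[kind].append(f)

print(groups)
```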

IDMapping (g:Profiler)

Output files
  • renamed: count datasets with renamed and filtered gene IDs

Normalisation

Output files
  • normalised/: Newly normalised datasets
    • tpm/: normalised with TPM (transcripts per million)
    • cpm/: normalised with CPM (counts per million)
  • normalised/quantile_normalised/: quantile-normalised datasets
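For reference, the two per-sample normalisations can be sketched on a toy count vector. These are the standard CPM and TPM formulas, not the pipeline's own code:

```python
# CPM scales each gene's count by the library size; TPM additionally
# divides by gene length before rescaling, so both sum to one million
# per sample. Toy values only.
counts = [100, 200, 700]      # raw counts for three genes in one sample
lengths_kb = [1.0, 2.0, 3.5]  # gene lengths in kilobases

lib_size = sum(counts)
cpm = [c / lib_size * 1e6 for c in counts]

rpk = [c / l for c, l in zip(counts, lengths_kb)]  # reads per kilobase
tpm = [r / sum(rpk) * 1e6 for r in rpk]

print(cpm)  # sums to 1e6
print(tpm)  # also sums to 1e6
```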

Genome annotation and gene length

Output files
  • gene_length/:
    • gene_transcript_lengths.csv: table containing gene transcript lengths
    • *.gff*: downloaded genome annotation
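As a rough sketch, a transcript's length can be derived from a GFF annotation by summing its exon lengths (GFF coordinates are 1-based and inclusive). The toy lines and the simplistic attribute parsing below are illustrative only; real GFF parsing is more involved:

```python
# Sum exon lengths per transcript from GFF-style lines. Assumes a
# well-formed "Parent=" attribute on exon lines; toy data only.
gff_lines = [
    "chr1\tsrc\texon\t100\t199\t.\t+\t.\tParent=tx1",
    "chr1\tsrc\texon\t300\t349\t.\t+\t.\tParent=tx1",
]

lengths = {}
for line in gff_lines:
    cols = line.split("\t")
    if cols[2] != "exon":
        continue
    start, end = int(cols[3]), int(cols[4])
    attrs = dict(f.split("=", 1) for f in cols[8].split(";"))
    parent = attrs["Parent"]
    # 1-based inclusive coordinates: length is end - start + 1
    lengths[parent] = lengths.get(parent, 0) + (end - start + 1)

print(lengths)
```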

Pipeline information

Output files
  • pipeline_info/
    • Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
    • Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.yml. The pipeline_report* files will only be present if the --email / --email_on_fail parameters are used when running the pipeline.
    • Parameters used by the pipeline run: params.json.

Nextflow provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage.