Introduction

This document describes the output produced by the pipeline.

Pipeline overview

The pipeline is built using Nextflow and processes data in the steps described in the sections below.

Expression Atlas

Output files
  • expressionatlas/
    • List of accessions found when querying Expression Atlas: accessions.txt.
    • Count datasets and experimental designs downloaded from Expression Atlas. Normalized datasets have the normalized.csv suffix, while raw (non-normalized) datasets have the raw.csv suffix.
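
As a minimal sketch of how these files might be inspected after a run, the Python snippet below lists the accessions and groups the downloaded count files by suffix. The function names are hypothetical, and it assumes that accessions.txt holds one accession per line, which is not stated above.

```python
from pathlib import Path


def read_accessions(outdir):
    """Return the accessions listed in expressionatlas/accessions.txt.

    Assumption: the file contains one accession per line.
    """
    path = Path(outdir) / "expressionatlas" / "accessions.txt"
    lines = path.read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def split_by_normalization(outdir):
    """Group count files in expressionatlas/ by their file name suffix."""
    atlas_dir = Path(outdir) / "expressionatlas"
    return {
        "normalized": sorted(p.name for p in atlas_dir.glob("*normalized.csv")),
        "raw": sorted(p.name for p in atlas_dir.glob("*raw.csv")),
    }
```

Calling split_by_normalization on the pipeline output directory gives a quick view of which datasets still needed normalization.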

DESeq2

Output files
  • normalization/deseq2/
    • List of newly normalized datasets

edgeR

Output files
  • normalization/edger/
    • List of newly normalized datasets

GProfiler IDMapping

Output files
  • idmapping/
    • Count datasets whose gene IDs have been mapped to Ensembl IDs (suffix renamed.csv).
    • Correspondences between original gene IDs and Ensembl IDs (suffix mapping.json).
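
The mapping files can be used to check which gene IDs were not converted. The sketch below is illustrative only: it assumes that a mapping.json file is a flat JSON object whose keys are original gene IDs and whose values are Ensembl IDs, which this document does not specify, and the function names are hypothetical.

```python
import json


def load_mapping(mapping_path):
    """Load a mapping.json file.

    Assumption: a flat JSON object {original_id: ensembl_id}.
    """
    with open(mapping_path) as handle:
        return json.load(handle)


def unmapped_ids(original_ids, mapping):
    """Return the original gene IDs that have no entry in the mapping."""
    return [gene_id for gene_id in original_ids if gene_id not in mapping]
```

If the real file layout differs (for example, nested objects), only load_mapping would need adjusting.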

Variation coefficient

Output files
  • variation_coefficients/
    • A list of genes ordered from the most stable (first line) to the least stable (last line) in variation_coefficients.csv.
    • All normalized counts (for each gene and each sample) in all_normalized_counts.csv.
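
To illustrate the ranking idea, the sketch below computes a coefficient of variation per gene and sorts genes from most to least stable. It uses the common definition (sample standard deviation divided by the mean); the exact formula used by the pipeline is not stated here. It also assumes that all_normalized_counts.csv has a header row, gene IDs in the first column, and one column per sample. The function names are hypothetical.

```python
import csv
import math
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (NaN if undefined)."""
    if len(values) < 2:
        return float("nan")
    mean = statistics.fmean(values)
    if mean == 0:
        return float("nan")
    return statistics.stdev(values) / mean


def rank_genes(counts_path):
    """Return (gene_id, cv) pairs sorted from most to least stable.

    Assumed layout: header row, gene ID in column 1, samples in the rest.
    """
    scores = []
    with open(counts_path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader)  # skip the header row
        for row in reader:
            values = [float(v) for v in row[1:]]
            scores.append((row[0], coefficient_of_variation(values)))
    # Genes with an undefined coefficient (NaN) are placed last.
    return sorted(scores, key=lambda item: (math.isnan(item[1]), item[1]))
```

A lower coefficient means that a gene's normalized counts vary less across samples relative to their mean, which is why such genes appear first.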

Pipeline information

Output files
  • pipeline_info/
    • Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
    • Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.yml. The pipeline_report* files will only be present if the --email / --email_on_fail parameters are used when running the pipeline.
    • Parameters used by the pipeline run: params.json.

Nextflow can generate several reports about the execution of the pipeline. These reports help you troubleshoot errors and provide other information such as launch commands, run times and resource usage.
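
For troubleshooting, the trace file can also be read programmatically. The sketch below assumes the default Nextflow trace layout (a tab-separated file with a header row that includes the name and status columns) and a hypothetical helper name; if the trace fields were customized for a run, the column names may differ.

```python
import csv


def unsuccessful_tasks(trace_path):
    """Return the names of tasks whose status is FAILED or ABORTED.

    Assumption: default Nextflow trace format (TSV with a header row
    containing 'name' and 'status' columns).
    """
    with open(trace_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [
            row["name"]
            for row in reader
            if row["status"] in {"FAILED", "ABORTED"}
        ]
```

For example, unsuccessful_tasks("pipeline_info/execution_trace.txt") would list the processes worth inspecting first after a failed run.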