Introduction

This document describes the output produced by the pipeline. The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.

Pipeline overview

The pipeline is built using Nextflow and processes data using the following steps:

FastQ download

Output files
  • fastq/
    • *.fastq.gz: Paired-end/single-end reads downloaded from the ENA / SRA.
  • fastq/md5/
    • *.md5: Files containing the md5 checksum for each FastQ file downloaded from the ENA / SRA.
  • samplesheet/
    • samplesheet.csv: Auto-created samplesheet with collated metadata and paths to downloaded FastQ files.
  • metadata/
    • *.runinfo_ftp.tsv: Re-formatted metadata file downloaded from the ENA.
    • *.runinfo.tsv: Original metadata file downloaded from the ENA.
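A downstream script will often start by reading samplesheet/samplesheet.csv to find the downloaded reads. The sketch below assumes that the file has a header row and that read paths sit in columns named fastq_1 and fastq_2, with fastq_2 left empty for single-end runs. Those column names and the helper functions are hypothetical here, so check them against the header of your own samplesheet.

```python
import csv


def load_samplesheet(path):
    """Return the rows of samplesheet.csv as a list of dicts keyed by header name."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def fastq_paths(row):
    """Collect the FastQ paths for one row.

    Assumption: reads are stored in hypothetical 'fastq_1' / 'fastq_2'
    columns; an empty 'fastq_2' is treated as a single-end run and skipped.
    """
    return [row[key] for key in ("fastq_1", "fastq_2") if row.get(key)]
```

For example, iterating over load_samplesheet("samplesheet/samplesheet.csv") and calling fastq_paths on each row gives one or two paths per run, depending on whether it is single-end or paired-end.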

Please see the usage documentation for a list of supported public repository identifiers and how to provide them to the pipeline. The final sample information for all identifiers is obtained from the ENA, which provides direct download links for FastQ files as well as their associated md5 checksums. If download links exist, the files are downloaded in parallel via FTP; otherwise, they are NOT downloaded. This is intentional: tools such as parallel-fastq-dump, fasterq-dump and prefetch require pre-existing configuration files in the user's home directory, which makes automation tricky across different platforms and container environments.
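Because the ENA supplies an md5 checksum for every FastQ file, a downloaded file can be re-checked after the run, for example after copying results to another machine. The following is a minimal sketch that assumes each fastq/md5/*.md5 file holds the checksum as its first whitespace-separated token, as in the "<hash>  <filename>" layout written by the md5sum utility. The helper names are illustrative, not part of the pipeline.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex md5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(md5_path):
    """Read the checksum from a .md5 file.

    Assumption: the checksum is the first whitespace-separated token,
    as in md5sum output ("<hash>  <filename>").
    """
    tokens = Path(md5_path).read_text().split()
    return tokens[0].lower() if tokens else ""


def verify_fastq(fastq_path, md5_path):
    """Return True if the FastQ file's md5 matches the value stored in md5_path."""
    return md5_of_file(fastq_path) == expected_md5(md5_path)
```

Reading in fixed-size chunks keeps memory use flat even for multi-gigabyte .fastq.gz files.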

Pipeline information

Output files
  • pipeline_info/
    • Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
    • Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.csv.

Nextflow provides excellent functionality for generating various reports about the execution of the pipeline. These reports help you troubleshoot errors in a pipeline run and also provide other information such as launch commands, run times and resource usage.
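When troubleshooting, a quick first step is to list the tasks in pipeline_info/execution_trace.txt that did not finish successfully. This sketch assumes the trace is tab-separated with a header row containing 'name' and 'status' columns, as in Nextflow's default trace layout. It treats COMPLETED and CACHED as successful. If your run customised the trace fields, adjust the column names accordingly.

```python
import csv


def non_completed_tasks(trace_path):
    """Return (name, status) for every task in a Nextflow trace file that did not succeed.

    Assumes a tab-separated trace with a header row that includes 'name' and
    'status'; 'COMPLETED' and 'CACHED' are treated as successful statuses.
    """
    ok_statuses = {"COMPLETED", "CACHED"}
    with open(trace_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [
            (row["name"], row["status"])
            for row in reader
            if row["status"] not in ok_statuses
        ]
```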