QuantMS pipeline

The goal is to take stock of the current state of affairs. This covers compatibility with nf-core and preparing local modules for nf-core/modules (modules installed from there can still be patched for local updates). We, the organizing team, are interested in running DIA-NN through the workflow, which can be done either via QuantMS or via the nf-core/dia_proteomics_analysis subworkflow.

Additionally, the new tools nf-docs and nf-metro can be explored during the hackathon for documentation and deployment instructions.

Finally, participants who would rather explore datasets can benchmark microbial datasets.

DIA-NN updates

Subworkflows

Process names differ between the two repositories, but the two lists are ordered so that their entries map one to one.

  • DIANN subworkflow in nf-core/modules repo at subworkflows/nf-core/dia_proteomics_analysis

    include { QUANTMSUTILS_DIANNCFG          } from '../../../modules/nf-core/quantmsutils/dianncfg/main'
    include { QUANTMSUTILS_MZMLSTATISTICS    } from '../../../modules/nf-core/quantmsutils/mzmlstatistics/main'
    include { QUANTMSUTILS_DIANN2MZTAB       } from '../../../modules/nf-core/quantmsutils/diann2mztab/main'
     
    include { DIANN as DIANN_INSILICOLIBRARYGENERATION } from '../../../modules/nf-core/diann/main'
    include { DIANN as DIANN_PRELIMINARYANALYSIS } from '../../../modules/nf-core/diann/main'
    include { DIANN as DIANN_ASSEMBLEEMPIRICALLIBRARY } from '../../../modules/nf-core/diann/main'
    include { DIANN as DIANN_INDIVIDUALANALYSIS } from '../../../modules/nf-core/diann/main'
    include { DIANN as DIANN_FINALQUANTIFICATION } from '../../../modules/nf-core/diann/main'
  • DIANN under local modules in bigbio/quantms. The files can be compared to the ones in the nf-core/modules repo.

    include { GENERATE_CFG                } from '../modules/local/diann/generate_cfg/main'
    include { MSSTATS_LFQ                 } from '../modules/local/msstats/msstats_lfq/main'
    include { CONVERT_RESULTS             } from '../modules/local/diann/convert_results/main'
     
    include { INSILICO_LIBRARY_GENERATION } from '../modules/local/diann/insilico_library_generation/main'
    include { PRELIMINARY_ANALYSIS        } from '../modules/local/diann/preliminary_analysis/main'
    include { ASSEMBLE_EMPIRICAL_LIBRARY  } from '../modules/local/diann/assemble_empirical_library/main'
    include { INDIVIDUAL_ANALYSIS         } from '../modules/local/diann/individual_analysis/main'
    include { FINAL_QUANTIFICATION        } from '../modules/local/diann/final_quantification/main'
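To compare the two implementations file by file, a loop like the following could be used. It is a minimal sketch meant to be run from the root of a bigbio/quantms checkout. The location of the nf-core/modules clone (NFCORE_MODULES, defaulting to ../modules) is an assumption. Because all five DIA-NN steps in the nf-core subworkflow alias the same diann module, each local module is diffed against that single file.

```shell
#!/usr/bin/env bash
# Assumption: path to a local clone of nf-core/modules (hypothetical default).
NFCORE_MODULES="${NFCORE_MODULES:-../modules}"

compare_diann_modules() {
    # All five aliased DIANN_* steps point at this one nf-core module file.
    nfcore_file="$1/modules/nf-core/diann/main.nf"
    for step in insilico_library_generation preliminary_analysis \
                assemble_empirical_library individual_analysis \
                final_quantification; do
        local_file="modules/local/diann/$step/main.nf"
        if [ -f "$local_file" ] && [ -f "$nfcore_file" ]; then
            echo "== $step =="
            # diff exits non-zero when the files differ; that is expected here.
            diff -u "$local_file" "$nfcore_file" || true
        else
            echo "skip $step (file not found)"
        fi
    done
}

compare_diann_modules "$NFCORE_MODULES"
```

Steps whose files are missing are reported as skipped rather than aborting the loop, so the script can be tried before both checkouts are in place.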

Compare with, and take inspiration from, Jonathan Manning's way of writing modules and subworkflows?

Updates to nf-core

To get familiar with nf-core templates and requirements, one could move some tools to the nf-core/modules repo so that others can use them. Candidates are any in

  • subworkflows/local
  • modules/local
  • modules/bigbio

One could also switch to, and help update, modules that have a local version here but are maintained by others in the nf-core/modules repo. For example:

  • Update ThermoRawFileParser (C#) to use the nf-core/modules/thermorawfileparser version instead of modules/bigbio/thermorawfileparser

Exercise: Add a module to nf-core/modules

  • If the process is based on a Python or conda package, Wave allows easy containerization.
  • nf-test tests need to be added.
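The exercise could start from the nf-core tools scaffolding commands. This is a minimal sketch under three assumptions: nf-core tools is installed, the commands are run inside a fork of nf-core/modules, and pmultiqc (one of the candidates named in this section) is the tool being added. The create command asks interactive questions about the module.

```shell
#!/usr/bin/env bash
# Assumption: example tool name, taken from the candidate list in this section.
TOOL="pmultiqc"

if command -v nf-core >/dev/null 2>&1; then
    nf-core modules create "$TOOL"   # scaffold the module from the nf-core template
    nf-core modules test "$TOOL"     # run the nf-test suite of the module
    nf-core modules lint "$TOOL"     # check the module against nf-core guidelines
else
    echo "nf-core tools not found; install them before scaffolding $TOOL"
fi
```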

Useful hints.

List of candidates (tbc)

  • pmultiqc (Python)
  • msstats (R)

nf-core lint

The .nf-core.yaml file deactivates some lint checks. Check which ones and how.
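A quick way to see which checks are switched off is to print the lint section of that file. The sketch below assumes the overrides live under a top-level lint: key and that the file sits in the repository root under either of the two names tried. It does a plain text scan, not a full YAML parse.

```shell
#!/usr/bin/env bash
# Print the top-level "lint:" block of an nf-core tools config file.
show_lint_overrides() {
    awk '
        /^lint:/                    { inblock = 1; print; next }  # block starts
        inblock && /^[^[:space:]#]/ { inblock = 0 }               # next top-level key ends it
        inblock                     { print }
    ' "$1"
}

# Assumption: the config is in the current directory under one of these names.
for cfg in .nf-core.yml .nf-core.yaml; do
    if [ -f "$cfg" ]; then
        show_lint_overrides "$cfg"
    fi
done
```

Entries set to false, or listing specific files, are the checks that linting skips for this pipeline.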

Run

    # in quantms repo
    nf-core pipelines lint -d .

nf-docs

Add or use ewels/nf-docs

nf-metro

Add a new metro-map based on a configuration file: pinin4fjords/nf-metro

  • maybe add to deployment instructions (manual updates or actions)

Comparisons using PRIDE: DIA datasets

Run experiments and compare the outputs to the results provided on PRIDE, to become familiar with running quantms. This should be supplemented with in-house data, which now consists entirely of DIA experiments on Bruker instruments.

  • PXD054415 - comparing DDA and DIA on metaproteomics dataset with known compositions
    • could use a subset of samples
    • SDRF
  • PXD049262
    • growth experiment
    • photosynthetic metabolism of the purple sulphur bacterium Halorhodospira halophila
    • cultivated with various sulphur compounds
    • SDRF

Included benchmark dataset in quantms

Mentioned as an example for DIA

Performance benchmarking

  • running quantms on a single machine, on a single VM, on Azure Batch, and on HPCs with Apptainer:
    • runtime, costs, etc.
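To compare runtimes across these environments, one option is the trace file that Nextflow writes when run with -with-trace. The sketch below sums the realtime column per process. It assumes the default tab-separated trace layout with name and realtime columns and human-readable durations such as "1m 30s". Costs are not covered. The function name summarise_trace is made up for this example.

```shell
#!/usr/bin/env bash
# Sum the "realtime" column of a Nextflow trace file per process, in seconds.
summarise_trace() {
    awk -F '\t' '
        # Convert durations such as "1h 2m 3s", "4.5s" or "120ms" to seconds.
        function to_seconds(d,    n, parts, i, v, total) {
            total = 0
            n = split(d, parts, " ")
            for (i = 1; i <= n; i++) {
                v = parts[i] + 0                      # numeric prefix of the token
                if      (parts[i] ~ /ms$/) total += v / 1000
                else if (parts[i] ~ /s$/)  total += v
                else if (parts[i] ~ /m$/)  total += v * 60
                else if (parts[i] ~ /h$/)  total += v * 3600
                else if (parts[i] ~ /d$/)  total += v * 86400
            }
            return total
        }
        NR == 1 {                                     # locate columns by header name
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        {
            name = $(col["name"])
            sub(/ \(.*\)$/, "", name)                 # drop the " (tag)" suffix
            secs[name] += to_seconds($(col["realtime"]))
        }
        END { for (p in secs) printf "%s\t%.1f\n", p, secs[p] }
    ' "$1" | sort
}

# Usage: pass the trace file of a finished run as the first argument.
if [ -n "${1:-}" ]; then
    summarise_trace "$1"
fi
```

Running the same dataset in each environment and comparing the per-process totals gives a first picture of where the time goes.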

DIANN docker files

location
ZS Copenhagen and online
category
pipelines
group leader