A pipeline for processing Molecular Cartography data from Resolve Bioscience (combinatorial FISH)
This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline.
The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
The pipeline is built using Nextflow and processes data using the following steps:
Mindagap - Fill empty grid lines in a panorama image with neighbor-weighted values.
CLAHE - Perform contrast-limited adaptive histogram equalization.
Create stacks - If a second image is provided, combine both into one stack as input for segmentation modules.
Segmentation - Segment single cells from the provided image using the segmentation method of choice (Cellpose, Mesmer, ilastik) and filter them by size.
Mindagap_duplicatefinder - Take a spot table and search for duplicates along grid lines.
Spot2cell - Assign non-duplicated spots to segmented cells based on the segmentation mask and extract cell shape information.
MolkartQC - Produce QC metrics specific to this pipeline.
MultiQC - Aggregate report describing results and QC from the whole pipeline.
Pipeline information - Report metrics generated during the workflow execution.
Create training subset - Create crops for segmentation training (Cellpose, ilastik).
*_gridfilled.tiff: Gridfilled panorama file(s).
*_markedDups.txt: Spot table with duplicated spots marked as ‘Duplicated’.
Mindagap fills the empty grid lines of a panorama stitched from several tiles using the mean of the immediate neighborhood, and marks duplicated spots near the grid lines in the spot table.
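The neighborhood-based gap filling can be sketched as follows. This is a simplified NumPy illustration of filling zero-valued grid lines with the mean of non-zero neighbors, not Mindagap's actual implementation, which may differ in weighting and iteration strategy:

```python
import numpy as np

def fill_grid_lines(img, n_iter=5):
    """Fill zero-valued grid-line pixels with the mean of their non-zero neighbors.

    Simplified sketch of neighborhood-based gap filling; Mindagap's real
    algorithm may use different weights and stopping criteria.
    """
    filled = img.astype(float).copy()
    for _ in range(n_iter):
        gaps = filled == 0
        if not gaps.any():
            break
        # Pad with edge values so every pixel has four neighbors.
        p = np.pad(filled, 1, mode="edge")
        neighbors = np.stack(
            [p[:-2, 1:-1], p[2:, 1:-1], p[1:-1, :-2], p[1:-1, 2:]]
        )
        counts = (neighbors > 0).sum(axis=0)
        sums = neighbors.sum(axis=0)
        ok = gaps & (counts > 0)
        filled[ok] = sums[ok] / counts[ok]
    return filled
```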
*_clahe.tiff: Image with contrast-limited adaptive histogram equalization applied.
CLAHE is an algorithm from scikit-image for local contrast enhancement that uses histograms computed over different tile regions of the image. Local details can therefore be enhanced even in regions that are darker or lighter than the rest of the image.
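A minimal usage example with scikit-image's `exposure.equalize_adapthist` (the parameter values below are illustrative, not the pipeline's defaults):

```python
import numpy as np
from skimage import exposure

# Synthetic image with a dark and a bright half to demonstrate local enhancement.
rng = np.random.default_rng(0)
img = np.concatenate(
    [rng.uniform(0.0, 0.2, (64, 64)), rng.uniform(0.8, 1.0, (64, 64))], axis=1
)

# Histograms are computed per tile (kernel_size) and clipped (clip_limit)
# to limit noise amplification before equalization.
out = exposure.equalize_adapthist(img, kernel_size=32, clip_limit=0.01)
```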
*.ome.tif: Image containing provided input images as channels.
Create stack is a local module used to merge images into a stack in preparation for segmentation.
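Conceptually, the stacking step amounts to combining the channels along a new axis. A minimal sketch (function name and channel order are illustrative):

```python
import numpy as np

def create_stack(nuclear, membrane):
    """Combine two single-channel images into a (C, Y, X) stack."""
    if nuclear.shape != membrane.shape:
        raise ValueError("input images must have identical dimensions")
    return np.stack([nuclear, membrane], axis=0)

# The pipeline writes such a stack as OME-TIFF; with the tifffile package this
# would be roughly: tifffile.imwrite("stack.ome.tif", stack, metadata={"axes": "CYX"})
```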
*_cellpose_mask.tif: Segmentation masks created by Cellpose.
*_probability_maps.hdf5: Probability maps created by ilastik’s Pixel Classifier workflow.
*_ilastik_mask.tif: Segmentation masks created by ilastik’s Boundary prediction with Multicut workflow.
*_mesmer_mask.tif: Segmentation masks created by Mesmer.
*_method_filtered.tif: Segmentation masks filtered based on provided area limits.
Cellpose is a segmentation tool that provides pretrained models as well as additional human-in-the-loop training. If additional training is performed, the envisioned workflow is to create the training subset (tiff crops), train the model in the Cellpose GUI on the subset, and then provide the trained model as an argument to the pipeline to complete the run.
ilastik is an interactive learning and segmentation toolkit. Its envisioned use here is to create the training subset (hdf5 crops), then create Pixel Classifier and Boundary prediction with Multicut projects with the desired parameters. Within Molkart, these project files can then be provided and batch processing is applied to the full images.
Mesmer is a segmentation tool that provides pretrained models for whole-cell and nuclear segmentation.
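Whichever method is used, the resulting mask is then filtered by object area. A pure-NumPy sketch of such size filtering (the pipeline's own module may use different defaults and relabel the output):

```python
import numpy as np

def filter_mask_by_area(mask, min_area=50, max_area=5000):
    """Remove labeled objects outside [min_area, max_area] from a labeled mask.

    Illustrative sketch of post-segmentation size filtering; thresholds are
    hypothetical, not the pipeline's defaults.
    """
    areas = np.bincount(mask.ravel())
    bad = np.flatnonzero((areas < min_area) | (areas > max_area))
    bad = bad[bad != 0]  # label 0 is background; never remove it
    out = mask.copy()
    out[np.isin(out, bad)] = 0
    return out
```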
*.csv: File containing transcript counts per cell, as well as cell shape properties.
Spot2cell is a local module that assigns spots (without Duplicates) to cells via a spot table and segmentation mask.
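The assignment step boils down to looking up the mask label under each spot's pixel coordinates and tallying counts per cell and gene. A sketch with hypothetical column names (the pipeline's actual spot-table schema may differ):

```python
import numpy as np
import pandas as pd

def spots_to_cells(spots, mask):
    """Assign non-duplicated spots to cells and count transcripts per cell/gene.

    `spots` is assumed to have integer x/y pixel coordinates and a gene column,
    with duplicates marked as 'Duplicated'; a spot receives the cell ID of the
    mask pixel it falls on (0 = background, i.e. unassigned).
    """
    keep = spots[spots["gene"] != "Duplicated"].copy()
    keep["cell_id"] = mask[keep["y"].to_numpy(), keep["x"].to_numpy()]
    assigned = keep[keep["cell_id"] > 0]
    # Cell-by-gene count table.
    return assigned.pivot_table(
        index="cell_id", columns="gene", aggfunc="size", fill_value=0
    )
```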
*.adata: AnnData object containing the spot count table, spatial locations of cells in adata.obsm, and metadata like ‘Area’, ‘MajorAxisLength’, ‘MinorAxisLength’, ‘Eccentricity’, ‘Solidity’, ‘Extent’, ‘Orientation’ in adata.obs.
CREATE_ANNDATA is a local module that generates an AnnData object storing expression, metadata and spatial locations of cells.
*.spot_QC.csv: Sheet containing useful quality-control metrics specific to spot-based image processing methods.
MolkartQC is a local module used for gathering useful quality-control metrics for spot-based image processing methods, including: sample ID, used segmentation method, total number of cells, average cell area, total number of spots, average spot assignment per cell, total number of assigned spots, percentage of assigned spots, number of duplicated spots.
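The listed metrics can be derived from the segmentation mask and spot counts along these lines (function and column names are illustrative, not MolkartQC's exact output schema):

```python
import numpy as np
import pandas as pd

def spot_qc(sample_id, method, mask, total_spots, assigned_spots, duplicated):
    """Summarize per-sample QC metrics of the kind MolkartQC reports (sketch)."""
    labels, areas = np.unique(mask[mask > 0], return_counts=True)
    n_cells = len(labels)
    return pd.DataFrame([{
        "sample_id": sample_id,
        "segmentation_method": method,
        "total_cells": n_cells,
        "avg_cell_area": float(areas.mean()) if n_cells else 0.0,
        "total_spots": total_spots,
        "assigned_spots": assigned_spots,
        "avg_spots_per_cell": assigned_spots / n_cells if n_cells else 0.0,
        "percent_assigned": 100 * assigned_spots / total_spots if total_spots else 0.0,
        "duplicated_spots": duplicated,
    }])
```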
final_QC.all_samples.csv: All molkartqc outputs concatenated into one file.
*.crop_overview.png: Crop overview for visual assessment of crop placement on the whole sample.
multiqc_report.html: a standalone HTML file that can be viewed in your web browser.
multiqc_data/: directory containing parsed statistics from the different tools used in the pipeline.
multiqc_plots/: directory containing static images from the report in various formats.
MultiQC is a visualization tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory.
Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQC. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see http://multiqc.info.
hdf5 crops for training Pixel classification and Multicut models with ilastik for segmentation.
tiff crops for training Cellpose to create a custom segmentation model.
Create training subset is an optional group of modules that creates crops in hdf5 and tiff formats, as well as a crop overview for reusability.
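The cropping itself can be sketched as random square tiles cut from the full image. A simplified illustration (crop size, count, and function name are hypothetical; the pipeline additionally writes the crops to disk and renders the placement overview):

```python
import numpy as np

def make_crops(img, crop_size=256, n_crops=2, seed=0):
    """Cut random square crops from an image for segmentation training (sketch)."""
    rng = np.random.default_rng(seed)
    h, w = img.shape[:2]
    crops = []
    for _ in range(n_crops):
        y = int(rng.integers(0, h - crop_size + 1))
        x = int(rng.integers(0, w - crop_size + 1))
        crops.append(img[y : y + crop_size, x : x + crop_size])
    return crops
```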
- Reports generated by Nextflow:
- Reports generated by the pipeline:
The pipeline_report* files will only be present if the --email_on_fail parameter is used when running the pipeline.
- Reformatted samplesheet files used as input to the pipeline:
- Parameters used by the pipeline run:
Nextflow provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage.