DM-QPMNET: Dual-modality fusion network for cell segmentation in quantitative phase microscopy

Rajatsubhra Chakraborty, Ana Espinosa-Momox, Riley Haskin, Depeng Xu, Rosario Porras-Aguilar

TL;DR

DM-QPMNet addresses cell segmentation in single-shot quantitative phase microscopy (ssQPM) by exploiting the complementary information in polarized intensity images and phase maps. The authors implement a dual-encoder nnU-Net in which the four polarized intensity images and the phase map are processed by separate encoders and fused at mid-encoder depth via content-aware multi-head attention, aided by modality-specific normalization and dual-source skip connections. On an ssQPM dataset, DM-QPMNet outperforms a 5-channel early-fusion baseline and single-modality models, achieving a Dice of $0.888 \pm 0.026$ and an IoU of $0.799 \pm 0.040$. The approach offers a robust, label-free path toward real-time live-cell segmentation in ssQPM systems and demonstrates the value of modality-specific encoding for multi-modal bioimaging data.
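The two reported metrics can be made concrete with a small sketch. The function below is illustrative only and is not taken from the paper: the name `dice_and_iou`, the nested-list mask format, and the convention for two empty masks are all assumptions.

```python
def dice_and_iou(pred, truth):
    """Return (Dice, IoU) for two equally sized binary masks.

    Masks are nested lists of 0/1 values (hypothetical format chosen
    for this sketch; real pipelines usually use arrays).
    """
    flat_pred = [bool(v) for row in pred for v in row]
    flat_truth = [bool(v) for row in truth for v in row]
    if len(flat_pred) != len(flat_truth):
        raise ValueError("masks must have the same number of pixels")

    intersection = sum(p and t for p, t in zip(flat_pred, flat_truth))
    pred_area = sum(flat_pred)
    truth_area = sum(flat_truth)
    union = pred_area + truth_area - intersection

    if union == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0, 1.0

    dice = 2 * intersection / (pred_area + truth_area)
    iou = intersection / union
    return dice, iou


pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
dice, iou = dice_and_iou(pred, truth)
print(round(dice, 4), iou)  # 0.6667 0.5
```

For a single mask pair the two metrics are tied by Dice = 2·IoU / (1 + IoU). Plugging in the reported mean IoU of 0.799 gives roughly 0.888, which agrees with the reported mean Dice, although the identity is only exact per image and not for averages over images.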

Abstract

Cell segmentation in single-shot quantitative phase microscopy (ssQPM) faces challenges from traditional thresholding methods that are sensitive to noise and cell density, while deep learning approaches using simple channel concatenation fail to exploit the complementary nature of polarized intensity images and phase maps. We introduce DM-QPMNet, a dual-encoder network that treats these as distinct modalities with separate encoding streams. Our architecture fuses modality-specific features at intermediate depth via multi-head attention, enabling polarized edge and texture representations to selectively integrate complementary phase information. This content-aware fusion preserves training stability while adding principled multi-modal integration through dual-source skip connections and per-modality normalization at minimal overhead. Our approach demonstrates substantial improvements over monolithic concatenation and single-modality baselines, showing that modality-specific encoding with learnable fusion effectively exploits ssQPM's simultaneous capture of complementary illumination and phase cues for robust cell segmentation.
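The fusion step described in the abstract can be sketched as multi-head cross-attention between the two feature maps. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: it assumes polarized features act as queries and phase features as keys and values, that both streams have the same channel count, and that the result is added back residually. The function name `cross_attention_fuse` and all weight matrices are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_fuse(pol_feat, phase_feat, w_q, w_k, w_v, w_o, num_heads):
    """Fuse two (C, H, W) feature maps with multi-head cross-attention.

    Assumption: polarized features form the queries, phase features
    form the keys and values, and the output is a residual update of
    the polarized stream.
    """
    c, h, w = pol_feat.shape
    if c % num_heads != 0:
        raise ValueError("channel count must be divisible by num_heads")
    d = c // num_heads

    # Flatten the spatial grid into N = H*W tokens of dimension C.
    pol_tok = pol_feat.reshape(c, h * w).T
    phase_tok = phase_feat.reshape(c, h * w).T

    def split_heads(x):
        # (N, C) -> (heads, N, d)
        return x.reshape(-1, num_heads, d).transpose(1, 0, 2)

    q = split_heads(pol_tok @ w_q)
    k = split_heads(phase_tok @ w_k)
    v = split_heads(phase_tok @ w_v)

    # Scaled dot-product attention per head: (heads, N, N).
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d), axis=-1)

    # Merge heads back to (N, C) and project.
    context = (attn @ v).transpose(1, 0, 2).reshape(h * w, c)
    fused_tok = pol_tok + context @ w_o  # residual connection (assumed)

    return fused_tok.T.reshape(c, h, w)


rng = np.random.default_rng(0)
c, h, w, heads = 8, 4, 4, 2
pol = rng.standard_normal((c, h, w))
phase = rng.standard_normal((c, h, w))
w_q, w_k, w_v, w_o = (0.1 * rng.standard_normal((c, c)) for _ in range(4))

fused = cross_attention_fuse(pol, phase, w_q, w_k, w_v, w_o, heads)
print(fused.shape)  # (8, 4, 4)
```

With the residual form used here, an output projection `w_o` of all zeros leaves the polarized features unchanged, which is one common way such a fusion block can be added without disturbing an existing encoder. Whether DM-QPMNet initializes its fusion this way is not stated in the abstract.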

Paper Structure

This paper contains 13 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: ssQPM dataset inputs: four polarized intensity images (0°, 45°, 90°, 135°), one quantitative phase map, and the corresponding ground-truth binary cell mask.
  • Figure 2: Late-fusion dual-encoder nnU-Net: four polarized intensity inputs (0°, 45°, 90°, 135°) and one phase map enter separate encoders; features fuse at Stage 2 via multi-head attention, then pass through a shared encoder tail and a decoder with deep supervision to produce the cell mask.
  • Figure 3: Leave-one-out modality ablation showing mean Dice and IoU over 6 test samples.
  • Figure 4: Qualitative segmentation on ssQPM (HeLa) at high (A), medium (B), and low (C) confluence. Columns show the ssQPM input, the ground-truth binary cell mask, and the DM-QPMNet prediction overlaid on the ground truth. In the overlay, yellow indicates correct agreement between prediction and ground truth, green shows ground-truth cell regions, and red shows predicted regions not supported by the ground truth. Dice/IoU for each panel are annotated.
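The colour coding described for the Figure 4 overlay can be reproduced with a few lines of NumPy. This sketch assumes that green marks ground-truth pixels the prediction missed and red marks predicted pixels absent from the ground truth, which is one reading of the caption. The function name `overlay_masks` and the exact RGB values are hypothetical.

```python
import numpy as np


def overlay_masks(pred, truth):
    """Build an RGB overlay from two binary masks of the same shape.

    Yellow: predicted and in ground truth (agreement).
    Green:  in ground truth only (assumed to mean missed cell pixels).
    Red:    predicted only (not supported by the ground truth).
    Black:  background in both masks.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if pred.shape != truth.shape:
        raise ValueError("masks must have the same shape")

    rgb = np.zeros(pred.shape + (3,), dtype=np.uint8)
    rgb[pred & truth] = (255, 255, 0)
    rgb[~pred & truth] = (0, 255, 0)
    rgb[pred & ~truth] = (255, 0, 0)
    return rgb


pred = [[1, 1, 0],
        [0, 0, 0]]
truth = [[1, 0, 1],
         [0, 0, 0]]
image = overlay_masks(pred, truth)
print(image.shape)  # (2, 3, 3)
```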