DM-QPMNET: Dual-modality fusion network for cell segmentation in quantitative phase microscopy
Rajatsubhra Chakraborty, Ana Espinosa-Momox, Riley Haskin, Depeng Xu, Rosario Porras-Aguilar
TL;DR
DM-QPMNet addresses cell segmentation in single-shot quantitative phase microscopy (ssQPM) by exploiting the complementary information in polarized intensity images and phase maps. The authors implement a dual-encoder nnU-Net in which the polarized-angle images and the phase map are processed by separate encoders and fused at mid-encoder depth via content-aware multi-head attention, supported by modality-specific normalization and dual-source skip connections. On an ssQPM dataset, DM-QPMNet outperforms a 5-channel early-fusion baseline and single-modality models, achieving a Dice score of $0.888 \pm 0.026$ and an IoU of $0.799 \pm 0.040$. The approach offers a robust, label-free path toward real-time live-cell segmentation in ssQPM systems and demonstrates the value of modality-specific encoding for multi-modal bioimaging data.
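The two reported metrics are closely linked. For a single binary mask with intersection $I$ and mask sizes $|A|$, $|B|$, Dice $= 2I/(|A|+|B|)$ and IoU $= I/(|A|+|B|-I)$, which gives Dice $= 2\,\mathrm{IoU}/(1+\mathrm{IoU})$. The minimal numpy sketch below computes both metrics for binary foreground masks and checks that identity. The function names `dice_iou` and `dice_from_iou` are hypothetical, and this is not the authors' evaluation code.

```python
import numpy as np


def dice_iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> tuple[float, float]:
    """Dice and IoU for binary masks (nonzero = foreground). Hypothetical helper."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()   # |A ∩ B|
    union = np.logical_or(pred, target).sum()    # |A ∪ B|
    total = pred.sum() + target.sum()            # |A| + |B|
    # eps avoids 0/0 when both masks are empty (both scores then evaluate to 1).
    dice = (2.0 * inter + eps) / (total + eps)
    iou = (inter + eps) / (union + eps)
    return float(dice), float(iou)


def dice_from_iou(iou: float) -> float:
    # Per-mask identity: Dice = 2 * IoU / (1 + IoU).
    return 2.0 * iou / (1.0 + iou)
```

Applying the per-mask identity to the reported mean IoU of 0.799 gives about 0.888, in line with the reported mean Dice. Averages taken over many images need not satisfy the identity exactly, so this serves only as a sanity check.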
Abstract
Cell segmentation in single-shot quantitative phase microscopy (ssQPM) is challenging: traditional thresholding methods are sensitive to noise and cell density, while deep learning approaches that simply concatenate input channels fail to exploit the complementary nature of polarized intensity images and phase maps. We introduce DM-QPMNet, a dual-encoder network that treats these inputs as distinct modalities with separate encoding streams. Our architecture fuses modality-specific features at intermediate depth via multi-head attention, allowing the edge and texture representations learned from the polarized intensities to selectively integrate complementary phase information. This content-aware fusion preserves training stability while adding principled multi-modal integration through dual-source skip connections and per-modality normalization, at minimal computational overhead. Our approach substantially improves on monolithic channel-concatenation and single-modality baselines, showing that modality-specific encoding with learnable fusion effectively exploits ssQPM's simultaneous capture of complementary illumination and phase cues for robust cell segmentation.
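The abstract does not give the fusion equations. The numpy sketch below shows one standard way to realize multi-head cross-attention fusion between two encoder streams. It rests on several assumptions. Polarized features supply the queries and phase features supply the keys and values, which the wording suggests but does not state. A residual connection returns the result to the polarized stream. The projection matrices `w_q`, `w_k`, `w_v`, `w_o` are random stand-ins for learned weights. The name `cross_attention_fuse` and all of its parameters are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_fuse(pol, phase, w_q, w_k, w_v, w_o, num_heads):
    """Hypothetical sketch: fuse phase features into the polarized stream.

    pol, phase: (C, H, W) feature maps from the two encoders at the same depth.
    w_q, w_k, w_v, w_o: (C, C) projections (random stand-ins here; learned in practice).
    Assumption: polarized features are the queries, phase features the keys/values.
    Returns the fused (C, H, W) map and the (heads, H*W, H*W) attention weights.
    """
    c, h, w = pol.shape
    if phase.shape != pol.shape or c % num_heads != 0:
        raise ValueError("feature maps must match and C must be divisible by num_heads")
    d = c // num_heads
    n = h * w

    # Flatten each spatial map into N = H*W tokens of dimension C.
    q_tok = pol.reshape(c, n).T
    kv_tok = phase.reshape(c, n).T

    def split_heads(x):
        # (N, C) -> (heads, N, d): each head sees a contiguous slice of channels.
        return x.reshape(n, num_heads, d).transpose(1, 0, 2)

    q = split_heads(q_tok @ w_q)
    k = split_heads(kv_tok @ w_k)
    v = split_heads(kv_tok @ w_v)

    # Scaled dot-product attention: each polarized token weighs all phase tokens.
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d), axis=-1)  # (heads, N, N)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, c) @ w_o  # merge heads -> (N, C)

    # Residual connection (assumption): the polarized stream keeps its own features.
    fused = q_tok + out
    return fused.T.reshape(c, h, w), attn
```

In this sketch, setting `w_o` to zero returns the polarized features unchanged. A residual design like this lets fusion start as a near-identity, which is one plausible way such a layer could avoid destabilizing training. The attention matrix grows as $(HW)^2$, so applying it at an intermediate encoder depth, where feature maps are smaller, keeps the cost far below full resolution.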
