ProbMED: A Probabilistic Framework for Medical Multimodal Binding
Yuan Gao, Sangwook Kim, Jianzhong You, Chris McIntosh
TL;DR
ProbMED is a probabilistic framework for binding four medical modalities: chest X-ray (CXR), electrocardiogram (ECG), echocardiogram (ECHO), and clinical text. Each modality is represented as a Gaussian distribution $Z_m \sim \mathcal{N}(\mu_m, \operatorname{diag}(\sigma_m^2))$, and the distributions are aligned in a shared embedding space via probabilistic contrastive learning. The final loss combines three components across randomly sampled modality pairs: an InfoNCE objective with a Hellinger distance-based similarity $PS(q_n,k_t)=1-\sqrt{H^2(q_n,k_t)}$, where $H^2$ is the squared Hellinger distance between $q_n$ and $k_t$; a Synthetic Instance Sampling (SIS) loss that strengthens within-modality binding; and a Variational Information Bottleneck that prevents variance collapse. Experiments on 13 medical datasets show that ProbMED outperforms current medical vision-language pretraining models (Med-VLPMs) in cross-modality retrieval, zero-shot classification, and few-shot classification, and that integrating multiple modalities improves prognostication. The study also highlights practical benefits of probabilistic modeling, such as uncertainty-based prompt filtering and distribution-based sampling for data-scarce scenarios, underscoring ProbMED's potential for multimodal clinical decision support. Overall, ProbMED advances medical multimodal learning by explicitly modeling uncertainty and many-to-many mappings between modalities.
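To make the Hellinger-based similarity concrete, the sketch below evaluates $PS = 1-\sqrt{H^2}$ for two diagonal Gaussians using the standard closed form of the squared Hellinger distance, $H^2 = 1 - \prod_d \sqrt{\tfrac{2\sigma_{q,d}\sigma_{k,d}}{\sigma_{q,d}^2+\sigma_{k,d}^2}}\,\exp\!\left(-\tfrac{(\mu_{q,d}-\mu_{k,d})^2}{4(\sigma_{q,d}^2+\sigma_{k,d}^2)}\right)$. This is an illustration only, not the authors' implementation: the function names, the plain-list inputs, and the use of standard deviations (rather than log-variances) as parameters are assumptions made for the example.

```python
import math


def hellinger_sq_diag_gaussians(mu_q, sigma_q, mu_k, sigma_k):
    """Squared Hellinger distance between N(mu_q, diag(sigma_q^2)) and
    N(mu_k, diag(sigma_k^2)), using the closed form for diagonal Gaussians.

    All arguments are equal-length sequences of floats; sigmas are standard
    deviations and must be positive. (Hypothetical helper for illustration.)
    """
    # Accumulate the log of the Bhattacharyya coefficient for stability:
    # the coefficient is a product over dimensions and underflows quickly.
    log_bc = 0.0
    for mq, sq, mk, sk in zip(mu_q, sigma_q, mu_k, sigma_k):
        var_sum = sq * sq + sk * sk
        log_bc += 0.5 * math.log(2.0 * sq * sk / var_sum)
        log_bc -= (mq - mk) ** 2 / (4.0 * var_sum)
    # Clamp at 0 to absorb tiny negative values from rounding.
    return max(0.0, 1.0 - math.exp(log_bc))


def probabilistic_similarity(mu_q, sigma_q, mu_k, sigma_k):
    """PS(q, k) = 1 - sqrt(H^2(q, k)); lies in [0, 1], equals 1 when q == k."""
    return 1.0 - math.sqrt(hellinger_sq_diag_gaussians(mu_q, sigma_q, mu_k, sigma_k))


# Usage: identical distributions score 1; the score decays as the means
# separate or as the variances diverge.
same = probabilistic_similarity([0.0, 1.0], [1.0, 0.5], [0.0, 1.0], [1.0, 0.5])
shifted = probabilistic_similarity([0.0, 1.0], [1.0, 2.0], [0.5, 0.0], [0.5, 1.0])
```

Because the similarity depends on the variances as well as the means, two embeddings with the same mean but different uncertainty are not treated as identical, which is one way a distributional embedding can reflect many-to-many correspondences.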
Abstract
Medical decision-making requires integrating diverse medical information, from imaging to clinical narratives. These medical modalities are often acquired in a many-to-many manner. However, current medical vision-language pretraining models (Med-VLPMs) fail to directly account for this many-to-many mapping in their model training and embeddings. To address this, we present Probabilistic Modality-Enhanced Diagnosis (ProbMED), a multimodal Med-VLPM that employs probabilistic contrastive learning to model distributions over embeddings rather than deterministic estimates. ProbMED aligns four distinct modalities -- chest X-rays, electrocardiograms, echocardiograms, and clinical text -- into a unified probabilistic embedding space. We use InfoNCE loss with Hellinger distance to integrate inter-modality distributions. We introduce a probabilistic synthetic sampling loss that captures modality-specific mean and variance to improve intra-modality binding. Extensive experiments across 13 medical datasets demonstrate that our model outperforms current Med-VLPMs in cross-modality retrieval, zero-shot, and few-shot classification. We also demonstrate the robust integration of multiple modalities for prognostication, showing improved intra- and inter-medical modality binding.
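As a rough illustration of the contrastive objective, the sketch below computes a one-directional InfoNCE loss from a precomputed matrix of pairwise similarities between a batch of query embeddings (one modality) and key embeddings (another modality), with matched pairs on the diagonal. It is a minimal sketch under stated assumptions, not the paper's training code: the function name `info_nce`, the temperature value, and the one-directional form are choices made for the example, and in ProbMED the similarity entries would come from the Hellinger-based measure rather than being supplied by hand.

```python
import math


def info_nce(sim, temperature=0.07):
    """One-directional InfoNCE loss over a square similarity matrix.

    sim[i][j] is the similarity between query i and key j; the positive
    pair for query i is key i. The temperature is an assumed
    hyperparameter (0.07 is a common default, not a value from the paper).
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        logits = [s / temperature for s in sim[i]]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of the matched key under a softmax.
        total += log_denominator - logits[i]
    return total / n


# Usage: a batch whose matched pairs are most similar yields a lower loss
# than a batch where every pair looks alike.
aligned = [[1.0, 0.0], [0.0, 1.0]]
uniform = [[0.5, 0.5], [0.5, 0.5]]
loss_aligned = info_nce(aligned)
loss_uniform = info_nce(uniform)
```

When all similarities in a row are equal, the loss reduces to the log of the batch size, which is the chance-level value; training pushes matched pairs above that baseline.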
