Cross-conditioned Diffusion Model for Medical Image to Image Translation
Zhaohu Xing, Sicheng Yang, Sixiang Chen, Tian Ye, Yijun Yang, Jing Qin, Lei Zhu
TL;DR
The paper addresses missing modalities in multi-modal MRI with a Cross-conditioned Diffusion Model (CDM). CDM has three components: a Modality-specific Representation Model (MRM) that learns the distribution of the target modality, a Modality-decoupled Diffusion Network (MDN) that samples from that distribution efficiently, and a Cross-conditioned UNet (C-UNet) that synthesizes the target modality from the source modalities under the guidance of the sampled distribution. The training losses include $L_{\mathrm{MRM}}$ and $L_{\mathrm{Syn}}$, and sampling follows a DDIM-like process defined through $q(y_t \mid y_0)$. Experiments on BraTS2023 and UPenn-GBM show that CDM achieves state-of-the-art or competitive performance while being more efficient than conventional diffusion methods, which suggests practical potential for clinical workflows with incomplete modality data. The approach advances medical image-to-image translation by separating target-distribution modeling from pixel-space generation and by integrating cross-modal guidance directly into the synthesis process.
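To make the $q(y_t \mid y_0)$ notation and the DDIM-like sampling mentioned above more concrete, the sketch below uses the standard closed form from the diffusion literature, $y_t = \sqrt{\bar\alpha_t}\, y_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon$, together with one deterministic DDIM update. It is a minimal illustration, not the paper's implementation: the linear noise schedule, the step count, the function names (`make_alpha_bar`, `q_sample`, `ddim_step`), and the use of a flat list of floats as a stand-in for a target-modality representation are all assumptions made here. In a real model the noise estimate would come from a trained network; the example passes the true noise instead so that the step can be checked by hand.

```python
import math
import random


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed, not from the paper)."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def q_sample(y0, t, alpha_bar, eps):
    """Draw y_t from q(y_t | y_0) in closed form, given noise eps ~ N(0, I)."""
    a = alpha_bar[t]
    return [math.sqrt(a) * y + math.sqrt(1.0 - a) * e for y, e in zip(y0, eps)]


def ddim_step(y_t, eps_hat, t, t_prev, alpha_bar):
    """One deterministic DDIM update (eta = 0) from step t to an earlier step t_prev."""
    a_t, a_prev = alpha_bar[t], alpha_bar[t_prev]
    # Estimate the clean sample implied by the current noise estimate.
    y0_hat = [(y - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t) for y, e in zip(y_t, eps_hat)]
    # Re-noise that estimate to the lower noise level of step t_prev.
    return [math.sqrt(a_prev) * y0 + math.sqrt(1.0 - a_prev) * e for y0, e in zip(y0_hat, eps_hat)]


rng = random.Random(0)
alpha_bar = make_alpha_bar()
y0 = [rng.gauss(0.0, 1.0) for _ in range(8)]   # hypothetical stand-in for a target representation
eps = [rng.gauss(0.0, 1.0) for _ in range(8)]  # forward-process noise
y_t = q_sample(y0, 500, alpha_bar, eps)
# The true noise replaces a network prediction here, purely for illustration.
y_prev = ddim_step(y_t, eps, 500, 400, alpha_bar)
```

With a perfect noise estimate, the DDIM step from step 500 to step 400 lands on the same point that `q_sample` would produce at step 400 with the same noise. This is why deterministic samplers of this kind can skip intermediate steps, one common source of the efficiency gains that diffusion samplers aim for.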
Abstract
Multi-modal magnetic resonance imaging (MRI) provides rich, complementary information for analyzing diseases. However, the practical challenges of acquiring multiple MRI modalities, such as cost, scan time, and safety considerations, often result in incomplete datasets. This affects both the quality of diagnosis and the performance of deep learning models trained on such data. Recent advancements in generative adversarial networks (GANs) and denoising diffusion models have shown promise in natural and medical image-to-image translation tasks. However, the complexity of training GANs and the computational expense associated with diffusion models hinder their development and application in this task. To address these issues, we introduce a Cross-conditioned Diffusion Model (CDM) for medical image-to-image translation. The core idea of CDM is to use the distribution of target modalities as guidance to improve synthesis quality while achieving higher generation efficiency compared to conventional diffusion models. First, we propose a Modality-specific Representation Model (MRM) to model the distribution of target modalities. Then, we design a Modality-decoupled Diffusion Network (MDN) to efficiently and effectively learn the distribution from MRM. Finally, a Cross-conditioned UNet (C-UNet) with a Condition Embedding module is designed to synthesize the target modalities with the source modalities as input and the target distribution for guidance. Extensive experiments conducted on the BraTS2023 and UPenn-GBM benchmark datasets demonstrate the superiority of our method.
