Cross-conditioned Diffusion Model for Medical Image to Image Translation

Zhaohu Xing; Sicheng Yang; Sixiang Chen; Tian Ye; Yijun Yang; Jing Qin; Lei Zhu

TL;DR

The paper addresses missing modalities in multi-modal MRI by introducing a Cross-conditioned Diffusion Model (CDM). CDM comprises three components: a Modality-specific Representation Model (MRM) that learns the distribution of the target modalities, a Modality-decoupled Diffusion Network (MDN) that samples from that distribution efficiently, and a Cross-conditioned UNet (C-UNet) that synthesizes the target modalities from the source inputs, guided by the sampled distribution. Training uses losses including $L_{\mathrm{MRM}}$ and $L_{\mathrm{Syn}}$, and the diffusion component is built on the forward process $q(y_t|y_0)$ with DDIM-like sampling. Experiments on BraTS2023 and UPenn-GBM show that CDM achieves state-of-the-art or competitive performance while being more efficient than conventional diffusion methods, which suggests practical potential for clinical deployment when some modalities are missing. The approach separates target-distribution modeling from pixel-space generation and integrates cross-modal guidance directly into the synthesis process, offering a scalable way to complete multi-modal MRI data in real-world clinical workflows.
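Assuming CDM follows the standard DDPM parameterization (this summary does not give the paper's exact schedule or equations), the forward process $q(y_t|y_0)$ has a closed form, and a deterministic DDIM step (η = 0) reuses the predicted noise to jump from step $t$ to an earlier step $t'$. Here $\beta_s$ is the noise schedule and $\epsilon_\theta$ a learned noise predictor; both symbols are introduced for illustration only.

```latex
\begin{aligned}
q(y_t \mid y_0) &= \mathcal{N}\!\left(y_t;\ \sqrt{\bar\alpha_t}\, y_0,\ (1-\bar\alpha_t)\, I\right),
  \qquad \bar\alpha_t = \prod_{s=1}^{t} (1-\beta_s), \\
y_t &= \sqrt{\bar\alpha_t}\, y_0 + \sqrt{1-\bar\alpha_t}\, \epsilon,
  \qquad \epsilon \sim \mathcal{N}(0, I), \\
\hat y_0 &= \frac{y_t - \sqrt{1-\bar\alpha_t}\, \epsilon_\theta(y_t, t)}{\sqrt{\bar\alpha_t}}, \\
y_{t'} &= \sqrt{\bar\alpha_{t'}}\, \hat y_0 + \sqrt{1-\bar\alpha_{t'}}\, \epsilon_\theta(y_t, t),
  \qquad t' < t.
\end{aligned}
```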

Abstract

Multi-modal magnetic resonance imaging (MRI) provides rich, complementary information for analyzing diseases. However, the practical challenges of acquiring multiple MRI modalities, such as cost, scan time, and safety considerations, often result in incomplete datasets. This affects both the quality of diagnosis and the performance of deep learning models trained on such data. Recent advancements in generative adversarial networks (GANs) and denoising diffusion models have shown promise in natural and medical image-to-image translation tasks. However, the complexity of training GANs and the computational expense associated with diffusion models hinder their development and application in this task. To address these issues, we introduce a Cross-conditioned Diffusion Model (CDM) for medical image-to-image translation. The core idea of CDM is to use the distribution of target modalities as guidance to improve synthesis quality while achieving higher generation efficiency compared to conventional diffusion models. First, we propose a Modality-specific Representation Model (MRM) to model the distribution of target modalities. Then, we design a Modality-decoupled Diffusion Network (MDN) to efficiently and effectively learn the distribution from MRM. Finally, a Cross-conditioned UNet (C-UNet) with a Condition Embedding module is designed to synthesize the target modalities with the source modalities as input and the target distribution for guidance. Extensive experiments conducted on the BraTS2023 and UPenn-GBM benchmark datasets demonstrate the superiority of our method.
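The abstract does not say how the Condition Embedding module injects the sampled target distribution into the C-UNet. One common way to condition a UNet on a vector is FiLM (feature-wise linear modulation), and the sketch below swaps in FiLM purely for illustration; it is an assumption, not the paper's module. The class name, weight layout, and the `feature * (1 + gamma) + beta` form are hypothetical choices, and plain Python lists stand in for tensors.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


class FiLMConditionEmbedding:
    """Hypothetical FiLM-style condition embedding (an assumption, not CDM's module).

    Maps a condition vector z (e.g. a sampled target-modality representation) to a
    per-channel scale gamma and shift beta, then modulates source-modality features
    as feature * (1 + gamma) + beta.
    """

    def __init__(self, cond_dim: int, num_channels: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Two linear projections cond_dim -> num_channels with small random weights.
        self.w_gamma = [[rng.gauss(0.0, 0.02) for _ in range(cond_dim)] for _ in range(num_channels)]
        self.w_beta = [[rng.gauss(0.0, 0.02) for _ in range(cond_dim)] for _ in range(num_channels)]
        self.b_gamma = [0.0] * num_channels
        self.b_beta = [0.0] * num_channels

    @staticmethod
    def _linear(w: Matrix, b: Vector, z: Vector) -> Vector:
        # Dense layer: one output per row of w.
        return [sum(wi * zi for wi, zi in zip(row, z)) + bi for row, bi in zip(w, b)]

    def __call__(self, features: List[Matrix], z: Vector) -> List[Matrix]:
        """features: [C][H][W] feature map; z: condition vector of length cond_dim."""
        gamma = self._linear(self.w_gamma, self.b_gamma, z)
        beta = self._linear(self.w_beta, self.b_beta, z)
        # Scale and shift every spatial position of channel c by (gamma[c], beta[c]).
        return [
            [[x * (1.0 + g) + bt for x in row] for row in channel]
            for channel, g, bt in zip(features, gamma, beta)
        ]
```

When `gamma` and `beta` are both zero the modulation leaves the features unchanged, so the condition only alters the source features to the extent the learned projections respond to it.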

Paper Structure (11 sections, 4 equations, 5 figures, 4 tables)

Introduction
Method
Representation Learning for Target Modalities
Cross-conditioned UNet (C-UNet)
Experiments
Datasets and Implementation
Comparison with SOTA Methods
Ablation Study
Conclusion
Acknowledgments
Disclosure of Interests

Figures (5)

Figure 1: Comparison between the conventional diffusion model (a) and our method (b). Our method replaces the time-consuming denoising UNet with a lightweight diffusion network, which improves efficiency.
Figure 2: An overview of the proposed Cross-conditioned Diffusion Model (CDM). First, the Modality-specific Representation Model (a) learns the distribution of the target modalities. Then, the Modality-decoupled Diffusion Network (b) learns this target distribution. Finally, the Cross-conditioned UNet (c) takes the source modalities as input and uses samples from the target distribution as guidance to generate the target modalities.
Figure 3: An overview of the Modality-decoupled Diffusion Network (a) and the Cross-conditioned Embedding (b).
Figure 4: Visual comparisons of proposed CDM and other state-of-the-art methods.
Figure 5: The ablation studies for efficiency and parameter scale.
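Figure 1(b) attributes CDM's efficiency to running the diffusion process with a lightweight network instead of a pixel-space denoising UNet. The sketch below is a minimal illustration of deterministic DDIM sampling (η = 0) over a small vector, assuming the standard DDPM forward process with a linear β schedule; neither the schedule nor the step count comes from the paper. `eps_model` stands in for the MDN, and all function names, including the test-only `oracle_eps_model`, are hypothetical.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for an assumed linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(y0: Vector, alpha_bar: float, eps: Vector) -> Vector:
    """Closed-form forward process: y_t = sqrt(a_bar) * y0 + sqrt(1 - a_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(y0, eps)]


def ddim_sample(
    eps_model: Callable[[Vector, int], Vector],
    y_T: Vector,
    alpha_bars: List[float],
    timesteps: List[int],
) -> Vector:
    """Deterministic DDIM (eta = 0) over a strided, descending list of timesteps."""
    y = y_T
    for i, t in enumerate(timesteps):
        eps = eps_model(y, t)
        a_t = alpha_bars[t]
        # Predict the clean representation from the current noisy one.
        y0_hat = [(yi - math.sqrt(1.0 - a_t) * ei) / math.sqrt(a_t) for yi, ei in zip(y, eps)]
        # alpha_bar of the next (earlier) step; after the last step alpha_bar = 1, so y = y0_hat.
        a_prev = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        y = [math.sqrt(a_prev) * x0 + math.sqrt(1.0 - a_prev) * ei for x0, ei in zip(y0_hat, eps)]
    return y


def oracle_eps_model(y0: Vector, alpha_bars: List[float]) -> Callable[[Vector, int], Vector]:
    """Hypothetical 'perfect' noise predictor for testing: inverts q_sample given the true y0."""
    def eps_model(y: Vector, t: int) -> Vector:
        a_t = alpha_bars[t]
        return [(yi - math.sqrt(a_t) * x0) / math.sqrt(1.0 - a_t) for yi, x0 in zip(y, y0)]
    return eps_model


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = linear_alpha_bars(1000)
    y0 = [0.5, -1.0, 2.0]  # toy stand-in for a compact target representation
    y_T = q_sample(y0, alpha_bars[-1], [rng.gauss(0.0, 1.0) for _ in y0])
    steps = list(range(999, -1, -50))  # 20 DDIM steps instead of 1000
    y_hat = ddim_sample(oracle_eps_model(y0, alpha_bars), y_T, alpha_bars, steps)
    print("max reconstruction error:", max(abs(a - b) for a, b in zip(y_hat, y0)))
```

DDIM can stride over timesteps (here 20 of 1000), and each step only touches a small vector rather than a full image; the oracle predictor simply checks that the update rule recovers $y_0$ when the noise estimate is exact.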
