Diffusion based Zero-shot Medical Image-to-Image Translation for Cross Modality Segmentation

Zihao Wang; Yingyu Yang; Yuzhou Chen; Tingting Yuan; Maxime Sermesant; Herve Delingette; Ona Wu

TL;DR

This work tackles zero-shot cross-modality medical image translation for segmentation by translating source-domain images into the target modality using a diffusion-based framework guided by localized statistical coherence. It introduces Local-wise Mutual Information (LMI) as a conditioning signal to steer the diffusion process, enabling cross-modality translation without source-domain training data. The proposed LMIDiffusion approach demonstrates superior translation quality and downstream segmentation performance on IXI PDw and T1w MRI data compared to GAN-based and diffusion-based baselines, highlighting practical utility in low-resource, multi-modality settings. The method offers a practical path for zero-shot segmentation across unseen modalities and can be extended with few-shot refinements to further improve alignment and robustness.

Abstract

Cross-modality image segmentation aims to segment images of a target modality using a method designed for the source modality. Deep generative models can translate the target modality images into the source modality, thus enabling cross-modality segmentation. However, a large body of existing cross-modality image translation methods relies on supervised learning. In this work, we aim to address the challenge of zero-shot learning-based image translation tasks (the extreme scenario in which the target modality is unseen in the training phase). To leverage generative learning for zero-shot cross-modality image segmentation, we propose a novel unsupervised image translation method. The framework learns to translate the unseen source image to the target modality for image segmentation by leveraging the inherent statistical consistency between different modalities for diffusion guidance. Our framework captures identical cross-modality features in the statistical domain, offering diffusion guidance without relying on direct mappings between the source and target domains. This advantage allows our method to adapt to changing source domains without retraining, making it highly practical when sufficient labeled source domain data is not available. The proposed framework is validated in zero-shot cross-modality image segmentation tasks through empirical comparisons with influential generative models, including adversarial-based and diffusion-based models.
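The statistical guidance described above can be illustrated with a patch-wise mutual information map. The following is a minimal sketch, not the paper's LMI definition: it assumes a plain joint-histogram estimator over non-overlapping square patches, and the function names, `patch` size, and `bins` count are all hypothetical choices made for illustration.

```python
import numpy as np


def mutual_information(a, b, bins=8):
    """Joint-histogram estimate of mutual information (in nats)."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()              # joint probability table
    px = pxy.sum(axis=1, keepdims=True)    # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of b, shape (1, bins)
    nz = pxy > 0                           # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def local_mutual_information(x, y, patch=16, bins=8):
    """Map of mutual information over non-overlapping patch x patch windows.

    Assumption: x and y are 2D arrays of the same shape; trailing rows and
    columns that do not fill a whole patch are ignored.
    """
    rows, cols = x.shape[0] // patch, x.shape[1] // patch
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            window = (slice(i * patch, (i + 1) * patch),
                      slice(j * patch, (j + 1) * patch))
            out[i, j] = mutual_information(x[window], y[window], bins)
    return out
```

Mutual information depends on the statistical relationship between intensities rather than on the intensities themselves, so a contrast-inverted copy of an image scores much higher against the original than unrelated noise does. That property is what makes this kind of measure a plausible guidance signal across modalities with different contrast.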

Paper Structure (11 sections, 12 equations, 2 figures, 1 table)

Introduction
Method
Diffusion Model for Cross-modality Image Translation
Local-wise Mutual Information
Conditioning the Diffusion through the LMI
Experiment and Result
Dataset
Experiment
Result
Conclusion
Appendix

Figures (2)

Figure 1: Schematic of LMI-guided diffusion for zero-shot cross-modal segmentation. The blue and orange contours represent the source and target distributions. The orange dot in the blue contour is a source datapoint, and the blue dot in the orange contour is its corresponding target datapoint. LMIDiffusion uses explicit statistical features (LMI) to choose each next step (yellow dot), providing continuous guidance from start to finish. The translated image can then be segmented using arbitrary segmentation methods that were trained only on the target modality.
Figure 2: Qualitative evaluation of different models' translation results. The first two rows show target and original modality images, with close-ups of ROIs, followed by translations from CycleGAN, StyleGAN, SDEdit, and LMIDiffusion. The subsequent row displays binarized segmentation results in the ROIs, obtained with a 3-cluster K-Means method trained solely on the target modality.
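The segmentation step mentioned in the Figure 2 caption can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: it assumes clustering on scalar pixel intensities only, uses a hand-written Lloyd's algorithm with quantile initialization, and the function names (`fit_kmeans_1d`, `segment`, `binarize`) are hypothetical.

```python
import numpy as np


def fit_kmeans_1d(values, k=3, iters=50):
    """Lloyd's algorithm on scalar intensities; returns centroids sorted ascending."""
    v = np.asarray(values, dtype=float).ravel()
    # Deterministic initialization at evenly spaced quantiles (an assumption).
    centroids = np.quantile(v, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        labels = np.argmin(np.abs(v[:, None] - centroids[None, :]), axis=1)
        # Keep the old centroid if a cluster happens to be empty.
        updated = np.array([
            v[labels == c].mean() if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return np.sort(centroids)


def segment(image, centroids):
    """Label each pixel with the index of its nearest centroid."""
    img = np.asarray(image, dtype=float)
    return np.argmin(np.abs(img[..., None] - centroids), axis=-1)


def binarize(labels, cluster):
    """Boolean mask selecting a single cluster."""
    return labels == cluster
```

In this sketch the centroids would be fitted once on target-modality images and then applied unchanged to a translated image, for example `binarize(segment(translated, centroids), 2)` to keep the brightest cluster. Because the centroids never see the source modality, the quality of the resulting mask reflects how well the translation matched the target intensity distribution.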

Theorems & Definitions (2)

Proof of Property \ref{theo;optmum}
Proof of Property \ref{theo;error}
