Plasticine: A Traceable Diffusion Model for Medical Image Translation

Tianyang Zhang, Xinxing Cheng, Jun Cheng, Shaoming Zheng, He Zhao, Huazhu Fu, Alejandro F Frangi, Jiang Liu, Jinming Duan

TL;DR

Plasticine tackles the lack of traceability in medical image translation by embedding intensity translation and spatial transformation inside a diffusion framework, enabling pixel-level correspondences between source and translated images. It introduces a diffusion-based intensity translator and a cross-modality spatial transformation module that yields diffeomorphic, topology-preserving deformations and interpretable spatial changes. Across retinal OCT, chest MRI-CT, and cardiac MRI tasks, Plasticine demonstrates superior traceability, as measured by segmentation metrics, and competitive image synthesis against GAN and diffusion baselines, with clinical user studies supporting its practical utility. The work also notes limitations, including its dependence on precomputed structure maps, and outlines plans to extend the method to 3D data.

Abstract

Domain gaps arising from variations in imaging devices and population distributions pose significant challenges for machine learning in medical image analysis. Existing image-to-image translation methods primarily aim to learn mappings between domains, often generating diverse synthetic data with variations in anatomical scale and shape, but they usually overlook spatial correspondence during the translation process. For clinical applications, traceability, defined as the ability to provide pixel-level correspondences between original and translated images, is equally important. This property enhances clinical interpretability but has been largely overlooked in previous approaches. To address this gap, we propose Plasticine, which is, to the best of our knowledge, the first end-to-end image-to-image translation framework explicitly designed with traceability as a core objective. Our method combines intensity translation and spatial transformation within a denoising diffusion framework. This design enables the generation of synthetic images with interpretable intensity transitions and spatially coherent deformations, supporting pixel-wise traceability throughout the translation process.

Paper Structure

This paper contains 29 sections, 23 equations, 15 figures, 6 tables, 3 algorithms.

Figures (15)

  • Figure 1: The Plasticine model generates high-quality images with the scalability to fit the target distribution. More importantly, it provides spatial traceability so that clinicians can track the changes. A demo webpage is available at https://Plasticine001.github.io.
  • Figure 2: Comparison between the original diffusion model and the proposed method, Plasticine. (a) In the original diffusion model, image synthesis begins with random noise $\bm z$, which is progressively refined into an image $\bm y_0$. (b) In Plasticine, the process starts with a source image $\bm x_0$. Noise is added to this source image, and the result is then progressively refined into the target image $\bm y_0$ using the deformations $\bm \phi$, the basic contours $\bm x_s$, and the estimated noise $\bm{\hat{\epsilon}}$. Both $\bm \phi$ and $\bm{\hat{\epsilon}}$ are obtained from the proposed network. The network architecture and the complete mathematical process are detailed in the Methodology and Appendix sections.
  • Figure 3: Illustration of the diffusion noising (noise addition) and denoising (inference) processes. $\bm x_0$ and $\bm y_0$ denote the source image $\bm x$ and the target image $\bm y$, respectively, while $\bm x_k$ and $\bm y_k$ are $\bm x_0$ and $\bm y_0$ after $k$ noising steps. The estimate $\hat{\bm{y}}$ is obtained by the diffusion denoising process from the noised input $\bm x_K$ and the deformation $\bm \phi$.
  • Figure 4: The training process of the deformation module. The source image $\bm x$ and target image $\bm y$ are paired and, together with the $\bm {\hat{\epsilon}}$ from the diffusion auto-encoder $\bm\epsilon_{\theta_1}$, form the input of the registration module. The registration module first predicts the initial forward and backward velocities, i.e., $\bm v$ and $\bm v^{-1}$, and then generates the final deformations from these velocities with a composition layer. With the computed forward and backward deformations, i.e., $\bm \phi$ and $\bm \phi^{-1}$, an image can be warped, and the distance between the warped image and its corresponding reference image is minimized under a loss criterion such as Mutual Information.
  • Figure 5: Illustration of the static composition process.
  • ...and 10 more figures
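
The noising step described in the Figure 3 caption (obtaining $\bm x_k$ from $\bm x_0$ after $k$ steps) can be sketched as follows. This is a minimal illustration that assumes the standard DDPM closed form $\bm x_k = \sqrt{\bar\alpha_k}\,\bm x_0 + \sqrt{1-\bar\alpha_k}\,\bm\epsilon$ with a linear $\beta$ schedule; the schedule values, the number of steps, and the names `linear_beta_schedule` and `add_noise` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Linearly spaced noise variances (assumed schedule, not from the paper)."""
    return np.linspace(beta_start, beta_end, num_steps)


def add_noise(x0, k, alpha_bar, rng):
    """Draw x_k from q(x_k | x_0) in closed form; k is 1-indexed."""
    eps = rng.standard_normal(x0.shape)  # Gaussian noise, same shape as x0
    a = alpha_bar[k - 1]                 # cumulative product of (1 - beta) up to step k
    x_k = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return x_k, eps


K = 1000                                  # total number of steps (assumed)
betas = linear_beta_schedule(K)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(64, 64))  # stand-in for a normalized source image
x_k, eps = add_noise(x0, k=500, alpha_bar=alpha_bar, rng=rng)
```

Because the same noise `eps` that produced `x_k` is returned, `x0` can be recovered exactly from `x_k` and `eps`. Diffusion denoisers rely on this relation when they predict an estimate of the noise such as $\bm{\hat{\epsilon}}$.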