Cascaded Multi-path Shortcut Diffusion Model for Medical Image Translation

Yinchi Zhou, Tianqi Chen, Jun Hou, Huidong Xie, Nicha C. Dvornek, S. Kevin Zhou, David L. Wilson, James S. Duncan, Chi Liu, Bo Zhou

TL;DR

The paper addresses the need for high-quality medical image-to-image translation with uncertainty estimation by integrating a GAN-based prior with a cascaded, multi-path diffusion refinement (CMDM). By starting the diffusion process at a shortcut time $t_s$ using the GAN-derived prior and averaging over multiple noise-perturbed paths across cascading stages, CMDM reduces iterations while enhancing robustness and enabling pixel-wise uncertainty estimation. Empirical results across DE X-ray, sparse-view CT, and MRI translations show CMDM achieving higher PSNR and lower MAE than prior CNN and diffusion baselines, with uncertainty maps correlating well with translation errors. The approach is modular and potentially extensible as a plug-and-play enhancement for uncertainty-aware medical image translation, albeit at the cost of longer inference time and current 2D implementation limitations.

Abstract

Image-to-image translation is a vital component in medical image processing, with many uses in a wide range of imaging modalities and clinical scenarios. Previous methods include Generative Adversarial Networks (GANs) and Diffusion Models (DMs), which offer realism but suffer from instability and lack uncertainty estimation. Even though both GAN and DM methods have individually exhibited their capability in medical image translation tasks, the potential of combining a GAN and DM to further improve translation performance and to enable uncertainty estimation remains largely unexplored. In this work, we address these challenges by proposing a Cascade Multi-path Shortcut Diffusion Model (CMDM) for high-quality medical image translation and uncertainty estimation. To reduce the required number of iterations and ensure robust performance, our method first obtains a conditional GAN-generated prior image that is then used for efficient reverse translation with a DM in the subsequent step. Additionally, a multi-path shortcut diffusion strategy is employed to refine translation results and estimate uncertainty. A cascaded pipeline further enhances translation quality, incorporating residual averaging between cascades. We collected three different medical image datasets with two sub-tasks for each dataset to test the generalizability of our approach. Our experimental results show that CMDM can produce high-quality translations comparable to state-of-the-art methods while providing reasonable uncertainty estimations that correlate well with the translation error.
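The multi-path shortcut idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a standard DDPM forward process with a linear beta schedule and a deterministic DDIM-style reverse pass, neither of which is confirmed by this summary, and it omits the cascades and residual averaging. The names `make_alpha_bar`, `noise_to_shortcut`, `reverse_from_shortcut`, `multi_path_shortcut`, and the `predict_noise(x, t, cond)` callable (a stand-in for a trained conditional denoiser) are all hypothetical.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_to_shortcut(prior, t_s, alpha_bar, rng):
    """Jump the prior image directly to step t_s with the DDPM closed form."""
    eps = rng.standard_normal(prior.shape)
    return np.sqrt(alpha_bar[t_s]) * prior + np.sqrt(1.0 - alpha_bar[t_s]) * eps

def reverse_from_shortcut(x_t, t_s, alpha_bar, predict_noise, cond):
    """Deterministic (DDIM-style, eta = 0) reverse pass from t_s down to step 0."""
    x = x_t
    for t in range(t_s, -1, -1):
        eps_hat = predict_noise(x, t, cond)
        # Estimate of the clean image implied by the predicted noise.
        x0_hat = (x - np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])
        if t > 0:
            # Move to step t - 1 using the same noise estimate.
            x = (np.sqrt(alpha_bar[t - 1]) * x0_hat
                 + np.sqrt(1.0 - alpha_bar[t - 1]) * eps_hat)
    return x0_hat

def multi_path_shortcut(prior, cond, predict_noise, t_s=100, num_paths=8, seed=0):
    """Run several shortcut paths from one prior; return pixel-wise mean and std."""
    rng = np.random.default_rng(seed)
    alpha_bar = make_alpha_bar()
    paths = np.stack([
        reverse_from_shortcut(
            noise_to_shortcut(prior, t_s, alpha_bar, rng),
            t_s, alpha_bar, predict_noise, cond)
        for _ in range(num_paths)
    ])
    # Mean acts as the refined translation, std as a pixel-wise uncertainty map.
    return paths.mean(axis=0), paths.std(axis=0)
```

In this sketch, `prior` would be the conditional GAN output and `cond` the source image. Because each path starts at `t_s` rather than at the final diffusion step, only `t_s + 1` denoiser calls are needed per path, which is where the iteration savings come from under these assumptions.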

Paper Structure

This paper contains 9 sections, 11 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: Illustration of the generation process of previous I2I diffusion models. Starting the reverse process with different noise initializations leads to divergent translation results.
  • Figure 2: The overall workflow of our proposed Cascade Multi-path Shortcut Diffusion Model (CMDM). CMDM consists of a one-step inference model (green) and cascades of MPD blocks (grey). Each MPD block consists of multiple shortcut reverse paths, each starting from a prior image perturbed with different noise. The cascades are connected by residual averaging operations.
  • Figure 3: Inference Process - Cascaded Multi-path Shortcut Diffusion Model (CMDM)
  • Figure 4: Qualitative comparison of translation results and corresponding error maps from different methods. Examples from DE X-ray soft-tissue generation (Left), sparse-view CT reconstruction (Middle), and MRI T1-to-T2 synthesis (Right) are shown. The image quality metrics of each sample are indicated at the bottom left of the images.
  • Figure 5: Ablation studies on the reverse starting time (Left), the number of paths (Middle), and the number of cascades (Right). DE X-ray soft-tissue image generation and 1/6 SVCT reconstruction were used for these studies. Peak performance is annotated on each plot with the corresponding image quality metric, i.e., PSNR.
  • ...and 1 more figure
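The image quality metrics mentioned in the captions (PSNR, and MAE in the TL;DR) follow standard definitions, sketched below together with one possible way to check how well an uncertainty map tracks translation error. The summary does not state which correlation measure the authors use, so the Pearson correlation here is an assumption, as are the function names and the `data_range` default of 1.0 (images scaled to [0, 1]).

```python
import numpy as np

def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))

def mae(pred, target):
    """Mean absolute error between prediction and ground truth."""
    return float(np.mean(np.abs(pred - target)))

def uncertainty_error_correlation(uncertainty, pred, target):
    """Pearson correlation between an uncertainty map and the absolute error map."""
    abs_err = np.abs(pred - target).ravel()
    return float(np.corrcoef(uncertainty.ravel(), abs_err)[0, 1])
```

A value of `uncertainty_error_correlation` close to 1 would indicate that pixels flagged as uncertain are also the pixels with large error, which is the property the paper reports qualitatively.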