Self-Consistent Recursive Diffusion Bridge for Medical Image Translation

Fuat Arslan, Bilal Kabas, Onat Dalmaz, Muzaffer Ozbey, Tolga Çukur

TL;DR

SelfRDB introduces a self-consistent recursive diffusion bridge for medical image translation that directly maps between source and target modalities. It uses a forward process with a soft-prior on the source and a monotonically increasing noise variance toward the noise-added source end-point, coupled with a reverse process that iteratively refines a target-image estimate until self-consistency. Empirical results on multi-contrast MRI and MRI-CT translation demonstrate superior performance over GANs and diffusion-based baselines, with ablations confirming the importance of the soft-prior, stationary guidance, and recursive sampling. The approach offers improved generalization and information transfer across modalities, paving the way for robust, clinically relevant multi-modal image synthesis.

Abstract

Denoising diffusion models (DDMs) have gained recent traction in medical image translation given their improved training stability over adversarial models. DDMs learn a multi-step denoising transformation to progressively map random Gaussian-noise images onto target-modality images, while receiving stationary guidance from source-modality images. As this denoising transformation diverges significantly from the task-relevant source-to-target transformation, DDMs can suffer from weak source-modality guidance. Here, we propose a novel self-consistent recursive diffusion bridge (SelfRDB) for improved performance in medical image translation. Unlike DDMs, SelfRDB employs a novel forward process with start- and end-points defined based on target and source images, respectively. Intermediate image samples across the process are expressed via a normal distribution with mean taken as a convex combination of the start- and end-points, and variance from additive noise. Unlike regular diffusion bridges that prescribe zero variance at the start- and end-points and high variance at the mid-point of the process, we propose a novel noise schedule with monotonically increasing variance towards the end-point in order to boost generalization performance and facilitate information transfer between the two modalities. To further enhance sampling accuracy in each reverse step, we propose a novel sampling procedure where the network recursively generates a transient estimate of the target image until convergence onto a self-consistent solution. Comprehensive analyses in multi-contrast MRI and MRI-CT translation indicate that SelfRDB offers superior performance against competing methods.
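
The forward process described in the abstract can be illustrated with a minimal sketch. The abstract only states that the mean is a convex combination of the start- and end-points and that the variance grows monotonically towards the end-point; the linear schedules, the `sigma_max` parameter, and all function names below are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def forward_sample(x0, y, t, T, sigma_max=1.0, rng=None):
    """Draw x_t ~ N((1 - a_t) * x0 + a_t * y, sigma_t^2 I).

    x0 is the target image (start-point), y the source image.
    The linear schedules for a_t and sigma_t are hypothetical.
    """
    rng = np.random.default_rng() if rng is None else rng
    a_t = t / T                          # assumed linear mean schedule
    sigma_t = sigma_max * a_t            # assumed monotonically increasing std
    mean = (1.0 - a_t) * x0 + a_t * y    # convex combination of target and source
    return mean + sigma_t * rng.standard_normal(x0.shape)

def monotone_variance(t, T, sigma_max=1.0):
    """Variance that grows towards the end-point (SelfRDB-style, assumed form)."""
    return (sigma_max * t / T) ** 2

def bridge_variance(t, T, scale=1.0):
    """Variance that is zero at both ends and peaks at the mid-point
    (regular-bridge-style, assumed form)."""
    return scale * (t / T) * (1.0 - t / T)
```

Under these assumed schedules, `t = 0` returns the target image exactly, while `t = T` returns the source image plus noise of standard deviation `sigma_max`, matching the noise-added source end-point mentioned in the TL;DR.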

Paper Structure (26 sections, 16 equations, 6 figures, 4 tables, 1 algorithm)


Figures (6)

  • Figure 1: Diffusion methods commonly take the target image as the start-point $\boldsymbol{x}_0$ of the diffusion process, albeit they can differ in expression of image samples in remaining timesteps. Illustrations of images across the forward process are depicted along with underlying schedules for the mean ($\mu_{x_0,t}$, $\mu_{y,t}$) and noise variance ($\sigma_{t}^2$). (a) Classical diffusion: DDMs use a pure noise image as an asymptotic end-point $\boldsymbol{x}_T$. Intermediate samples are obtained by adding increasing levels of random Gaussian noise onto the target image. (b) Diffusion bridge: Regular bridges use the source image as a finite end-point. Intermediate samples are taken as a convex combination of source-target images, corrupted with additive noise. Noise variance is zero at start- and end-points, and it peaks at the mid-point. (c) Proposed: SelfRDB is a novel diffusion bridge that uses a noise-added source image as the end-point. Intermediate samples still depend on a convex combination of source-target images, yet SelfRDB uniquely prescribes monotonically-increasing noise variance towards the end-point.
  • Figure 2: Diffusion models learn the score function of the data through a multi-step transformation between the start- and end-points of the underlying process. Image samples are typically corrupted with Gaussian noise that smooths the data distribution by masking some of the original image features. Smoothing enables more uniform coverage of the data space in order to boost generalization performance. (a) Regular diffusion bridges use zero noise variance at the end-point constraining them to a Dirac-delta distribution centered on the source images within the training set. This can compromise generalization performance to source images outside the training set (see purple-colored dashed paths). (b) SelfRDB instead uses monotonically-increasing variance towards the end-point, so it is trained on noise-added source images. This improves robustness against variability in source images between training and test sets (see purple-colored dashed paths).
  • Figure 3: SelfRDB casts a diffusion bridge between source and target images of an anatomy. (a) In the forward process, the start-point $\boldsymbol{x}_0$ is taken as the target image and the end-point $\boldsymbol{x}_T$ is taken as a noise-added version of the source image $\boldsymbol{y}_{\epsilon}$. Intermediate image samples are derived via the forward transition probability $q(\boldsymbol{x}_t | \boldsymbol{x}_{t-1},\boldsymbol{y})$, whose mean is a convex combination of target-source images, and whose variance is driven by noise. In the reverse process, sampling is initiated on $\boldsymbol{x}_T=\boldsymbol{y}_{\epsilon}$, and intermediate samples are derived via the reverse transition probability $p_{\theta}(\boldsymbol{x}_{t-1} | \boldsymbol{x}_{t},\boldsymbol{y}_{\epsilon})$. (b) Reverse diffusion steps are operationalized via a recovery network $G_{\theta}(\boldsymbol{x}_t,t,\boldsymbol{y},\tilde{\boldsymbol{x}}_0^r)$ that recursively generates a target-image estimate $\tilde{\boldsymbol{x}}_0^{r+1}$ at the current timestep, given the target-image estimate from the previous recursion $\tilde{\boldsymbol{x}}_0^{r}$ and the original source image $\boldsymbol{y}$. Recursions are stopped upon convergence onto a self-consistent solution $\tilde{\boldsymbol{x}}_0^*=G_{\theta}(\boldsymbol{x}_t,t,\boldsymbol{y},\tilde{\boldsymbol{x}}_0^*)$, which is then used for posterior sampling of $\hat{\boldsymbol{x}}_{t-1}$ according to the normal distribution $q(\boldsymbol{x}_{t-1}|\boldsymbol{x}_t,\boldsymbol{y},\tilde{\boldsymbol{x}}_0^*)$. To improve posterior sampling, a discriminator subnetwork $D_{\theta}(\boldsymbol{x}_{t-1} \text{ or } \hat{\boldsymbol{x}}_{t-1},t,\boldsymbol{x}_t)$ is used to perform adversarial learning on the recovered sample $\hat{\boldsymbol{x}}_{t-1}$.
  • Figure 4: Multi-contrast MRI translation for a representative PD$\rightarrow$T1 task in the IXI dataset. Synthesized target images for competing methods are shown along with the reference target image (i.e., ground truth) and the input source image. Zoom-in display windows are used to highlight differences in synthesis performance.
  • Figure 5: Multi-contrast MRI translation for a representative FLAIR$\rightarrow$T2 task in the BRATS dataset. Synthesized target images for competing methods are shown along with the reference target image (i.e., ground truth) and the input source image.
  • ...and 1 more figures
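
The recursive estimation described in the Figure 3 caption amounts to a fixed-point iteration on the target-image estimate. A minimal sketch follows, assuming a zero initialization, a maximum-absolute-difference stopping rule with tolerance `tol`, and an iteration cap `max_iters`; none of these details are given in the text above. `toy_G` is a hypothetical contraction that stands in for the trained recovery network $G_{\theta}$ so that the loop can run end to end.

```python
import numpy as np

def recursive_estimate(G, x_t, t, y, max_iters=50, tol=1e-6):
    """Iterate x0_est <- G(x_t, t, y, x0_est) until successive estimates agree.

    Returns the final estimate and the number of recursions performed.
    """
    x0_est = np.zeros_like(x_t)  # assumed initialization, not specified in the source
    for r in range(max_iters):
        x0_next = G(x_t, t, y, x0_est)
        # Assumed stopping rule: estimate changes by less than tol
        if np.max(np.abs(x0_next - x0_est)) < tol:
            return x0_next, r + 1
        x0_est = x0_next
    return x0_est, max_iters

def toy_G(x_t, t, y, x0_prev):
    """Hypothetical stand-in for the recovery network: a simple contraction
    whose fixed point is 0.5 * (x_t + y). It ignores t."""
    return 0.5 * x0_prev + 0.25 * (x_t + y)

x_t = np.full((2, 2), 1.0)
y = np.full((2, 2), 1.0)
x0_star, n_recursions = recursive_estimate(toy_G, x_t, t=3, y=y)
```

In the method as described, the self-consistent estimate would then be used to sample $\hat{\boldsymbol{x}}_{t-1}$ from $q(\boldsymbol{x}_{t-1}|\boldsymbol{x}_t,\boldsymbol{y},\tilde{\boldsymbol{x}}_0^*)$; that step is omitted here because the posterior parameters are not given in this section.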