ContourDiff: Unpaired Medical Image Translation with Structural Consistency

Yuwen Chen, Nicholas Konz, Hanxue Gu, Haoyu Dong, Yaqian Chen, Lin Li, Jisoo Lee, Maciej A. Mazurowski

TL;DR

ContourDiff introduces a contour-guided diffusion framework for unpaired medical image translation, enforcing anatomical fidelity by conditioning the denoiser on input-domain contours and adjacent-slice context. Its Spatially Coherent Guided Diffusion (SCGD) procedure promotes slice-to-slice spatial consistency, yielding CT-to-MRI translations that preserve realistic anatomy and improve downstream segmentation. The model also works zero-shot: it translates unseen anatomical regions, and even different MRI contrasts, without retraining, with strong quantitative and qualitative results across lumbar, hip & thigh, and liver datasets. The work offers practical gains for cross-modality training of segmentation models and for medical image harmonization, supported by ablations, robustness analyses, and efficiency metrics.

Abstract

Accurately translating medical images between different modalities, such as Computed Tomography (CT) to Magnetic Resonance Imaging (MRI), has numerous downstream clinical and machine learning applications. While several methods have been proposed to achieve this, they often prioritize perceptual quality with respect to output domain features over preserving anatomical fidelity. However, maintaining anatomy during translation is essential for many tasks, e.g., when leveraging masks from the input domain to develop a segmentation model with images translated to the output domain. To address these challenges, we propose ContourDiff with Spatially Coherent Guided Diffusion (SCGD), a novel framework that leverages domain-invariant anatomical contour representations of images. These representations are simple to extract from images, yet form precise spatial constraints on their anatomical content. We introduce a diffusion model that converts contour representations of images from arbitrary input domains into images in the output domain of interest. By applying the contour as a constraint at every diffusion sampling step, we ensure the preservation of anatomical content. We evaluate our method on challenging lumbar spine and hip-and-thigh CT-to-MRI translation tasks, via (1) the performance of segmentation models trained on translated images applied to real MRIs, and (2) the foreground FID and KID of translated images with respect to real MRIs. Our method outperforms other unpaired image translation methods by a significant margin across almost all metrics and scenarios. Moreover, it achieves this without the need to access any input domain information during training and we further verify its zero-shot capability, showing that a model trained on one anatomical region can be directly applied to unseen regions without retraining (GitHub: https://github.com/mazurowski-lab/ContourDiff).
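The abstract's claim that contours are simple to extract and largely domain-invariant can be illustrated with a minimal sketch. The edge detector is an assumption: the abstract does not name one, so a thresholded gradient magnitude stands in for it. The function name `extract_contour` and the `threshold` parameter are hypothetical and do not come from the authors' repository.

```python
import numpy as np

def extract_contour(image: np.ndarray, threshold: float = 0.1) -> np.ndarray:
    """Return a binary contour map for one 2D slice (hypothetical helper).

    Assumption: a thresholded gradient magnitude approximates the
    contour representation; the paper's actual extractor may differ.
    """
    img = image.astype(np.float64)
    span = img.max() - img.min()
    if span == 0:
        # A constant image has no edges.
        return np.zeros(img.shape, dtype=np.uint8)
    img = (img - img.min()) / span      # normalize intensities to [0, 1]
    gy, gx = np.gradient(img)           # central-difference gradients
    magnitude = np.hypot(gx, gy)
    return (magnitude > threshold).astype(np.uint8)

# Toy "anatomy": a bright square on a dark background.
slice_a = np.zeros((32, 32))
slice_a[8:24, 8:24] = 1.0
contour = extract_contour(slice_a)

# An intensity-inverted copy mimics a modality with different contrast.
contour_inverted = extract_contour(1.0 - slice_a)
```

In this toy case the two contour maps coincide even though the intensities are inverted, which is the property that lets a model trained only on output-domain contours accept contours from another domain.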

Paper Structure (40 sections, 9 equations, 17 figures, 8 tables, 2 algorithms)

Figures (17)

  • Figure 1: Structural biases between CT and MRI modalities in certain anatomical regions: minor for the abdominal region from axial view (a), but severe for the leg from axial view and spinal regions from sagittal view (b).
  • Figure 2: Overview of ContourDiff. Top is the training process of ContourDiff. The denoising model $\epsilon_{\theta}$ is trained on output domain images, conditioning on their anatomical contours and on an adjacent slice with probability $P_{adj}$. Bottom is the inference process of ContourDiff. The model generates input domain images in the appearance of the output domain given input domain contours and previously generated adjacent slices.
  • Figure 3: Spatially Coherent Guided Diffusion (SCGD). For each input domain volume, SCGD first translates the initial slice by generating $n$ candidates with $C_{adj}$ set to an empty map and selecting the optimal one according to a specified criterion (e.g., lowest mean intensity). Then, every subsequent slice is synthesized by conditioning on its anatomical contours and the previously translated slice. Input domain slices are bordered in blue and output domain slices are bordered in orange.
  • Figure 4: Qualitative comparison of ContourDiff and baseline methods. ContourDiff appears to best maintain anatomical consistency during translation for both Lumbar and Hip $\&$ Thigh areas. The input-domain segmentation masks are depicted in blue to visualize the alignment. Unpaired MRIs are included as target-domain examples for reference only; they are not used as ground truth for the translation.
  • Figure 5: Qualitative comparison between unconditional DDPM and ContourDiff. Unconditional DDPM appears to barely follow input-domain anatomical structures during translation.
  • ...and 12 more figures
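
The SCGD procedure described in the Figure 3 caption can be sketched as a short loop. This is an illustration under stated assumptions, not the authors' implementation: `scgd_translate` and `toy_sampler` are hypothetical names, and `toy_sampler` is a random stand-in for the trained conditional diffusion sampler. It exists only so the control flow can be run.

```python
import numpy as np

def scgd_translate(contours, sample_slice, n_candidates=4, seed=0):
    """Translate a volume slice by slice, following the Figure 3 caption.

    contours:     array of shape (num_slices, H, W), one contour map per slice
    sample_slice: callable(contour, adjacent, rng) -> translated slice
                  (assumed interface standing in for the diffusion sampler)
    """
    rng = np.random.default_rng(seed)

    # Initial slice: no previous output exists, so C_adj is an empty map.
    empty = np.zeros_like(contours[0], dtype=np.float64)
    candidates = [sample_slice(contours[0], empty, rng)
                  for _ in range(n_candidates)]
    # Selection criterion from the caption's example: lowest mean intensity.
    first = min(candidates, key=lambda c: float(c.mean()))

    translated = [first]
    for contour in contours[1:]:
        # Each later slice is conditioned on its own contour and on the
        # previously translated slice.
        translated.append(sample_slice(contour, translated[-1], rng))
    return np.stack(translated)

def toy_sampler(contour, adjacent, rng):
    """Random stand-in for a conditional diffusion sampler; not a real model."""
    noise = rng.uniform(0.0, 1.0, size=contour.shape)
    return 0.5 * contour + 0.3 * adjacent + 0.2 * noise

# A 5-slice toy volume whose contour is a single vertical line.
volume_contours = np.zeros((5, 16, 16))
volume_contours[:, 4:12, 4] = 1.0
result = scgd_translate(volume_contours, toy_sampler, n_candidates=3)
```

Only the first slice pays the cost of sampling several candidates. Every later slice is sampled once, and inherits its appearance from its predecessor through the adjacent-slice condition.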