WDM: 3D Wavelet Diffusion Models for High-Resolution Medical Image Synthesis

Paul Friedrich, Julia Wolleb, Florentin Bieder, Alicia Durrer, Philippe C. Cattin

TL;DR

WDM is a wavelet-based medical image synthesis framework that applies a diffusion model to wavelet-decomposed images, scaling 3D generation to high resolutions. It is the only method capable of generating high-quality images at a resolution of $256 \times 256 \times 256$, outperforming all compared methods.

Abstract

Due to the three-dimensional nature of CT or MR scans, generative modeling of medical images is a particularly challenging task. Existing approaches mostly apply patch-wise, slice-wise, or cascaded generation techniques to fit the high-dimensional data into limited GPU memory. However, these approaches may introduce artifacts and potentially restrict the model's applicability for certain downstream tasks. This work presents WDM, a wavelet-based medical image synthesis framework that applies a diffusion model to wavelet-decomposed images. The presented approach is a simple yet effective way of scaling 3D diffusion models to high resolutions and can be trained on a single 40 GB GPU. Experimental results on unconditional image generation on BraTS and LIDC-IDRI at a resolution of $128 \times 128 \times 128$ demonstrate state-of-the-art image fidelity (FID) and sample diversity (MS-SSIM) scores compared to recent GANs, Diffusion Models, and Latent Diffusion Models. Our proposed method is the only one capable of generating high-quality images at a resolution of $256 \times 256 \times 256$, outperforming all compared methods.
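The core idea, running the diffusion model on wavelet coefficients instead of voxels, rests on the discrete wavelet transform (DWT). The sketch below assumes a single-level orthonormal Haar wavelet applied along all three axes (the summary above does not name the wavelet), which turns a $D \times H \times W$ volume into 8 subbands of size $D/2 \times H/2 \times W/2$ that can be stacked as channels. The names `dwt3d` and `idwt3d` are hypothetical illustrations, not taken from the WDM code.

```python
import numpy as np


def _every_other(ndim, axis, start):
    """Index tuple selecting every second sample along `axis`, starting at `start`."""
    idx = [slice(None)] * ndim
    idx[axis] = slice(start, None, 2)
    return tuple(idx)


def _haar_split(x, axis):
    """One-level orthonormal Haar analysis along one axis -> (low, high)."""
    even = x[_every_other(x.ndim, axis, 0)]
    odd = x[_every_other(x.ndim, axis, 1)]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)


def _haar_merge(low, high, axis):
    """Inverse of _haar_split: rebuild and interleave the even/odd samples."""
    shape = list(low.shape)
    shape[axis] *= 2
    out = np.empty(shape, dtype=np.result_type(low, high))
    out[_every_other(out.ndim, axis, 0)] = (low + high) / np.sqrt(2)
    out[_every_other(out.ndim, axis, 1)] = (low - high) / np.sqrt(2)
    return out


def dwt3d(volume):
    """Split a (D, H, W) volume with even sides into 8 Haar subbands.

    Returns an array of shape (8, D/2, H/2, W/2); index 0 is the low-pass (LLL) band.
    Assumption: single-level orthonormal Haar wavelet.
    """
    bands = [np.asarray(volume, dtype=float)]
    for axis in range(3):
        # Each band is split into (low, high) along the current axis.
        bands = [part for band in bands for part in _haar_split(band, axis)]
    return np.stack(bands, axis=0)


def idwt3d(coeffs):
    """Reconstruct the (D, H, W) volume from the 8 stacked subbands of dwt3d."""
    bands = list(coeffs)
    for axis in reversed(range(3)):
        # Consecutive bands are (low, high) pairs for the current axis.
        bands = [_haar_merge(bands[i], bands[i + 1], axis) for i in range(0, len(bands), 2)]
    return bands[0]
```

With this decomposition, a $256 \times 256 \times 256$ volume becomes an $8 \times 128 \times 128 \times 128$ tensor: the number of elements is unchanged, but the spatial extent the network processes is halved along each axis, and the orthonormal transform is exactly invertible, so no information is lost.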

Paper Structure

This paper contains 19 sections, 7 equations, 3 figures, 2 tables, 2 algorithms.

Figures (3)

  • Figure 1: Schematic overview of the proposed wavelet-based image synthesis framework. A diffusion model is trained on the wavelet coefficients $x_0$ of the real input data $y_0$. During sampling, starting from random wavelet coefficients $x_T$, $T$ denoising steps are performed iteratively to predict denoised wavelet coefficients $\tilde{x}_0$. The final output images $\tilde{y}_0$ are produced by applying the inverse discrete wavelet transform (IDWT) to the generated wavelet coefficients $\tilde{x}_0$.
  • Figure 2: Qualitative results of our method (WDM) for unconditional image generation on BraTS at $256 \times 256 \times 256$ (left) and $128 \times 128 \times 128$ (right).
  • Figure 3: Qualitative results of our method (WDM) for unconditional image generation on LIDC-IDRI at $256 \times 256 \times 256$ (left) and $128 \times 128 \times 128$ (right).
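The sampling path described in the Figure 1 caption can be sketched as a standard DDPM reverse loop over wavelet coefficients followed by a single IDWT. This is a minimal sketch under assumptions the summary does not state: the network predicts $\tilde{x}_0$ directly (as the caption suggests), the noise schedule is a linear $\beta$ schedule, and the transform is a single-level orthonormal Haar wavelet. `sample_wdm` and the `denoiser` callable are hypothetical placeholders for a trained 3D network, not the authors' code.

```python
import numpy as np


def _every_other(ndim, axis, start):
    """Index tuple selecting every second sample along `axis`, starting at `start`."""
    idx = [slice(None)] * ndim
    idx[axis] = slice(start, None, 2)
    return tuple(idx)


def idwt3d(coeffs):
    """Single-level orthonormal Haar IDWT (assumed): (8, d, h, w) -> (2d, 2h, 2w)."""
    bands = list(coeffs)
    for axis in (2, 1, 0):
        merged = []
        # Consecutive bands are (low, high) pairs along the current axis.
        for low, high in zip(bands[0::2], bands[1::2]):
            shape = list(low.shape)
            shape[axis] *= 2
            out = np.empty(shape, dtype=np.result_type(low, high))
            out[_every_other(out.ndim, axis, 0)] = (low + high) / np.sqrt(2)
            out[_every_other(out.ndim, axis, 1)] = (low - high) / np.sqrt(2)
            merged.append(out)
        bands = merged
    return bands[0]


def sample_wdm(denoiser, volume_shape, betas, rng):
    """DDPM ancestral sampling in the wavelet domain, then one IDWT.

    denoiser(x_t, t) is a hypothetical stand-in for a trained network that
    predicts the clean wavelet coefficients x_0 from the noisy x_t.
    """
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    # x_T ~ N(0, I): 8 subbands at half the spatial resolution of the output.
    x = rng.standard_normal((8,) + tuple(s // 2 for s in volume_shape))
    for t in reversed(range(len(betas))):
        x0_hat = denoiser(x, t)
        ab_t = alpha_bars[t]
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        # Mean of the DDPM posterior q(x_{t-1} | x_t, x_0) with x_0 := x0_hat.
        coef_x0 = np.sqrt(ab_prev) * betas[t] / (1.0 - ab_t)
        coef_xt = np.sqrt(alphas[t]) * (1.0 - ab_prev) / (1.0 - ab_t)
        mean = coef_x0 * x0_hat + coef_xt * x
        if t > 0:
            var = betas[t] * (1.0 - ab_prev) / (1.0 - ab_t)
            x = mean + np.sqrt(var) * rng.standard_normal(x.shape)
        else:
            x = mean  # last step: the posterior mean reduces to x0_hat, no noise added
    return idwt3d(x)
```

In this sketch the final step is deterministic and returns the predicted coefficients $\tilde{x}_0$, which are mapped back to image space by one IDWT, matching the order of operations in Figure 1. A real `denoiser` would be a 3D network taking the 8 subbands as input channels.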