DiffDenoise: Self-Supervised Medical Image Denoising with Conditional Diffusion Models

Basar Demir, Yikang Liu, Xiao Chen, Eric Z. Chen, Lin Zhao, Boris Mailhe, Terrence Chen, Shanhui Sun

TL;DR

DiffDenoise tackles the challenge of preserving high-frequency details in self-supervised medical image denoising by training a diffusion model with strong conditioning from a Blind-Spot Network, applying Stabilized Reverse Diffusion Sampling (SRDS) to produce clean outputs, and using knowledge distillation (KD) to train a fast, deterministic denoiser. The method combines conditional diffusion with symmetric-noise averaging to stabilize sampling and recover fine structures without requiring clean training pairs. Across both synthetic and real medical imaging datasets, DiffDenoise outperforms state-of-the-art self-supervised baselines in PSNR and SSIM, while KD keeps inference fast enough for practical use. The approach generalizes across imaging modalities and anatomical structures, offering a scalable route to high-quality medical image denoising in clinical settings.
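A Blind-Spot Network predicts each pixel without seeing that pixel's own noisy value. When the noise is pixel-wise independent and zero-mean, this lets it be trained on noisy images alone, and in DiffDenoise its output serves as the conditioning input of the diffusion model. The sketch below only illustrates the blind-spot constraint: the hypothetical `blind_spot_mean` replaces the trained network with a fixed average over the (2r+1)×(2r+1) neighborhood minus its center (zero padding, with border pixels normalized by their count of valid neighbors). It is not the BSN architecture used in the paper.

```python
import numpy as np


def blind_spot_mean(noisy: np.ndarray, radius: int = 1) -> np.ndarray:
    """Hypothetical blind-spot estimator: mean of neighbors, center pixel excluded.

    Assumes a 2-D image with at least 2 pixels along some axis, so every
    pixel has one or more valid neighbors.
    """
    h, w = noisy.shape
    k = 2 * radius + 1
    # Zero padding: padded values add nothing to `total` and nothing to `count`.
    padded = np.pad(noisy.astype(float), radius, mode="constant")
    valid = np.pad(np.ones((h, w)), radius, mode="constant")
    total = np.zeros((h, w))
    count = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            if dy == radius and dx == radius:
                continue  # the blind spot: never read the pixel being estimated
            total += padded[dy:dy + h, dx:dx + w]
            count += valid[dy:dy + h, dx:dx + w]
    return total / count
```

Because the center pixel is skipped, changing `noisy[i, j]` leaves the estimate at `(i, j)` untouched while shifting the estimates of its neighbors, which is the property a blind-spot network is trained to respect.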

Abstract

Many self-supervised denoising approaches have been proposed in recent years. However, these methods tend to overly smooth images, resulting in the loss of fine structures that are essential for medical applications. In this paper, we propose DiffDenoise, a powerful self-supervised denoising approach tailored for medical images, designed to preserve high-frequency details. Our approach comprises three stages. First, we train a diffusion model on noisy images, using the outputs of a pretrained Blind-Spot Network as conditioning inputs. Next, we introduce a novel stabilized reverse sampling technique, which generates clean images by averaging diffusion sampling outputs initialized with a pair of symmetric noises. Finally, we train a supervised denoising network using noisy images paired with the denoised outputs generated by the diffusion model. Our results demonstrate that DiffDenoise outperforms existing state-of-the-art methods in both synthetic and real-world medical image denoising tasks. We provide both a theoretical foundation and practical insights, demonstrating the method's effectiveness across various medical imaging modalities and anatomical structures.
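Stabilized reverse sampling runs the reverse process twice, once from a noise sample ε and once from its negation -ε, and averages the two results. The minimal sketch below assumes deterministic DDIM-style updates (η = 0) and replaces the trained conditional network with a hypothetical linear noise predictor, `toy_eps_model`; the schedule values and all function names are illustrative assumptions, not the paper's settings or code. With a deterministic sampler, averaging the outputs for +ε and -ε cancels exactly the part of the output that is odd in the initial noise; because the toy predictor is linear, the whole noise-dependent part cancels here, which a real network would not guarantee.

```python
import numpy as np


def make_schedule(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """alpha_bar_t for a linear beta schedule (illustrative values only)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)


def toy_eps_model(x_t, t, cond, alpha_bar, gain=0.9):
    """Hypothetical stand-in for the trained conditional noise predictor.

    With gain=1 this is the exact predictor for data equal to `cond`;
    gain<1 makes it imperfect, so the result depends on the starting noise.
    """
    ab = alpha_bar[t]
    return gain * (x_t - np.sqrt(ab) * cond) / np.sqrt(1.0 - ab)


def ddim_sample(x_T, cond, alpha_bar, eps_model):
    """Deterministic DDIM reverse pass (eta = 0) from x_T down to x_0."""
    x = x_T
    for t in range(len(alpha_bar) - 1, -1, -1):
        ab = alpha_bar[t]
        ab_prev = alpha_bar[t - 1] if t > 0 else 1.0
        eps = eps_model(x, t, cond, alpha_bar)
        x0_pred = (x - np.sqrt(1.0 - ab) * eps) / np.sqrt(ab)
        x = np.sqrt(ab_prev) * x0_pred + np.sqrt(1.0 - ab_prev) * eps
    return x


def srds(cond, alpha_bar, eps_model, rng):
    """Average two reverse passes started from symmetric noises +eps and -eps."""
    eps = rng.standard_normal(cond.shape)
    x_pos = ddim_sample(eps, cond, alpha_bar, eps_model)
    x_neg = ddim_sample(-eps, cond, alpha_bar, eps_model)
    return 0.5 * (x_pos + x_neg)
```

With this toy predictor, two single passes started from different noises disagree, while `srds` returns the same image for any seed and matches a pass started from zero noise, because the output is linear in the starting noise.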

Paper Structure

This paper contains 19 sections, 4 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: DiffDenoise, a self-supervised denoising method, trains a diffusion model on noisy images with strong conditioning, samples clean images with stabilized reverse diffusion sampling, and applies knowledge distillation to suppress hallucination and accelerate inference.
  • Figure 2: Overview of our method. Our self-supervised denoising method consists of three phases. First, we train a diffusion model on noisy images, using the outputs of a pretrained Blind-Spot Network as the conditioning input. Then, during inference, we apply our novel stabilized reverse sampling technique that generates clean images by averaging diffusion sampling outputs initialized with a pair of symmetric noises. Finally, we train a supervised denoising network using noisy images paired with the denoised outputs generated by the diffusion model.
  • Figure 3: Denoising results. Visual comparisons with baseline methods on datasets with synthetic noise: (a) and (b) show denoising results for pixel-wise independent noise on knee MRI and chest X-rays, respectively; (c) and (d) show denoising of spatially correlated Gaussian noise; (e) presents denoising results after three iterations, along with a difference map highlighting changes between the first and third iterations. Overall, our method produces the sharpest outputs and preserves fine structures visibly better than the comparison methods.
  • Figure 4: Demonstration of stabilized reverse diffusion sampling (SRDS) and knowledge distillation (KD) on one example image. $\epsilon_{1}$ and $\epsilon_{2}$ are two random noises. Without SRDS, inference with $\epsilon_{1}$ yields a lower PSNR (dB) than inference with $\epsilon_{2}$; with SRDS, both converge to similar PSNR values. The last row plots 50 random inferences with and without SRDS, as well as the average of two samples drawn with random noises (rather than the symmetric noises used in SRDS). This demonstrates that SRDS provides more stable sampling (a narrower and higher PSNR range) than both single inferences and the average of two random inferences. KD further stabilizes inference, as indicated by the red line.
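In the final stage, a supervised denoiser is trained on noisy images paired with the diffusion model's denoised outputs, so inference needs a single forward pass rather than two reverse-diffusion runs. The sketch below is a deliberately small stand-in, assuming a 3×3 linear filter as the student (the paper's student is a denoising network) and ordinary least squares instead of gradient-based training. The names `patches3x3`, `fit_student`, and `apply_student` are hypothetical, and any teacher targets passed in would, in the real pipeline, be SRDS outputs.

```python
import numpy as np


def patches3x3(img: np.ndarray) -> np.ndarray:
    """Stack each pixel's zero-padded 3x3 neighborhood as a row: shape (h*w, 9)."""
    h, w = img.shape
    p = np.pad(img.astype(float), 1, mode="constant")
    cols = [p[dy:dy + h, dx:dx + w].ravel() for dy in range(3) for dx in range(3)]
    return np.stack(cols, axis=1)


def fit_student(noisy_imgs, teacher_imgs) -> np.ndarray:
    """Hypothetical distillation step: regress teacher outputs on noisy inputs."""
    X = np.concatenate([patches3x3(n) for n in noisy_imgs])
    y = np.concatenate([t.ravel() for t in teacher_imgs])
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights.reshape(3, 3)


def apply_student(kernel: np.ndarray, img: np.ndarray) -> np.ndarray:
    """One deterministic forward pass of the 3x3 linear student."""
    return (patches3x3(img) @ kernel.ravel()).reshape(img.shape)
```

Once fitted, the student is a fixed function of its input, so repeated inference on the same image returns identical output, unlike a fresh diffusion sample.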