
CoreDiff: Contextual Error-Modulated Generalized Diffusion Model for Low-Dose CT Denoising and Generalization

Qi Gao, Zilong Li, Junping Zhang, Yi Zhang, Hongming Shan

TL;DR

CoreDiff introduces a mean-preserving degradation operator $x_t = \alpha_t x_0 + (1 - \alpha_t) x_T$ with LDCT end state $x_T$, enabling fast, informative sampling for LDCT denoising. A Contextual Error-modulated Restoration Network (CLEAR-Net) uses adjacent-slice context and an error-modulated module to constrain the sampling trajectory and align time-step embeddings, mitigating accumulated and misalignment errors. A one-shot learning framework learns weights $w_t$ with $\sum_t w_t = 1$ to synthesize an optimal output from multiple intermediate denoised images, allowing rapid generalization to new dose levels with minimal data. Across Mayo 2016/2020, Piglet, and Phantom datasets, CoreDiff achieves state-of-the-art denoising performance with clinically acceptable inference speed, demonstrating strong cross-domain robustness and practical potential for LDCT post-processing.
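The degradation operator above interpolates between the clean image $x_0$ and the LDCT end state $x_T$. The following is a minimal pure-Python sketch, assuming a hypothetical linear schedule $\alpha_t = 1 - t/T$ (so $\alpha_0 = 1$ and $\alpha_T = 0$) and short 1-D lists in place of CT slices; the names `alpha`, `degrade`, and `mean` are illustrative and are not taken from the CoreDiff code. Because the two coefficients sum to 1, whenever $x_0$ and $x_T$ share a mean (LDCT modeled here as NDCT plus zero-mean noise), every intermediate $x_t$ keeps that mean.

```python
from typing import List

Image = List[float]  # toy stand-in for a CT slice (flattened pixel values)


def alpha(t: int, T: int) -> float:
    """Hypothetical linear schedule: alpha_0 = 1 (clean image), alpha_T = 0 (LDCT)."""
    return 1.0 - t / T


def degrade(x0: Image, xT: Image, t: int, T: int) -> Image:
    """Mean-preserving degradation x_t = alpha_t * x0 + (1 - alpha_t) * xT."""
    a = alpha(t, T)
    return [a * p + (1.0 - a) * q for p, q in zip(x0, xT)]


def mean(x: Image) -> float:
    return sum(x) / len(x)


# Toy data: NDCT-like x0 and an LDCT-like xT = x0 + zero-mean noise.
x0 = [10.0, 20.0, 30.0, 40.0]
noise = [1.5, -2.0, 0.5, 0.0]  # sums to zero
xT = [p + n for p, n in zip(x0, noise)]

T = 10
for t in range(T + 1):
    xt = degrade(x0, xT, t, T)
    # The coefficients sum to 1, so x_t keeps the shared mean of x0 and xT.
    print(f"t={t:2d}  mean(x_t)={mean(xt):.4f}")
```

At $t = 0$ the operator returns $x_0$ unchanged and at $t = T$ it returns the LDCT image, so the whole trajectory stays between the two physically meaningful endpoints.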

Abstract

Low-dose computed tomography (CT) images suffer from noise and artifacts due to photon starvation and electronic noise. Recently, some works have attempted to use diffusion models to address the over-smoothness and training instability encountered by previous deep-learning-based denoising models. However, diffusion models suffer from long inference times due to the large number of sampling steps involved. Very recently, the cold diffusion model has generalized classical diffusion models and offers greater flexibility. Inspired by cold diffusion, this paper presents a novel COntextual eRror-modulated gEneralized Diffusion model for low-dose CT (LDCT) denoising, termed CoreDiff. First, CoreDiff uses LDCT images in place of random Gaussian noise and employs a novel mean-preserving degradation operator to mimic the physical process of CT degradation; because the informative LDCT image serves as the starting point of the sampling process, far fewer sampling steps are needed. Second, to alleviate the error accumulation caused by the imperfect restoration operator in the sampling process, we propose a novel ContextuaL Error-modulAted Restoration Network (CLEAR-Net), which leverages contextual information to protect the sampling process against structural distortion and modulates the time-step embedding features for better alignment with the input at the next time step. Third, to rapidly generalize to a new, unseen dose level with as few resources as possible, we devise a one-shot learning framework that makes CoreDiff generalize faster and better using only a single LDCT image, paired or unpaired with NDCT. Extensive experimental results on two datasets demonstrate that CoreDiff outperforms competing methods in denoising and generalization performance, with a clinically acceptable inference time. Source code is available at https://github.com/qgao21/CoreDiff.
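Starting the reverse process from the LDCT image rather than from pure noise is what allows CoreDiff to use few sampling steps. The sketch below assumes the improved sampling rule from cold diffusion, $x_{t-1} = x_t - D(\hat{x}_0, t) + D(\hat{x}_0, t-1)$, where $D$ is the mean-preserving operator, together with a hypothetical linear schedule and a stand-in `restore` function in place of CLEAR-Net; CoreDiff's own sampling algorithm may differ in detail. It also collects the per-step estimates $\hat{x}_0$, treating them as the "intermediate denoised images", and fuses them with weights that sum to 1. The softmax parameterization of those weights is an assumption made here for illustration; in a one-shot setting the logits would be the learned parameters, whereas here they are fixed. All function names are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Image = List[float]  # toy stand-in for a CT slice


def alpha(t: int, T: int) -> float:
    # Hypothetical linear schedule with alpha_0 = 1 and alpha_T = 0.
    return 1.0 - t / T


def degrade(x0: Image, xT: Image, t: int, T: int) -> Image:
    # Mean-preserving operator D(x0, t) = alpha_t * x0 + (1 - alpha_t) * xT.
    a = alpha(t, T)
    return [a * p + (1.0 - a) * q for p, q in zip(x0, xT)]


def sample(x_ldct: Image, restore: Callable[[Image, int], Image],
           T: int) -> Tuple[Image, List[Image]]:
    """Reverse process started from the LDCT image (x_T = LDCT), using the
    cold-diffusion update x_{t-1} = x_t - D(x0_hat, t) + D(x0_hat, t-1)."""
    xt = list(x_ldct)
    estimates: List[Image] = []
    for t in range(T, 0, -1):
        x0_hat = restore(xt, t)  # stand-in for the restoration network
        estimates.append(x0_hat)
        d_t = degrade(x0_hat, x_ldct, t, T)
        d_prev = degrade(x0_hat, x_ldct, t - 1, T)
        xt = [x - u + v for x, u, v in zip(xt, d_t, d_prev)]
    return xt, estimates


def softmax(logits: List[float]) -> List[float]:
    # Numerically stable softmax: the weights are positive and sum to 1.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(estimates: List[Image], logits: List[float]) -> Image:
    """Weighted combination sum_t w_t * x0_hat^(t) with sum_t w_t = 1."""
    w = softmax(logits)
    n = len(estimates[0])
    return [sum(wi * est[i] for wi, est in zip(w, estimates)) for i in range(n)]


# Usage with toy data and a perfect (oracle) restoration operator.
x_ndct = [10.0, 20.0, 30.0, 40.0]
x_ldct = [11.5, 18.0, 30.5, 40.0]


def oracle(xt: Image, t: int) -> Image:
    return list(x_ndct)


x_final, estimates = sample(x_ldct, oracle, T=10)
fused = fuse(estimates, logits=[0.0] * len(estimates))  # uniform weights
print(x_final, fused)
```

With a perfect restoration operator, each update lands on $D(x_0, t-1)$, so the loop ends at the NDCT image. With an imperfect network, the per-step errors accumulate instead, which is the problem CLEAR-Net's contextual input and error-modulated module are meant to mitigate.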

Paper Structure

This paper contains 30 sections, 11 equations, 15 figures, 7 tables, and 2 algorithms.

Figures (15)

  • Figure 1: Overview of the proposed CoreDiff for low-dose CT denoising. The introduced generalized diffusion model leverages a novel degradation operator to mimic the physical process of CT image degradation during the diffusion process. The proposed CLEAR-Net is designed to alleviate the accumulated error and is trained in a two-stage manner for each time step; a key feature of CLEAR-Net is the error-modulated module (EMM), which calibrates the time-step embedding feature using the latest prediction and the given input LDCT image.
  • Figure 2: Comparison of (a) the commonly used degradation operator and (b) the proposed degradation operator. The proposed operator is mean-preserving, simulating the physical process of CT degradation.
  • Figure 3: Framework of one-shot learning for rapid generalization.
  • Figure 4: Qualitative results of a 25% dose abdomen CT image from the Mayo 2016 dataset. (a) NDCT image (Ground truth), (b) FBP, (c) PWLS, (d) RED-CNN, (e) PDF-RED-CNN, (f) WGAN-VGG, (g) CNCL-U-Net, (h) DU-GAN, (i) DDM$^2$, (j) IDDPM-1000, (k) IDDPM-50, (l) IDDPM-10, and (m) CoreDiff-10 (ours). The display window is [-100, 200] HU. The red ROI is zoomed in for visual comparison, and the orange arrow points to one lesion.
  • Figure 5: Qualitative results of a 5% dose abdomen CT image from the Mayo 2016 dataset. (a) NDCT image (Ground truth), (b) FBP, (c) PWLS, (d) RED-CNN, (e) PDF-RED-CNN, (f) WGAN-VGG, (g) CNCL-U-Net, (h) DU-GAN, (i) DDM$^2$, (j) IDDPM-1000, (k) IDDPM-50, (l) IDDPM-10, and (m) CoreDiff-10 (ours). The display window is [-100, 200] HU. The red ROI is zoomed in for visual comparison, and the orange arrow points to one lesion.
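Figure 2 contrasts a commonly used degradation operator with the proposed mean-preserving one. Assuming, for illustration only, that the commonly used operator has the DDPM-style form $x_t = \sqrt{\alpha_t}\,x_0 + \sqrt{1-\alpha_t}\,x_T$, the sketch below computes the mean of $x_t$ when $x_0$ and $x_T$ share a mean $m$. The proposed coefficients always sum to 1, whereas the square-root coefficients satisfy $(\sqrt{\alpha_t} + \sqrt{1-\alpha_t})^2 = 1 + 2\sqrt{\alpha_t(1-\alpha_t)} > 1$ for $0 < \alpha_t < 1$, which inflates the mean at intermediate steps. The value $m = 40$ HU and the function names are hypothetical.

```python
import math


def mean_scale_proposed(alpha_t: float) -> float:
    # x_t = alpha_t * x0 + (1 - alpha_t) * xT: the coefficients sum to 1.
    return alpha_t + (1.0 - alpha_t)


def mean_scale_sqrt(alpha_t: float) -> float:
    # Assumed form x_t = sqrt(alpha_t) * x0 + sqrt(1 - alpha_t) * xT.
    return math.sqrt(alpha_t) + math.sqrt(1.0 - alpha_t)


# If x0 and xT share the mean m, then mean(x_t) = scale * m.
m = 40.0  # hypothetical mean intensity of a region, in HU
for a in (1.0, 0.75, 0.5, 0.25, 0.0):
    print(f"alpha_t={a:.2f}  proposed={m * mean_scale_proposed(a):.2f}  "
          f"sqrt-form={m * mean_scale_sqrt(a):.2f}")
```

At $\alpha_t = 0.5$ the square-root form scales the mean by $\sqrt{2} \approx 1.41$, while the proposed operator leaves it unchanged at every step, matching the mean-preserving behavior described in the caption.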