CoreDiff: Contextual Error-Modulated Generalized Diffusion Model for Low-Dose CT Denoising and Generalization
Qi Gao, Zilong Li, Junping Zhang, Yi Zhang, Hongming Shan
TL;DR
CoreDiff introduces a mean-preserving degradation operator $x_t = \alpha_t x_0 + (1 - \alpha_t) x_T$ with LDCT end-state $x_T$, enabling fast, informative sampling for LDCT denoising. A Contextual Error-modulated Restoration Network (CLEAR-Net) uses adjacent-slice context and an error-modulated module to constrain the sampling trajectory and align time-step embeddings, mitigating accumulated and misalignment errors. A one-shot learning framework learns weights $w_t$ with $\sum_t w_t = 1$ to synthesize an optimal output from multiple intermediate denoised images, allowing rapid generalization to new dose levels with minimal data. Across Mayo 2016/2020, Piglet, and Phantom datasets, CoreDiff achieves state-of-the-art denoising performance with clinically acceptable inference speed, demonstrating strong cross-domain robustness and practical potential for LDCT post-processing.
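As a rough illustration of the two formulas in this summary, the NumPy sketch below applies the degradation operator and fuses several images with weights that sum to one. The function names, the toy arrays, and the softmax parameterization of the weights are assumptions made for illustration (the summary only states $\sum_t w_t = 1$); none of this is taken from the CoreDiff code.

```python
import numpy as np

def degrade(x0, xT, alpha_t):
    """Degradation x_t = alpha_t * x0 + (1 - alpha_t) * xT.

    The two coefficients sum to 1, so x_t is a convex blend of the
    clean image x0 and the LDCT end state xT.
    """
    return alpha_t * x0 + (1.0 - alpha_t) * xT

def fuse(intermediates, logits):
    """Weighted sum of intermediate images with weights summing to 1.

    ASSUMPTION: weights come from a softmax over free logits; the
    source only states that the weights sum to one.
    """
    logits = np.asarray(logits, dtype=np.float64)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    stacked = np.stack(intermediates, axis=0)          # (K, H, W)
    return np.tensordot(w, stacked, axes=1), w         # (H, W), (K,)

# Toy stand-ins for an NDCT image (x0) and its LDCT counterpart (xT).
rng = np.random.default_rng(0)
x0 = rng.random((4, 4))
xT = x0 + 0.1 * rng.standard_normal((4, 4))

x_mid = degrade(x0, xT, 0.5)                           # halfway state
fused, w = fuse([xT, x_mid, x0], logits=[0.0, 0.0, 0.0])
```

With equal logits the fused image is the plain average of the three inputs; in a one-shot setting the logits would instead be fitted on the single available image.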
Abstract
Low-dose computed tomography (CT) images suffer from noise and artifacts due to photon starvation and electronic noise. Recently, some works have attempted to use diffusion models to address the over-smoothness and training instability encountered by previous deep-learning-based denoising models. However, diffusion models suffer from long inference times due to the large number of sampling steps involved. Very recently, the cold diffusion model generalized classical diffusion models with greater flexibility. Inspired by cold diffusion, this paper presents a novel COntextual eRror-modulated gEneralized Diffusion model for low-dose CT (LDCT) denoising, termed CoreDiff. First, CoreDiff uses LDCT images in place of random Gaussian noise and employs a novel mean-preserving degradation operator to mimic the physical process of CT degradation, significantly reducing the number of sampling steps because the informative LDCT image serves as the starting point of the sampling process. Second, to alleviate the error accumulation caused by the imperfect restoration operator during sampling, we propose a novel ContextuaL Error-modulAted Restoration Network (CLEAR-Net), which leverages contextual information to protect the sampling process from structural distortion and modulates time-step embedding features for better alignment with the input at the next time step. Third, to rapidly generalize to a new, unseen dose level with as few resources as possible, we devise a one-shot learning framework that lets CoreDiff generalize faster and better using only a single LDCT image, paired or unpaired with a normal-dose CT (NDCT) image. Extensive experimental results on two datasets demonstrate that CoreDiff outperforms competing methods in denoising and generalization performance, with a clinically acceptable inference time. Source code is available at https://github.com/qgao21/CoreDiff.
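The idea of starting sampling from the LDCT image rather than from Gaussian noise can be sketched with the generic cold-diffusion update $x_{t-1} = x_t - D(\hat{x}_0, t) + D(\hat{x}_0, t-1)$, where $D$ is the degradation operator and $\hat{x}_0$ is the restoration estimate. This is a minimal sketch under stated assumptions: the linear schedule, the step count `T`, and the `restore` callable (a hypothetical stand-in for a trained restoration network) are illustrative, and the adjacent-slice context and error modulation of CLEAR-Net are not modeled here.

```python
import numpy as np

def alpha(t, T):
    # ASSUMPTION: linear schedule with alpha_0 = 1 (clean) and alpha_T = 0 (LDCT).
    return 1.0 - t / T

def degrade(x0, xT, t, T):
    # x_t = alpha_t * x0 + (1 - alpha_t) * xT
    a = alpha(t, T)
    return a * x0 + (1.0 - a) * xT

def sample(x_ldct, restore, T=10):
    """Cold-diffusion-style sampling that starts from the LDCT image.

    `restore(x_t, t)` is a hypothetical callable returning an estimate
    of the clean image; every estimate is kept so the intermediate
    outputs could later be combined.
    """
    x_t = x_ldct.copy()
    intermediates = []
    for t in range(T, 0, -1):
        x0_hat = restore(x_t, t)
        intermediates.append(x0_hat)
        # Remove the level-t degradation and re-apply the level-(t-1) one.
        x_t = x_t - degrade(x0_hat, x_ldct, t, T) + degrade(x0_hat, x_ldct, t - 1, T)
    return x_t, intermediates

# Sanity check with an oracle restorer that always returns the clean image.
rng = np.random.default_rng(0)
clean = rng.random((8, 8))
ldct = clean + 0.1 * rng.standard_normal((8, 8))
denoised, steps = sample(ldct, restore=lambda x, t: clean, T=10)
```

With the oracle restorer, each update lands exactly on the degraded state of the next level, so the loop ends at the clean image. An imperfect restorer drifts away from that trajectory, which is the error accumulation the abstract refers to.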
