Consistent Diffusion: Denoising Diffusion Model with Data-Consistent Training for Image Restoration
Xinlong Cheng, Tiantian Cao, Guoan Cheng, Bangxuan Huang, Xinghan Tian, Ye Wang, Xiaoyu He, Weixin Li, Tianfan Xue, Xuan Dong
TL;DR
This paper identifies a fundamental source of distortions in denoising diffusion model–based image restoration: misalignment between training inputs and testing dynamics, which causes cumulative errors to propagate during iterative refinement. It introduces data-consistent training, aligning training inputs with backward test-time processing to directly minimize the cumulative error, and offers an efficient biased variant to reduce memory demands. Using the ResShift backbone, the approach delivers state-of-the-art results across five restoration tasks (SISR, denoising, deraining, dehazing, and dual-camera SR) while preserving input fidelity and minimizing shape/color distortions. The method provides a general training strategy for diffusion-based restoration, with broad practical impact for high-fidelity image restoration pipelines such as in-camera ISPs and image editing workflows.
Abstract
In this work, we address the limitations of denoising diffusion models (DDMs) in image restoration tasks, particularly the shape and color distortions that can compromise image quality. While DDMs have demonstrated promising performance in many applications such as text-to-image synthesis, their effectiveness in image restoration is often hindered by these distortions. We observe that the distortions arise from inconsistencies between the training and testing data used by DDMs. Based on this observation, we propose a novel training method, named data-consistent training, which allows DDMs to access images with accumulated errors during training, so that the model learns to correct these errors. Experimental results show that, across five image restoration tasks, our method achieves significant improvements over state-of-the-art methods while effectively minimizing distortions and preserving image fidelity.
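The train/test inconsistency described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a simplified residual-shifting chain in the spirit of ResShift, and the linear schedule `eta_schedule`, the helper names `forward_input` and `backward_input`, and the `predict_x0` callable are all hypothetical choices made for illustration. Conventional training would build the input at step t from the clean image (`forward_input`). A data-consistent variant would instead obtain it by running the model backward from step T, as at test time (`backward_input`), so that the input carries the model's accumulated errors.

```python
import numpy as np


def eta_schedule(T):
    # Hypothetical linear shifting schedule with eta_0 = 0 and eta_T = 1.
    return np.linspace(0.0, 1.0, T + 1)


def forward_input(x0, y, t, eta, kappa, rng):
    # Conventional training input: built directly from the clean image x0,
    # so it never contains errors made by the model itself.
    noise = rng.standard_normal(x0.shape)
    return x0 + eta[t] * (y - x0) + kappa * np.sqrt(eta[t]) * noise


def backward_input(predict_x0, y, t, eta, kappa, rng):
    # Test-time-style input: start from the degraded image y at step T and
    # apply the model's own updates down to step t (requires t >= 0 and T >= 1).
    T = len(eta) - 1
    x = y + kappa * np.sqrt(eta[T]) * rng.standard_normal(y.shape)
    for s in range(T, t, -1):
        x0_hat = predict_x0(x, y, s)  # model's (possibly wrong) estimate of x0
        alpha = eta[s] - eta[s - 1]
        mean = (eta[s - 1] / eta[s]) * x + (alpha / eta[s]) * x0_hat
        std = kappa * np.sqrt(eta[s - 1] * alpha / eta[s])
        x = mean + std * rng.standard_normal(y.shape)
    return x
```

In this toy chain, with `kappa = 0` and a perfect predictor the two inputs coincide. With an imperfect predictor the backward input drifts away from the forward one, and that drift is the kind of accumulated error a model trained only on `forward_input` never sees.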
