Consistent Diffusion: Denoising Diffusion Model with Data-Consistent Training for Image Restoration

Xinlong Cheng, Tiantian Cao, Guoan Cheng, Bangxuan Huang, Xinghan Tian, Ye Wang, Xiaoyu He, Weixin Li, Tianfan Xue, Xuan Dong

TL;DR

This paper identifies a fundamental source of distortions in denoising diffusion model–based image restoration: misalignment between training inputs and testing dynamics, which causes cumulative errors to propagate during iterative refinement. It introduces data-consistent training, aligning training inputs with backward test-time processing to directly minimize the cumulative error, and offers an efficient biased variant to reduce memory demands. Using the ResShift backbone, the approach delivers state-of-the-art results across five restoration tasks (SISR, denoising, deraining, dehazing, and dual-camera SR) while preserving input fidelity and minimizing shape/color distortions. The method provides a general training strategy for diffusion-based restoration, with broad practical impact for high-fidelity image restoration pipelines such as in-camera ISPs and image editing workflows.

Abstract

In this work, we address the limitations of denoising diffusion models (DDMs) in image restoration tasks, particularly the shape and color distortions that can compromise image quality. While DDMs have demonstrated promising performance in many applications such as text-to-image synthesis, their effectiveness in image restoration is often hindered by these distortions. We observe that these issues arise from inconsistencies between the training and testing data used by DDMs. Based on this observation, we propose a novel training method, named data-consistent training, which allows DDMs to access images with accumulated errors during training, so that the model learns to correct these errors. Experimental results show that, across five image restoration tasks, our method achieves significant improvements over state-of-the-art methods while effectively minimizing distortions and preserving image fidelity.

Paper Structure

This paper contains 14 sections, 11 equations, 17 figures, 8 tables.

Figures (17)

  • Figure 1: PSNR (dB) values on five tasks.
  • Figure 2: Modular and cumulative errors.
  • Figure 4: Training pipelines of traditional DDM vs. our consistent diffusion. At iteration $t$ of the training stage, traditional DDM uses a one-step forward process to obtain the input ${\bf{x}}_t^{\rm{train}}$, i.e. ${\bf{x}}_t^{\rm{train}}={\bf{x}}_t^{\rm{forw}} \sim q({\bf{x}}_t|{\bf{x}}_0)$, where $q$ is the distortion addition operation and $f_{\theta}$ denotes the core network with parameters $\theta$. The training loss, which measures the quality of $f_{\theta}({\bf{x}}_t^{\rm{forw}})$, optimizes the modular error $\xi_t^{\rm{mod}}$ of the core network but does not account for the cumulative error in the input. In contrast, our consistent diffusion uses data-consistent training: it generates the input ${\bf{x}}_t^{\rm{train}}$ at iteration $t$ with the multi-step backward process, consistent with the testing stage, i.e. ${\bf{x}}_t^{\rm{train}}={\bf{x}}_t^{\rm{back}} \sim p_\theta({\bf{x}}_T)\prod\limits_{i=T}^{t+1} p_\theta({\bf{x}}_{i-1}|{\bf{x}}_i)$, where $p_\theta$ denotes the denoising process parameterized by $\theta$. The training loss, which measures the quality of $f_{\theta}({\bf{x}}_t^{\rm{back}})$, directly optimizes the cumulative error $\hat{\xi}_t^{\rm{cumu}}$ of the core network.
  • Figure 5: $\xi_{t}^{\rm{mod}}$ and $\xi_{t}^{\rm{cumu}}$ of traditional DDM on five restoration tasks.
  • Figure 6: $\hat{\xi}_{t}^{\rm{cumu}}$ of our consistent diffusion on five restoration tasks.
  • ...and 12 more figures
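
The contrast described in the Figure 4 caption can be illustrated with a small toy sketch. It is not the authors' implementation. The residual-shifting form of the forward sample, the simplified reverse step that re-degrades the network's estimate, the noise schedule `etas`, the scale `kappa`, and all function names are assumptions made for illustration. The sketch only shows how the training input is built in each scheme; gradient handling and the memory-saving biased variant are omitted.

```python
import math
import random


def forward_sample(x0, y, eta_t, kappa, rng):
    """Toy x_t^forw: one-step forward sample built from the clean signal x0.

    Assumed form: shift x0 toward the degraded input y by eta_t, then add noise.
    """
    return [a + eta_t * (b - a) + kappa * math.sqrt(eta_t) * rng.gauss(0.0, 1.0)
            for a, b in zip(x0, y)]


def backward_sample(f_theta, y, t, etas, kappa, rng):
    """Toy x_t^back: start from x_T and apply T - t reverse steps, as at test time."""
    T = len(etas) - 1
    x = [b + kappa * math.sqrt(etas[T]) * rng.gauss(0.0, 1.0) for b in y]  # x_T
    for i in range(T, t, -1):
        x0_hat = f_theta(x, y, i)  # network's current estimate of the clean signal
        # Simplified reverse step (assumption): re-degrade the estimate to level i - 1.
        x = forward_sample(x0_hat, y, etas[i - 1], kappa, rng)
    return x


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def traditional_loss(f_theta, x0, y, t, etas, kappa, rng):
    """Loss on a forward-process input: measures the modular error only."""
    x_t = forward_sample(x0, y, etas[t], kappa, rng)
    return mse(f_theta(x_t, y, t), x0)


def data_consistent_loss(f_theta, x0, y, t, etas, kappa, rng):
    """Loss on a backward-process input: errors from earlier steps are included."""
    x_t = backward_sample(f_theta, y, t, etas, kappa, rng)
    return mse(f_theta(x_t, y, t), x0)


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = [0.0, 0.25, 0.5, 0.75, 1.0]       # toy clean signal
    y = [v + 0.5 for v in x0]              # toy degraded observation
    etas = [0.0, 0.25, 0.5, 0.75, 1.0]     # toy schedule, etas[T] = 1

    def imperfect(x_t, y_, t):
        # Hypothetical stand-in for the core network f_theta.
        return [0.9 * v for v in x_t]

    print("traditional:", traditional_loss(imperfect, x0, y, 2, etas, 0.1, rng))
    print("data-consistent:", data_consistent_loss(imperfect, x0, y, 2, etas, 0.1, rng))
```

In this toy setting, a perfect network with `kappa = 0` makes the two inputs coincide. With an imperfect network they differ, and only the backward-process input exposes the network to errors carried over from earlier steps.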