Taming Diffusion Models for Image Restoration: A Review

Ziwei Luo; Fredrik K. Gustafsson; Zheng Zhao; Jens Sjölund; Thomas B. Schön

TL;DR

This review surveys diffusion-model frameworks for image restoration (IR), outlining how forward diffusion, score-based SDEs, and conditional diffusion enable the recovery of high-quality (HQ) images from degraded inputs. It categorizes IR approaches into conditional direct diffusion, training-free conditioning, and diffusion processes toward degraded images (IR-SDE and diffusion bridges), and discusses their trade-offs in fidelity, consistency, and efficiency. Key contributions include connecting DDPMs to variance-preserving SDEs (VP-SDEs), detailing score-based conditional guidance, and outlining practical restoration pipelines (e.g., SR3, Palette, StableSR) along with data-consistency strategies such as DPS and diffusion-bridge methods. The review highlights challenges such as out-of-distribution (OOD) degradations, texture consistency, and computational cost, and points to future directions in more efficient sampling, flow-based or optimal-transport formulations, and language-guided IR to improve robustness and realism in real-world conditions.
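As a rough illustration of the DDPM-to-VP-SDE connection mentioned above, the following sketch (not taken from the paper) samples from the closed-form DDPM forward marginal $q(x_t \mid x_0) = \mathcal{N}(\sqrt{\bar{\alpha}_t}\, x_0, (1 - \bar{\alpha}_t) I)$ and compares the one-step DDPM coefficient $\sqrt{1 - \beta_t}$ with its first-order counterpart $1 - \beta_t / 2$, which is what an Euler step of the VP-SDE drift would give. The linear schedule, its endpoint values, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def linear_beta_schedule(num_steps=1000, beta_min=1e-4, beta_max=0.02):
    """Linear noise schedule (an assumed, commonly used choice)."""
    return np.linspace(beta_min, beta_max, num_steps)

def forward_marginal_sample(x0, t, betas, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    alpha_bar = np.cumprod(1.0 - betas)[t]
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

def vp_sde_gap(betas):
    """Largest gap between DDPM's sqrt(1 - beta) and the Euler coefficient 1 - beta / 2."""
    return np.max(np.abs(np.sqrt(1.0 - betas) - (1.0 - 0.5 * betas)))

rng = np.random.default_rng(0)
betas = linear_beta_schedule()
x0 = rng.standard_normal(50_000)  # toy unit-variance "data", hypothetical
xT = forward_marginal_sample(x0, len(betas) - 1, betas, rng)
gap = vp_sde_gap(betas)
```

With unit-variance data, the terminal sample `xT` stays close to unit variance, which is the sense in which the process is variance preserving. The small value of `gap` shows that the discrete DDPM step and the discretized VP-SDE drift agree to first order in $\beta_t$.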

Abstract

Diffusion models have achieved remarkable progress in generative modelling, particularly in enhancing image quality to conform to human preferences. Recently, these models have also been applied to low-level computer vision for photo-realistic image restoration (IR) in tasks such as image denoising, deblurring, and dehazing. In this review paper, we introduce key constructions in diffusion models and survey contemporary techniques that make use of diffusion models in solving general IR tasks. Furthermore, we point out the main challenges and limitations of existing diffusion-based IR frameworks and provide potential directions for future work.

Paper Structure (21 sections, 62 equations, 6 figures)

Introduction
Generative Modeling with Diffusion Models
Denoising Diffusion Probabilistic Models (DDPMs)
Forward diffusion process
Reverse process
Training objective
Simplified objective
Data Perturbation and Sampling with SDEs
Data perturbation with forward SDEs
Sampling with reverse-time SDEs
Denoising score matching
Sampling with Langevin dynamics
Interpreting DDPM with the variance preserving SDE
Conditional Diffusion Models
Conditional SDE
...and 6 more sections

Figures (6)

Figure 1: Denoising diffusion probabilistic models (DDPMs). The forward path transfers data to Gaussian noise, and the reverse path learns to generate data from noise along the actual time reversal of the forward process. Here, the reverse transition $p_{\theta}(x_{t-1} \mid x_t)$ represents the model we aim to learn, and the conditional posterior $q(x_{t-1} \mid x_t, x_0)$ is a tractable Gaussian that serves as the target the model is trained to match through the $L_{t-1}$ term in Eq. \ref{eq:diffusion_training_loss}.
Figure 2: Data perturbation and sampling with SDEs. Different from DDPMs, Score-SDE continuously perturbs the data to Gaussian noise using a forward SDE, $\mathrm{d}x = f(x, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w$, and then generates new samples by estimating the score $\nabla_{x} \log p_t(x)$ and simulating the corresponding reverse-time SDE.
Figure 3: Left: Overview of the conditional direct diffusion model (CDDM) applied to face inpainting. The only change compared to DDPM (Figure 1) is the reverse transition model $p_{\theta}(x_{t-1} \mid x_t, y)$, which involves the LQ image $y$ in sampling to generate the corresponding HQ image. Right: Two image restoration examples (image super-resolution and inpainting) performed under the CDDM framework. These results look realistic but are not consistent with the original image.
Figure 4: Overview of the projection-based CDM. There are two paths for the HQ image $x$ and LQ image $y$, generated from the same diffusion model. At each reverse step $t$, the sampling first leverages the pretrained DDPM for unconditional generation, i.e. $p_\theta (\hat{x}_t \mid x_{t+1})$, and then refines $\hat{x}_t$ to $x_t$ with functions $H$ and $b$ as $x_t = H(\hat{x}_t) + b(y_t)$, where $y_t$ is obtained by applying the forward marginal transition (Eq. \ref{eq:diffusion_marginalize_kernel}) to the LQ image as $y_t \sim q(y_t \mid y)$.
Figure 5: Overview of the approach that performs diffusion towards degraded images. Here, the LQ image $y$ is involved in both the forward and backward processes. Moreover, the terminal state $x_T$ is often a (noisy) LQ image rather than Gaussian noise.
...and 1 more figures
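
The refinement step in the Figure 4 caption, $x_t = H(\hat{x}_t) + b(y_t)$, can be sketched for inpainting as follows. This is a minimal illustration and not the paper's implementation: it assumes $H$ keeps the unknown pixels of the unconditional sample and $b$ copies the known pixels from the noised LQ image. The function names, the mask layout, and the random stand-in for the pretrained DDPM output are all hypothetical.

```python
import numpy as np

def noisy_lq(y, alpha_bar_t, rng):
    """Sample y_t ~ q(y_t | y) using the DDPM forward marginal."""
    noise = rng.standard_normal(y.shape)
    return np.sqrt(alpha_bar_t) * y + np.sqrt(1.0 - alpha_bar_t) * noise

def project_inpainting(x_hat_t, y_t, known_mask):
    """x_t = H(x_hat_t) + b(y_t), with the assumed choices
    H(x) = (1 - mask) * x and b(y) = mask * y."""
    return (1.0 - known_mask) * x_hat_t + known_mask * y_t

rng = np.random.default_rng(0)
y = rng.uniform(-1.0, 1.0, size=(8, 8))  # toy LQ image with values in [-1, 1]
known_mask = np.zeros((8, 8))
known_mask[:, :4] = 1.0                  # left half observed, right half missing

# Stand-in for the unconditional sample from a pretrained DDPM (random here).
x_hat_t = rng.standard_normal((8, 8))

y_t = noisy_lq(y, alpha_bar_t=0.5, rng=rng)
x_t = project_inpainting(x_hat_t, y_t, known_mask)
```

After the projection, the observed region of `x_t` equals the noised LQ image at the same noise level, while the missing region is left to the unconditional model. In a full sampler this step would be repeated at every reverse step.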

