BlindDiff: Empowering Degradation Modelling in Diffusion Models for Blind Image Super-Resolution

Feng Li, Yixuan Wu, Zichao Liang, Runmin Cong, Huihui Bai, Yao Zhao, Meng Wang

TL;DR

BlindDiff addresses blind single-image super-resolution by unifying MAP-based degradation modeling with diffusion probabilistic models. It unfolds the MAP optimization into the DDPM reverse process, enabling alternating updates of the blur kernel $\mathbf{k}$ and the high-resolution image $\mathbf{x}$ guided by a kernel-aware gradient term. Central to the approach is the Modulated Conditional Transformer (MCFormer), which is trained with noise and kernel constraints so that it provides data priors and kernel priors jointly, and which uses a kernel-modulation mechanism to learn degradation-aware features. Empirical results on synthetic data (isotropic and anisotropic degradations) and on real-world data demonstrate state-of-the-art performance with significantly reduced model complexity compared to existing diffusion-model-based methods. The framework offers a physically grounded path toward robust blind SR in practical applications.

Abstract

Diffusion models (DMs) have shown remarkable promise in image super-resolution (SR). However, most of them are tailored to solving non-blind inverse problems with fixed, known degradation settings, limiting their adaptability to real-world applications that involve complex unknown degradations. In this work, we propose BlindDiff, a DM-based blind SR method that tackles the blind degradation settings in single-image super-resolution (SISR). BlindDiff seamlessly integrates MAP-based optimization into DMs: it constructs a joint distribution of the low-resolution (LR) observation, high-resolution (HR) data, and degradation kernels to obtain the data and kernel priors, and solves the blind SR problem by unfolding the MAP approach along the reverse process. Unlike most DMs, BlindDiff first presents a modulated conditional transformer (MCFormer) that is pre-trained with noise and kernel constraints and then serves as a posterior sampler that provides both priors simultaneously. We then plug a simple yet effective kernel-aware gradient term between adjacent sampling iterations, which guides the diffusion model to learn degradation consistency knowledge. This also enables joint refinement of the degradation model and the HR image by observing the previous denoised sample. With the MAP-based reverse diffusion process, we show that BlindDiff performs alternating optimization of blur kernel estimation and HR image restoration in a mutually reinforcing manner. Experiments on both synthetic and real-world datasets show that BlindDiff achieves state-of-the-art performance with a significant reduction in model complexity compared to recent DM-based methods. Code will be available at \url{https://github.com/lifengcs/BlindDiff}
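
For orientation, the MAP view of blind SR mentioned in the abstract is commonly written as below. This is a sketch of the classical formulation, not a reproduction of the paper's own equations, which this page does not include. The additive noise term $\mathbf{n}$, its Gaussian form with variance $\sigma^2$, and the factorized priors $p(\mathbf{x})$ and $p(\mathbf{k})$ are assumptions; the operators $\otimes$ (convolution) and $\downarrow_s$ (downsampling by scale $s$) follow the notation of the Figure 2 caption.

$$
\mathbf{y} = (\mathbf{k}\otimes\mathbf{x})\downarrow_s + \mathbf{n}, \qquad \mathbf{n}\sim\mathcal{N}(\mathbf{0},\sigma^2\mathbf{I})
$$

$$
(\hat{\mathbf{x}},\hat{\mathbf{k}}) = \arg\max_{\mathbf{x},\mathbf{k}}\; \log p(\mathbf{y}\mid\mathbf{x},\mathbf{k}) + \log p(\mathbf{x}) + \log p(\mathbf{k})
= \arg\min_{\mathbf{x},\mathbf{k}}\; \frac{1}{2\sigma^2}\Vert \mathbf{y}-(\mathbf{k}\otimes\mathbf{x})\downarrow_s\Vert^2_2 - \log p(\mathbf{x}) - \log p(\mathbf{k})
$$

Under these assumptions the data-fidelity term is the squared residual that appears in the Figure 2 caption, and alternating minimization over $\mathbf{k}$ and $\mathbf{x}$ corresponds to the alternating kernel and image updates described above.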

Paper Structure

This paper contains 14 sections, 14 equations, 8 figures, 5 tables, 2 algorithms.

Figures (8)

  • Figure 1: Our BlindDiff achieves state-of-the-art performance for $4\times$ blind SR (Table \ref{tab2}) while being more computationally efficient than existing DM-based methods. Multi-Adds are computed for an LR input size of $128\times 128$.
  • Figure 2: Overview of the forward diffusion process (left$\rightarrow$right), which gradually adds Gaussian noise to the original clean image $\mathbf{x}_0$. The network $f_{\theta}(\mathbf{x}_t,t,\mathbf{y})$ is trained to provide both kernel and image priors. In the reverse process, for each timestep $t$, we first sample $\tilde{\mathbf{x}}_{t-1}$ from $\mathbf{x}_t$ and then minimize the residual $\Vert \mathbf{y}-(\mathbf{k}_t\otimes \tilde{\mathbf{x}}_0(\mathbf{x}_t))\downarrow_s\Vert^2_2$ with the denoised $\tilde{\mathbf{x}}_{t-1}$, leading to $\mathbf{x}_{t-1}$, where the gradient $\nabla_{\mathbf{x}_t}$ is backpropagated through the whole network.
  • Figure 3: Network architecture of the proposed modulated conditional transformer (MCFormer) in BlindDiff. MCFormer consists of a kernel estimator to predict blur kernels and serial transformer blocks in a multi-scale hierarchical design.
  • Figure 4: Visual comparisons of $4\times$ blind SR methods on isotropic Gaussian kernels.
  • Figure 5: Visual comparisons of $4\times$ blind SR methods on anisotropic Gaussian kernels.
  • ...and 3 more figures
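
The residual in the Figure 2 caption can be illustrated with a small NumPy sketch. This is a toy under stated assumptions, not the authors' implementation: the image `x` stands in for the denoised estimate $\tilde{\mathbf{x}}_0(\mathbf{x}_t)$ (no network is involved, so nothing is backpropagated through $f_{\theta}$), the blur uses circular boundary handling, the loss carries a conventional factor of 1/2, and all function names (`embed_kernel`, `degrade`, `residual_loss`, `residual_grads`, `alternating_step`) are hypothetical.

```python
import numpy as np

def embed_kernel(k, shape):
    """Zero-pad kernel k to `shape` and shift its centre to index (0, 0)."""
    p = k.shape[0]
    full = np.zeros(shape)
    full[:p, :p] = k
    return np.roll(full, (-(p // 2), -(p // 2)), axis=(0, 1))

def degrade(x, k, s):
    """(k ⊗ x)↓_s: circular blur via FFT (an assumption), then keep every s-th pixel."""
    fk = np.fft.fft2(embed_kernel(k, x.shape))
    blurred = np.real(np.fft.ifft2(fk * np.fft.fft2(x)))
    return blurred[::s, ::s]

def residual_loss(x, k, y, s):
    """0.5 * ||y - (k ⊗ x)↓_s||^2 (the 0.5 factor is a convention chosen here)."""
    r = y - degrade(x, k, s)
    return 0.5 * np.sum(r ** 2)

def residual_grads(x, k, y, s):
    """Analytic gradients of residual_loss with respect to x and k."""
    r = y - degrade(x, k, s)
    u = np.zeros_like(x)
    u[::s, ::s] = r                      # adjoint of downsampling: zero-filling
    fu = np.fft.fft2(u)
    fk = np.fft.fft2(embed_kernel(k, x.shape))
    fx = np.fft.fft2(x)
    grad_x = -np.real(np.fft.ifft2(np.conj(fk) * fu))       # adjoint of the blur
    grad_k_full = -np.real(np.fft.ifft2(np.conj(fx) * fu))  # correlation with x
    p = k.shape[0]
    # Adjoint of embed_kernel: undo the shift, then crop to the kernel support.
    grad_k = np.roll(grad_k_full, (p // 2, p // 2), axis=(0, 1))[:p, :p]
    return grad_x, grad_k

def alternating_step(x, k, y, s, lr_x, lr_k):
    """One plain gradient step on k, then one on x using the updated k."""
    _, grad_k = residual_grads(x, k, y, s)
    k = k - lr_k * grad_k
    grad_x, _ = residual_grads(x, k, y, s)
    x = x - lr_x * grad_x
    return x, k
```

Calling `alternating_step` repeatedly with small step sizes mimics the alternation between kernel refinement and image refinement in spirit only. In the method described above, the image update is additionally driven by the diffusion sampler and the kernel estimate comes from MCFormer, neither of which is modeled here.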