CDFormer: When Degradation Prediction Embraces Diffusion Model for Blind Image Super-Resolution

Qingguo Liu, Chenyi Zhuang, Pan Gao, Jie Qin

TL;DR

CDFormer tackles blind image super-resolution by jointly modeling content and degradation through a Content Degradation Prior ($CDP$). It introduces a two-stage framework: Stage 1 trains a diffusion-based module to learn $CDP$ from paired $I_{HR}$ and $I_{LR}$, while Stage 2 uses $I_{LR}$ alone to estimate $CDP$ via diffusion with a 1D condition vector, feeding this information into a Transformer-based SR network with Content Degradation Injection Modules. The method combines a reconstruction loss with a diffusion-based prior loss, enabling efficient CDP estimation and improved texture reconstruction, especially under complex degradations. Experimental results show state-of-the-art performance on multiple blind-SR benchmarks, with notable improvements in texture quality and robustness compared with diffusion-based SR baselines, while maintaining practical inference efficiency. The work highlights the practical value of using diffusion models as estimators for prior content and degradation representations to guide high-fidelity SR.
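As a rough illustration of how a reconstruction loss and a prior loss might be combined, the sketch below adds the two terms. It is not the authors' implementation: the L1 form of both terms, the `prior_weight` parameter, and the function names `l1_mean` and `combined_loss` are assumptions made for this example, and images and priors are flattened to plain lists for brevity.

```python
from typing import Sequence


def l1_mean(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute error between two equal-length, non-empty flat sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def combined_loss(
    sr: Sequence[float],
    hr: Sequence[float],
    cdp_est: Sequence[float],
    cdp_ref: Sequence[float],
    prior_weight: float = 1.0,  # hypothetical weighting, not taken from the paper
) -> float:
    """Reconstruction term plus a weighted prior term (both L1 by assumption)."""
    rec = l1_mean(sr, hr)              # super-resolved output vs. I_HR
    prior = l1_mean(cdp_est, cdp_ref)  # CDP estimated from I_LR vs. Stage 1 CDP
    return rec + prior_weight * prior


# Toy usage with flattened two-element "images" and priors.
loss = combined_loss([0.0, 1.0], [0.0, 0.0], [1.0, 1.0], [0.0, 1.0])
```

Keeping the prior term separate from the reconstruction term makes it easy to inspect how closely the estimated prior tracks the Stage 1 prior, independently of image quality.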

Abstract

Existing Blind image Super-Resolution (BSR) methods focus on estimating either kernel or degradation information, but have long overlooked the essential content details. In this paper, we propose a novel BSR approach, Content-aware Degradation-driven Transformer (CDFormer), to capture both degradation and content representations. However, low-resolution images cannot provide enough content details, and thus we introduce a diffusion-based module $CDFormer_{diff}$ to first learn the Content Degradation Prior (CDP) from both low- and high-resolution images, and then approximate the real distribution given only low-resolution information. Moreover, we apply an adaptive SR network $CDFormer_{SR}$ that effectively utilizes CDP to refine features. Compared to previous diffusion-based SR methods, we treat the diffusion model as an estimator that can overcome the limitations of expensive sampling time and excessive diversity. Experiments show that CDFormer can outperform existing methods, establishing new state-of-the-art performance on various benchmarks under blind settings. Code and models will be available at https://github.com/I2-Multimedia-Lab/CDFormer.
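To make the "diffusion model as an estimator" idea more concrete, the sketch below applies the standard DDPM forward-noising step to a short 1D vector standing in for the CDP. Because the vector is far smaller than an image, each diffusion step is cheap. The linear beta schedule, its endpoints, the step count, and the names `linear_alpha_bars` and `noise_prior` are assumptions chosen for illustration; the source does not specify these details.

```python
import math
from typing import List, Sequence


def linear_alpha_bars(
    num_steps: int, beta_start: float = 1e-4, beta_end: float = 2e-2
) -> List[float]:
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    alpha_bars: List[float] = []
    running = 1.0
    for t in range(num_steps):
        frac = t / (num_steps - 1) if num_steps > 1 else 0.0
        beta = beta_start + frac * (beta_end - beta_start)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_prior(
    z0: Sequence[float], eps: Sequence[float], alpha_bar: float
) -> List[float]:
    """Standard DDPM forward step: z_t = sqrt(a_bar) * z_0 + sqrt(1 - a_bar) * eps."""
    if len(z0) != len(eps):
        raise ValueError("z0 and eps must have the same length")
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [signal * z + noise * e for z, e in zip(z0, eps)]


# Toy usage: a 2-element stand-in for the prior, noised with zero noise.
bars = linear_alpha_bars(4)
z_t = noise_prior([1.0, -2.0], [0.0, 0.0], 0.25)
```

In a full pipeline, a denoising network conditioned on the LR-derived vector would reverse these steps; that network is omitted here because its design is not described in this section.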

Paper Structure (14 sections, 13 equations, 8 figures, 5 tables)

Figures (8)

  • Figure 1: Blind image Super-Resolution for scale 4 on kernel width 1.2. Our proposed CDFormer with CDP is capable of producing sharp and clean textures and outperforms previous state-of-the-art approaches DASR, KDSR, DCLS, and StableSR.
  • Figure 2: Overall architecture of our proposed CDFormer. In the first stage (blue line), we train the GT encoder to learn the Content Degradation Prior (CDP) from both HR and LR images to guide the SR network $CDFormer_{SR}$. In the second stage (red line), only LR images are fed into the LR encoder to produce conditional vectors, which help the diffusion model recreate the CDP.
  • Figure 3: Details of Content-aware Degradation-driven Refinement Block (CDRB).
  • Figure 4: Visual results on images from Urban100 for scale factor $4$ and kernel width $1.2$. The best results are marked in red.
  • Figure 5: Visualization of SR results via different DP methods on anisotropic Gaussian kernels and noise.
  • ...and 3 more figures