Uncertainty-guided Perturbation for Image Super-Resolution Diffusion Model

Leheng Zhang, Weiyi You, Kexuan Shi, Shuhang Gu

TL;DR

This paper tackles real-world single-image super-resolution by reframing diffusion-based SR as an LR-content-aware process. It introduces Uncertainty-guided Noise Weighting (UNW) to apply region-specific noise based on an uncertainty estimate derived from an auxiliary SR network, and couples this with a lighter pixel-space diffusion architecture (PixelUnshuffle + upsampling) and SR conditioning to improve both fidelity and perceptual quality while reducing model size and training overhead. The approach achieves state-of-the-art perceptual performance on synthetic and real-world SR benchmarks, with substantial efficiency gains (e.g., ~30% smaller model and ~167% faster training) and robust qualitative improvements (sharper textures and edges). The work demonstrates the practical viability of region-aware diffusion SR for real-world deployment and provides detailed supplementary material on sampling, weighting, and architecture choices.

Abstract

Diffusion-based image super-resolution methods have demonstrated significant advantages over GAN-based approaches, particularly in terms of perceptual quality. Building upon a lengthy Markov chain, diffusion-based methods possess remarkable modeling capacity, enabling them to achieve outstanding performance in real-world scenarios. Unlike previous methods that focus on modifying the noise schedule or sampling process to enhance performance, our approach emphasizes better utilization of LR information. We find that different regions of the LR image can be viewed as corresponding to different timesteps in a diffusion process, where flat areas are closer to the target HR distribution while edge and texture regions are farther away. In flat areas, applying only slight noise is more advantageous for reconstruction. We associate this characteristic with uncertainty and propose using an uncertainty estimate to guide region-specific noise level control, a technique we refer to as Uncertainty-guided Noise Weighting. Pixels with lower uncertainty (i.e., flat regions) receive reduced noise to preserve more LR information, thereby improving performance. Furthermore, we modify the network architecture of previous methods to develop our Uncertainty-guided Perturbation Super-Resolution (UPSR) model. Extensive experimental results demonstrate that, despite reduced model size and training overhead, the proposed UPSR method outperforms current state-of-the-art methods across various datasets, both quantitatively and qualitatively.
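To make the idea of Uncertainty-guided Noise Weighting concrete, here is a minimal NumPy sketch. It assumes the uncertainty estimate is the absolute difference between an auxiliary SR output $g(\bm{y})$ and the (upsampled) LR input $\bm{y}$, as the figure captions suggest. The clipped linear mapping and the names `uncertainty_weight`, `perturb`, `floor`, and `scale` are hypothetical choices for illustration only; the abstract does not specify the weighting function the paper uses.

```python
import numpy as np

def uncertainty_weight(lr_up, sr_est, floor=0.3, scale=5.0):
    """Map a per-pixel uncertainty estimate to a noise weight in [floor, 1].

    ASSUMPTION: a clipped linear map; `floor` and `scale` are illustrative
    values, not parameters taken from the paper.
    """
    psi = np.abs(sr_est - lr_up)  # estimated residual |g(y) - y|
    return np.clip(floor + scale * psi, floor, 1.0)

def perturb(lr_up, sr_est, sigma_max=1.0, rng=None):
    """Add Gaussian noise whose per-pixel level is w_u(y) * sigma_max."""
    rng = np.random.default_rng() if rng is None else rng
    w = uncertainty_weight(lr_up, sr_est)
    eps = rng.standard_normal(lr_up.shape)
    return lr_up + w * sigma_max * eps, w

# Toy input: the left half is "flat" (the SR estimate agrees with the LR
# image), the right half is "textured" (the SR estimate differs by 0.5).
rng = np.random.default_rng(0)
lr_up = np.zeros((8, 8))
sr_est = lr_up.copy()
sr_est[:, 4:] += 0.5

noisy, w = perturb(lr_up, sr_est, sigma_max=1.0, rng=rng)
```

In this toy case the flat half receives the minimum weight and the textured half the full noise level, which mirrors the behaviour the abstract describes: low-uncertainty pixels keep more of the LR content.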

Paper Structure

This paper contains 28 sections, 17 equations, 9 figures, 7 tables, 1 algorithm.

Figures (9)

  • Figure 1: A comparison of the initial state setup in different diffusion-based image super-resolution methods, where $\bm{\epsilon}\sim \mathcal{N}(\bm{0},\sigma_{\text{max}}^2\bm{I})$. (b) SR3 [saharia2022image] initiates the diffusion process from pure Gaussian noise, whereas (c) ResShift [yue2024resshift] and (d) our UPSR embed the LR input into the initial noise map. Additionally, we apply the uncertainty-guided weighting coefficient $w_u(\bm{y}_0)$ to reduce the noise level in flat areas, achieving a more specialized diffusion process for SR to improve performance.
  • Figure 2: (a) The distribution of the pixel residual $|y - x|$ computed on the ImageNet-Test dataset [yue2024resshift], omitting values where $|y - x| > 0.4$ for clarity. The result exhibits a distinct long-tailed characteristic. (b) The statistical curves of fidelity $|f(y) - x|$ and perceptual quality $|\phi(f(y))-\phi(x)|$ with respect to the residual $|y - x|$ under different noise levels. As $|y - x|$ increases, the fidelity gap between noise levels remains relatively stable. In contrast, the perceptual quality is more sensitive to the noise level: larger noise is needed in regions with high residual values to achieve better perceptual quality. We also propose the weighted noise level $w_u(\bm{y})\sigma_{\text{max}}$, which can lead to better results, with details presented in Sec. \ref{sec: uncertainty-based noise}.
  • Figure 3: A visualization of the actual residual $|\bm{x}^i - \bm{y}^i|$ and the estimated residual $|g(\bm{y}^i) - \bm{y}^i|$. The actual residual exhibits high values in edge and texture regions, indicating high uncertainty. The residual estimated by the SR network is close to the actual one and can therefore serve as a rough estimate of uncertainty.
  • Figure 4: The overall pipeline of the proposed UPSR model. An auxiliary SR network is first employed to estimate the uncertainty of the input $\bm{y}_0$. The weighting coefficient $w_u$, computed from the uncertainty estimate $\bm{\psi}_{est}(\bm{y}_0)$, is then applied to adjust the noise level in different regions. Meanwhile, the SR estimate $g(\bm{y}_0)$ and the LR input $\bm{y}_0$ are concatenated as the conditional information for the denoiser $f_\theta(\cdot)$.
  • Figure 5: Visual examples of the proposed UNW strategy. Based on the uncertainty estimate (illustrated as the heatmap), the noise level in most flat areas is reduced to preserve more details for better SR results. Meanwhile, noise in edge areas (e.g., in image (a)) and severely degraded parts (e.g., in image (b)) is kept relatively heavy to ensure reliable score estimation and produce visually pleasing results.
  • ...and 4 more figures
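
The initial-state comparison in Figure 1 and the conditioning step in Figure 4 can be sketched as follows. This is one reading of the captions, not the authors' implementation: the function names `initial_state` and `build_condition` are hypothetical, the weight map `w_u` is passed in rather than computed, and the channel-first `(C, H, W)` layout is an assumption.

```python
import numpy as np

def initial_state(method, y0, sigma_max, rng, w_u=None):
    """Build the starting point of the reverse process for three setups.

    eps ~ N(0, sigma_max^2 I), following the notation of Figure 1.
    """
    eps = sigma_max * rng.standard_normal(y0.shape)
    if method == "sr3":        # pure Gaussian noise, no LR content
        return eps
    if method == "resshift":   # LR input embedded in the noise map
        return y0 + eps
    if method == "upsr":       # same, with per-pixel weighted noise
        if w_u is None:
            raise ValueError("upsr requires a weight map w_u")
        return y0 + w_u * eps
    raise ValueError(f"unknown method: {method}")

def build_condition(y0, sr_est):
    """Concatenate the LR input and the SR estimate along the channel axis.

    ASSUMPTION: arrays are laid out as (C, H, W).
    """
    return np.concatenate([y0, sr_est], axis=0)

# Toy data: a constant 3-channel "image" and a uniform weight of 0.25.
y0 = np.full((3, 4, 4), 0.5)
sr_est = y0 + 0.1
w_u = np.full_like(y0, 0.25)

# Same seed for both calls so the underlying noise sample is identical.
x_resshift = initial_state("resshift", y0, 2.0, np.random.default_rng(0))
x_upsr = initial_state("upsr", y0, 2.0, np.random.default_rng(0), w_u=w_u)
cond = build_condition(y0, sr_est)
```

With an identical noise sample, the UPSR perturbation is the ResShift perturbation scaled pixel-wise by `w_u`, so regions with small weights stay closer to the LR input.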