AdaDiffSR: Adaptive Region-aware Dynamic Acceleration Diffusion Model for Real-World Image Super-Resolution

Yuanting Fan, Chengxu Liu, Nengzhong Yin, Changlong Gao, Xueming Qian

TL;DR

AdaDiffSR is a DMs-based SR pipeline with a dynamic timesteps sampling strategy (DTSS). It achieves performance comparable to current state-of-the-art DMs-based SR methods while consuming fewer computational resources and less inference time on both synthetic and real-world datasets.

Abstract

Diffusion models (DMs) have shown promising results on single-image super-resolution and other image-to-image translation tasks. Benefiting from more computational resources and longer inference times, they are able to yield more realistic images. Existing DMs-based super-resolution methods try to achieve an overall average recovery over all regions via iterative refinement, ignoring the fact that different input image regions require different numbers of timesteps to reconstruct. In this work, we observe that previous DMs-based super-resolution methods waste computational resources reconstructing invisible details. To improve the utilization of computational resources, we propose AdaDiffSR, a DMs-based SR pipeline with a dynamic timesteps sampling strategy (DTSS). Specifically, by introducing the multi-metrics latent entropy module (MMLE), we can dynamically perceive the latent spatial information gain during the denoising process, thereby guiding the dynamic selection of the timesteps. In addition, we adopt a progressive feature injection module (PFJ), which dynamically injects the original image features into the denoising process based on the current information gain, so as to generate images with both fidelity and realism. Experiments show that AdaDiffSR achieves performance comparable to current state-of-the-art DMs-based SR methods while consuming fewer computational resources and less inference time on both synthetic and real-world datasets.
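
The region-wise idea in the abstract can be illustrated with a minimal sketch: each region keeps denoising only while its information gain stays above a threshold. This is not the authors' implementation. The names `adaptive_denoise`, `mean_abs_change`, `toy_step` and `gain_threshold` are hypothetical, the mean absolute change between steps is only a stand-in for the MMLE information gain (which is not defined in this summary), and the toy update rule replaces a real diffusion denoiser.

```python
from typing import Callable, Dict, List, Tuple

Region = List[float]  # hypothetical stand-in for one region's latent values


def mean_abs_change(prev: Region, curr: Region) -> float:
    """Toy proxy for information gain: mean absolute change between two steps.

    Assumption: the real MMLE module combines several metrics; this does not.
    """
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)


def adaptive_denoise(
    regions: Dict[str, Region],
    step_fn: Callable[[str, Region], Region],
    max_steps: int = 200,
    gain_threshold: float = 1e-3,
) -> Tuple[Dict[str, Region], Dict[str, int]]:
    """Denoise each region until its gain drops below gain_threshold."""
    current = {name: list(values) for name, values in regions.items()}
    steps_used = {name: 0 for name in regions}
    active = set(regions)

    for _ in range(max_steps):
        if not active:
            break  # every region has stopped early
        for name in list(active):
            updated = step_fn(name, current[name])
            gain = mean_abs_change(current[name], updated)
            current[name] = updated
            steps_used[name] += 1
            if gain < gain_threshold:
                # Further steps would change this region very little.
                active.discard(name)
    return current, steps_used


def demo() -> Dict[str, int]:
    """Toy run: a flat 'background' converges faster than a textured 'foreground'."""
    targets = {
        "background": [0.5, 0.5, 0.5, 0.5],
        "foreground": [0.1, 0.9, 0.3, 0.7],
    }
    rates = {"background": 0.9, "foreground": 0.2}  # made-up convergence rates

    def toy_step(name: str, latent: Region) -> Region:
        # Move each value a fixed fraction of the way toward its target.
        return [v + rates[name] * (t - v) for v, t in zip(latent, targets[name])]

    start = {name: [0.0] * 4 for name in targets}
    _, steps_used = adaptive_denoise(start, toy_step)
    return steps_used


if __name__ == "__main__":
    print(demo())
```

In this toy run the background region stops after far fewer steps than the foreground region, which mirrors the observation that some regions need fewer timesteps than others. The threshold and rates are arbitrary illustration values, not settings from the paper.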

Paper Structure

This paper contains 27 sections, 3 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Visual comparisons between background and foreground regions during the denoising process. The red and blue boxes represent the background and foreground regions, respectively. The radar chart on the right visualizes the variations of several corresponding metrics (smaller is better). As the timesteps increase from 50 to 200, the visual results of the foreground regions become more satisfactory while the background remains almost unchanged.
  • Figure 2: The framework of AdaDiffSR. We calculate the information gain during the denoising process. Using this information gain, we modulate the original image features to guide the PFJ module and adjust the timesteps dynamically for different regions, thus achieving a trade-off between computational resources and restoration quality.
  • Figure 3: Validity of different IQA metrics during the denoising process. (a) We visualize the features of the perceptual-oriented metric LPIPS and the corresponding denoised image during the restoration process. (b) We plot the variations of several IQA metrics during the denoising process with 200 timesteps using the DDPM sampling strategy [ho2020denoising]. Best viewed with some metrics disabled (e.g., TOPIQ [chen2023topiq], CLIP-IQA [wang2022exploring], etc.); the variations of these metrics are similar to those of HyperIQA [su2020blindly].
  • Figure 4: Qualitative comparisons on several real-world images. (The first three rows have arbitrary resolutions, and the resolution of the last three rows is fixed at $512 \times 512$.)
  • Figure 5: Visual comparisons of different configurations in MMLE metrics.
  • ...and 1 more figure