PatchScaler: An Efficient Patch-Independent Diffusion Model for Image Super-Resolution

Yong Liu, Hang Dong, Jinshan Pan, Qingji Dong, Kai Chen, Rongxiang Zhang, Lean Fu, Fei Wang

TL;DR

PatchScaler introduces a Patch-adaptive Group Sampling (PGS) strategy that groups feature patches by quantifying their reconstruction difficulty and establishes shortcut paths with different sampling configurations for each group. A texture prompt, which retrieves texture priors from a common reference texture memory, further optimizes the patch-level reconstruction process of PGS.

Abstract

While diffusion models significantly improve the perceptual quality of super-resolved images, they usually require a large number of sampling steps, resulting in high computational costs and long inference times. Recent efforts have explored reasonable acceleration schemes by reducing the number of sampling steps. However, these approaches treat all regions of the image equally, overlooking the fact that regions with varying levels of reconstruction difficulty require different sampling steps. To address this limitation, we propose PatchScaler, an efficient patch-independent diffusion pipeline for single image super-resolution. Specifically, PatchScaler introduces a Patch-adaptive Group Sampling (PGS) strategy that groups feature patches by quantifying their reconstruction difficulty and establishes shortcut paths with different sampling configurations for each group. To further optimize the patch-level reconstruction process of PGS, we propose a texture prompt that provides rich texture conditional information to the diffusion model. The texture prompt adaptively retrieves texture priors for the target patch from a common reference texture memory. Extensive experiments show that our PatchScaler achieves superior performance in both quantitative and qualitative evaluations, while significantly speeding up inference. Our code will be available at https://github.com/yongliuy/PatchScaler.
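
The grouping idea behind PGS can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a 2D confidence map in which higher values mean a patch is easier to reconstruct, and the group names, thresholds, and step counts in `GROUPS` are hypothetical values chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical configuration: (group name, minimum confidence, sampling steps).
# Assumption: higher confidence means an easier patch, so it gets fewer steps.
GROUPS: List[Tuple[str, float, int]] = [
    ("easy", 0.8, 1),
    ("medium", 0.5, 4),
    ("hard", 0.0, 8),
]

ConfMap = Sequence[Sequence[float]]
Patch = Tuple[int, int]  # (top, left) corner of a patch


def patch_confidence(conf_map: ConfMap, top: int, left: int, size: int) -> float:
    """Return the mean confidence over one square patch of the map."""
    rows = conf_map[top:top + size]
    values = [v for row in rows for v in row[left:left + size]]
    return sum(values) / len(values)


def group_patches(conf_map: ConfMap, patch_size: int) -> Dict[str, List[Patch]]:
    """Assign each non-overlapping patch to the first group whose threshold it meets."""
    groups: Dict[str, List[Patch]] = {name: [] for name, _, _ in GROUPS}
    height, width = len(conf_map), len(conf_map[0])
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            score = patch_confidence(conf_map, top, left, patch_size)
            for name, threshold, _ in GROUPS:
                if score >= threshold:
                    groups[name].append((top, left))
                    break
    return groups


def total_steps(groups: Dict[str, List[Patch]]) -> int:
    """Count the denoising steps spent across all patches."""
    steps = {name: n for name, _, n in GROUPS}
    return sum(steps[name] * len(patches) for name, patches in groups.items())


if __name__ == "__main__":
    conf_map = [
        [0.9, 0.9, 0.6, 0.6],
        [0.9, 0.9, 0.6, 0.6],
        [0.1, 0.1, 1.0, 0.9],
        [0.1, 0.1, 1.0, 0.9],
    ]
    grouped = group_patches(conf_map, patch_size=2)
    print(grouped)
    print("adaptive steps:", total_steps(grouped))
```

In this toy example, two of the four patches land in the cheapest group, so the adaptive schedule spends fewer total steps than running every patch with the "hard" configuration. That saving is the motivation the abstract gives for treating regions differently.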

Paper Structure

This paper contains 13 sections, 7 equations, 7 figures, 4 tables, 1 algorithm.

Figures (7)

  • Figure 1: (a) Qualitative analysis of unified sampling and adaptive sampling. (b) Quantitative comparison of diffusion-based SR methods on the RealSR dataset (Cai et al., 2019). Note that the runtime is measured on the $\times 4$ ($512 \to 2048$) SR task using an NVIDIA Tesla A100 GPU.
  • Figure 2: Overview of the proposed PatchScaler. PGS dynamically assigns feature patches to groups with different sampling configurations based on a quantified confidence map. Moreover, the texture prompt provides high-quality conditional information for Patch-DiT by retrieving high-quality texture priors from a universal reference texture memory (RTM).
  • Figure 3: Examples of coarse HR images and corresponding $Qmap$. Our approach can accurately quantify the reconstruction difficulty of different regions across diverse scenes.
  • Figure 4: Illustration of the proposed PGS, which establishes a shortcut path between $\boldsymbol{x}_{0}$ and $\boldsymbol{y}_{0}$. PGS treats different patches discriminatively and dynamically assigns them different sampling configurations. Here, $\boldsymbol{y}_{0}$ and $\boldsymbol{x}_{0}$ denote the coarse HR patch and the ground truth, respectively, and $f_{\theta}$ denotes the diffusion model Patch-DiT.
  • Figure 5: Visual comparisons of state-of-the-art SR methods on real-world low-resolution images.
  • ...and 2 more figures