Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors

Aiping Zhang, Zongsheng Yue, Renjing Pei, Wenqi Ren, Xiaochun Cao

TL;DR

This work tackles the efficiency bottleneck of diffusion-based image super-resolution by proposing S3Diff, a one-step SR model that harnesses a pre-trained diffusion prior (SD-Turbo) while incorporating degradation-awareness through a degradation-guided LoRA. A dedicated degradation estimation pipeline and per-block ID embeddings enable data- and degradation-dependent parameter updates, preserving the model’s generative priors. An online negative prompting training strategy, combined with classifier-free guidance at inference, significantly improves perceptual quality without increasing inference steps. Experiments on synthetic and real-world benchmarks demonstrate that S3Diff achieves superior perceptual quality and competitive fidelity with far greater efficiency than state-of-the-art diffusion-based SR methods. The approach offers an interactive degradation-aware SR pathway suitable for real-time or resource-constrained scenarios.
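
The classifier-free guidance step mentioned above can be illustrated with the standard guidance formula. This is a minimal sketch, not the authors' implementation: `cfg_combine`, the array shapes, and the guidance scale of 1.1 are illustrative assumptions, and the two random arrays stand in for model predictions made with a positive and a negative prompt.

```python
import numpy as np

def cfg_combine(pred_pos, pred_neg, guidance_scale):
    """Standard classifier-free guidance: start from the negative-prompt
    prediction and move toward the positive-prompt prediction."""
    return pred_neg + guidance_scale * (pred_pos - pred_neg)

# Random stand-ins for two predictions on the same low-resolution input
# (hypothetical shapes; a real model would produce these).
rng = np.random.default_rng(0)
pred_pos = rng.normal(size=(4, 8, 8))  # conditioned on a positive prompt
pred_neg = rng.normal(size=(4, 8, 8))  # conditioned on a negative prompt

# A scale above 1 pushes the result away from the negative prediction.
guided = cfg_combine(pred_pos, pred_neg, guidance_scale=1.1)
```

With a scale of 1 the result equals the positive-prompt prediction, and with a scale of 0 it equals the negative-prompt prediction. The two predictions can in principle be computed as one batch, which is consistent with the claim that the number of inference steps does not grow.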

Abstract

Diffusion-based image super-resolution (SR) methods have achieved remarkable success by leveraging large pre-trained text-to-image diffusion models as priors. However, these methods still face two challenges: they require dozens of sampling steps to achieve satisfactory results, which limits efficiency in real scenarios, and they neglect degradation models, which provide critical auxiliary information for solving the SR problem. In this work, we introduce a novel one-step SR model that substantially alleviates the efficiency issue of diffusion-based SR methods. Unlike existing fine-tuning strategies, we design a degradation-guided Low-Rank Adaptation (LoRA) module specifically for SR, which corrects the model parameters based on degradation information pre-estimated from low-resolution images. This module not only yields a powerful data-dependent or degradation-dependent SR model but also preserves the generative prior of the pre-trained diffusion model as much as possible. Furthermore, we tailor a novel training pipeline by introducing an online negative sample generation strategy. Combined with classifier-free guidance during inference, it substantially improves the perceptual quality of the super-resolution results. Extensive experiments demonstrate the superior efficiency and effectiveness of the proposed model compared to recent state-of-the-art methods.
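
One way to picture a degradation-guided LoRA is a low-rank update whose inner matrix is generated from the degradation estimate. The sketch below is an assumption-laden illustration, not the paper's exact formulation: the `y = W x + B C A x` form, the two-element degradation vector, the linear map `M`, and the 8-dimensional `block_id` embedding are all hypothetical choices made for the example.

```python
import numpy as np

def core_from_degradation(degradation, block_id, M, rank):
    """Map a degradation estimate plus a per-block ID embedding to a
    rank x rank matrix C (a hypothetical linear map; a real model would
    likely use a small learned network)."""
    cond = np.concatenate([degradation, block_id])
    return (M @ cond).reshape(rank, rank)

def degradation_guided_linear(x, W, A, B, C):
    """Frozen weight W plus a low-rank update B C A modulated by C."""
    return W @ x + B @ (C @ (A @ x))

rng = np.random.default_rng(0)
d_in, d_out, rank, id_dim = 16, 16, 4, 8

W = rng.normal(size=(d_out, d_in))             # frozen pre-trained weight
A = rng.normal(size=(rank, d_in))              # LoRA down-projection
B = np.zeros((d_out, rank))                    # LoRA up-projection, zero-initialized
M = rng.normal(size=(rank * rank, 2 + id_dim)) # hypothetical conditioning map
block_id = rng.normal(size=id_dim)             # hypothetical block ID embedding

x = rng.normal(size=d_in)
degradation = np.array([0.7, 0.2])             # hypothetical degradation scores
C = core_from_degradation(degradation, block_id, M, rank)
y = degradation_guided_linear(x, W, A, B, C)
```

Because `B` starts at zero, the layer initially reproduces the frozen weight's output, which is one common way a LoRA-style update keeps a pre-trained prior intact at the start of fine-tuning. Once `B` is trained, different degradation estimates yield different effective weights for the same input.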

Paper Structure (31 sections, 7 equations, 11 figures, 9 tables)

This paper contains 31 sections, 7 equations, 11 figures, 9 tables.

Figures (11)

  • Figure 1: Comparison of performance and complexity among DM-based SR methods on the DIV2K-Val dataset [div2k]. Metrics for which smaller scores are better (LPIPS, DISTS, NIQE, FID, and inference time) are inverted, and all metrics are normalized for better visualization. S3Diff attains top-tier performance in both image quality and complexity with just a single forward pass.
  • Figure 2: Qualitative comparison on a typical real-world example between the proposed method and recent state-of-the-art methods, including SinSR [wang2023sinsr] and OSEDiff [wu2024one]. (Zoom in for details.)
  • Figure 3: Overview of S3Diff. We enhance a pre-trained diffusion model for one-step SR by injecting LoRA layers into the VAE encoder and UNet. Additionally, we employ a pre-trained Degradation Estimation Network to assess image degradation, which guides the LoRAs together with the introduced block ID embeddings. We tailor a new training pipeline that includes online negative prompting, which reuses generated LR images with negative text prompts. The network is trained with a combination of a reconstruction loss and a GAN loss.
  • Figure 4: We demonstrate images generated from various steps using the pre-trained SD-Turbo, both with and without text prompts.
  • Figure 5: Qualitative comparisons of different methods on the synthetic dataset DIV2K-Val [div2k]. (Zoom in for details.)
  • ...and 6 more figures
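
The Figure 1 caption describes inverting lower-is-better metrics and normalizing all metrics before plotting. The caption does not say which normalization is used, so the sketch below assumes min-max scaling across methods; `normalize_for_radar` is a hypothetical helper and the numbers are placeholders, not results from the paper.

```python
import numpy as np

def normalize_for_radar(scores, lower_is_better):
    """Min-max normalize one metric across methods to [0, 1] and invert
    it when smaller raw scores are better, so larger always means better."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    if hi > lo:
        norm = (scores - lo) / (hi - lo)
    else:
        # All methods tie on this metric: place them at the midpoint.
        norm = np.full_like(scores, 0.5)
    return 1.0 - norm if lower_is_better else norm

# Placeholder scores for three hypothetical methods on one
# lower-is-better metric and one higher-is-better metric.
lower_metric = normalize_for_radar([0.2, 0.4, 0.3], lower_is_better=True)
higher_metric = normalize_for_radar([20.0, 30.0, 25.0], lower_is_better=False)
```

After this transformation the best method on each axis sits at 1 and the worst at 0, regardless of the metric's original direction.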