RAP-SR: RestorAtion Prior Enhancement in Diffusion Models for Realistic Image Super-Resolution

Jiangang Wang, Qingnan Fan, Jinwei Chen, Hong Gu, Feng Huang, Wenqi Ren

TL;DR

Real-world image super-resolution is inherently ill-posed due to unknown degradations. This work introduces RAP-SR, which strengthens restoration priors in pretrained diffusion models by (i) building the High-Fidelity Aesthetic Image Dataset (HFAID) with a Quality-Driven Aesthetic Image Selection Pipeline (QDAISP) and (ii) implementing Restoration Priors Refinement (RPR) and Restoration-Oriented Prompt Optimization (ROPO) to better activate those priors. HFAID provides high-fidelity, aesthetically aligned images to quality-tune priors, while ROPO associates restoration quality with novel restoration identifiers embedded in prompts; RPR further refines the base model without full fine-tuning. The approach is plug-and-play, improving existing diffusion-based SR methods and delivering superior perceptual quality across synthetic and real-world datasets, as evidenced by state-of-the-art no-reference metrics. Overall, RAP-SR bridges the gap between general-purpose diffusion models and Real-SR demands, enabling more realistic details and textures without sacrificing fidelity.

Abstract

Benefiting from their powerful generative capabilities, pretrained diffusion models have garnered significant attention for real-world image super-resolution (Real-SR). Existing diffusion-based SR approaches typically use semantic information from degraded images, together with restoration prompts, to activate the model's prior and produce realistic high-resolution images. However, general-purpose pretrained diffusion models are not designed for restoration tasks and often have a suboptimal prior, and manually defined prompts may fail to fully exploit their generative potential. To address these limitations, we introduce RAP-SR, a novel restoration prior enhancement approach in pretrained diffusion models for Real-SR. First, we develop the High-Fidelity Aesthetic Image Dataset (HFAID), curated through a Quality-Driven Aesthetic Image Selection Pipeline (QDAISP). Our dataset not only surpasses existing ones in fidelity but also excels in aesthetic quality. Second, we propose the Restoration Priors Enhancement Framework, which includes Restoration Priors Refinement (RPR) and Restoration-Oriented Prompt Optimization (ROPO) modules. RPR refines the restoration prior using HFAID, while ROPO optimizes the unique restoration identifier, improving the quality of the resulting images. By enhancing the restoration prior, RAP-SR bridges the gap between general-purpose models and the demands of Real-SR. Because RAP-SR is plug-and-play, it can be integrated into existing diffusion-based SR methods to boost their performance. Extensive experiments demonstrate its broad applicability and state-of-the-art results. Code and datasets will be available upon acceptance.

Paper Structure

This paper contains 34 sections, 1 equation, 12 figures, 4 tables, 1 algorithm.

Figures (12)

  • Figure 1: Visual Comparison: RAP-SR enhances the restoration prior of pretrained diffusion models. Our proposed RAP-SR method can be seamlessly integrated into diffusion-based SR methods, generating more realistic details and textures without the need for fine-tuning the original model.
  • Figure 2: Quality-Driven Aesthetic Image Selection Pipeline. The pipeline has four stages. Unlike previous methods that focus solely on image quality, our approach uses a multi-modal model to evaluate both image quality and aesthetic quality. From an initial pool of one million images, we select 5,000 ultra-high-quality images to create the High-Fidelity Aesthetic Image Dataset.
  • Figure 3: Restoration Priors Enhancement Framework. This framework includes Restoration Priors Refinement (RPR) and Restoration-Oriented Prompt Optimization (ROPO). ROPO optimizes the restoration prompt by constructing both positive and negative samples. For negative samples, a degradation model generates low-quality images, and the unique restoration identifier is combined with each image's semantic caption to create training data. Through the subsequent RPR process, the model enhances its restoration prior, learning to associate image quality with the restoration identifier.
  • Figure 4: Comparison of No-Reference Metrics Across Different Datasets. Our proposed dataset significantly outperforms existing datasets across all evaluation metrics.
  • Figure 5: Qualitative comparisons on real-world test datasets. RAP-SR obtains the best visual performance.
  • ...and 7 more figures
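
The selection idea described in the Figure 2 caption (score each image on both fidelity and aesthetics, then keep a small top subset) can be illustrated with a minimal sketch. It is not the authors' QDAISP: the four stages are not detailed here, so this collapses them into one fidelity filter followed by an aesthetic ranking. The `Candidate` type, the `select_images` function, the score ranges, and the threshold are all hypothetical and stand in for whatever quality and multi-modal aesthetic scorers the pipeline actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    quality: float    # hypothetical fidelity score in [0, 1]
    aesthetic: float  # hypothetical aesthetic score in [0, 1]


def select_images(pool, min_quality, keep):
    """Drop low-fidelity images, then keep the `keep` most aesthetic ones."""
    # Step 1 (assumed): discard anything below the fidelity threshold.
    passed = [c for c in pool if c.quality >= min_quality]
    # Step 2 (assumed): rank survivors by aesthetic score, ties broken by fidelity.
    ranked = sorted(passed, key=lambda c: (c.aesthetic, c.quality), reverse=True)
    return ranked[:keep]


pool = [
    Candidate("a.png", quality=0.95, aesthetic=0.60),
    Candidate("b.png", quality=0.40, aesthetic=0.99),  # attractive but low fidelity
    Candidate("c.png", quality=0.90, aesthetic=0.85),
    Candidate("d.png", quality=0.88, aesthetic=0.70),
]
selected = select_images(pool, min_quality=0.8, keep=2)
print([c.name for c in selected])  # ['c.png', 'd.png']
```

In this toy pool, `b.png` has the highest aesthetic score but is rejected by the fidelity filter, which mirrors the caption's point that both criteria matter. At the scale given in the caption, `keep` would be 5,000 over a pool of one million images.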