Table of Contents

DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution

Aiwen Jiang, Zhi Wei, Long Peng, Feiqiang Liu, Wenbo Li, Mingwen Wang

TL;DR

This paper proposes to leverage degradation-aligned language prompt for accurate, fine-grained, and high-fidelity image restoration, and achieves a new state-of-the-art perceptual quality level.

Abstract

Image super-resolution pursues reconstructing a high-fidelity, high-resolution counterpart of a low-resolution image. In recent years, diffusion-based models have garnered significant attention due to their rich prior knowledge. The success of diffusion models driven by general text prompts has validated the effectiveness of textual control in text-to-image generation. However, given the severe degradation commonly present in low-resolution images, coupled with the inherent randomness of diffusion models, current models struggle to adequately discern semantic and degradation information within severely degraded images. This often leads to problems such as semantic loss, visual artifacts, and visual hallucinations, which pose substantial challenges for practical use. To address these challenges, this paper proposes leveraging degradation-aligned language prompts for accurate, fine-grained, and high-fidelity image restoration. Complementary priors, including semantic content descriptions and degradation prompts, are explored. Specifically, on one hand, an image-restoration prompt alignment decoder is proposed to automatically discern the degradation degree of LR images, thereby generating beneficial degradation priors for image restoration. On the other hand, richly tailored descriptions from a pretrained multimodal large language model elicit high-level semantic priors closely aligned with human perception, ensuring fidelity control for image restoration. Comprehensive comparisons with state-of-the-art methods have been conducted on several popular synthetic and real-world benchmark datasets. The quantitative and qualitative analyses demonstrate that the proposed method achieves a new state-of-the-art perceptual quality level. Source code and pre-trained parameters are publicly available at https://github.com/puppy210/DaLPSR.
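As an illustrative sketch only (not the paper's actual implementation), the two complementary priors described above might be fused into a single text condition for the diffusion backbone roughly as follows; the function name and prompt template are assumptions:

```python
def compose_prompt(semantic_caption: str, degradation_prompt: str) -> str:
    """Fuse a high-level semantic description (e.g. produced by an MLLM
    such as LLaVA) with a degradation-aligned prompt (e.g. produced by
    the image-restoration prompt alignment decoder) into one text
    condition for a diffusion-based restorer.  The template below is a
    hypothetical placeholder, not the paper's actual format."""
    return f"{semantic_caption.strip()}. Degradation: {degradation_prompt.strip()}."

# Hypothetical example inputs
caption = "A red-brick lighthouse on a rocky shore at sunset"
degradation = "severe blur, moderate compression artifacts"
print(compose_prompt(caption, degradation))
```

In the paper's pipeline, a condition of this kind would steer the ControlNet-guided denoising so that the semantic prior constrains content fidelity while the degradation prior tells the model what to undo.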

Paper Structure

This paper contains 30 sections, 3 equations, 9 figures, 4 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Overview of the proposed method. (a) Image-Restoration Prompt Alignment Decoder (IRPAD) is designed to extract image-restoration prompts from LR images. (b) The extracted prompts from IRPAD are integrated into image generation process, enabling the model to leverage semantic priors and restoration priors to enhance the quality and fidelity of the reconstructed HR images.
  • Figure 2: The fundamental structure of ControlNet. $Z_{\text{LR}}$ is the latent representation of the LR image.
  • Figure 3: The image-restoration prompt generation pipeline. A discretization strategy is employed on the degradation-level representation to delineate the high-order degradation process.
  • Figure 4: The example demonstrates that LLaVA can produce high-level semantic prompts aligned with human perceptual comprehension through the application of tailored prompt instructions. RAM is employed to detect semantic tags, which are used to formulate tailored instructions for LR images.
  • Figure 5: Visual comparisons between the proposed model and other state-of-the-art methods on the DIV2K-Val dataset. For a clearer and more detailed view, please zoom in on the images.
  • ...and 4 more figures
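The discretization strategy mentioned in Figure 3 could, in principle, look like the following sketch: continuous degradation intensities are binned into discrete severity labels, which are then rendered as a degradation prompt. The bin edges, labels, and degradation factors here are illustrative assumptions, not the paper's actual quantization:

```python
import bisect

# Illustrative bin edges on a normalized [0, 1] degradation level,
# and severity labels for the resulting four bins (both assumed).
EDGES = [0.25, 0.5, 0.75]
LABELS = ["slight", "moderate", "severe", "extreme"]

def discretize(level: float) -> str:
    """Map a continuous degradation level in [0, 1] to a discrete severity label."""
    return LABELS[bisect.bisect_right(EDGES, level)]

def degradation_prompt(blur: float, noise: float, jpeg: float) -> str:
    """Render discretized degradation levels as a text prompt for restoration."""
    return (f"{discretize(blur)} blur, "
            f"{discretize(noise)} noise, "
            f"{discretize(jpeg)} compression artifacts")

print(degradation_prompt(0.8, 0.3, 0.1))
# -> extreme blur, moderate noise, slight compression artifacts
```

Discretizing into a small vocabulary of severity words keeps the degradation prior expressible as natural language, which is what lets it act as a text prompt alongside the semantic caption.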