Perceive-IR: Learning to Perceive Degradation Better for All-in-One Image Restoration

Xu Zhang, Jiaqi Ma, Guoli Wang, Qian Zhang, Huan Zhang, Lefei Zhang

TL;DR

Perceive-IR tackles the challenge of fine-grained degradation perception in All-in-One image restoration by introducing a backbone-agnostic two-stage framework. The first stage builds a quality perceiver via a multi-level prompt learning process in CLIP space, while the second stage integrates this perceiver with a quality-aware restoration loss, augmented by a Semantic Guidance Module and Compact Feature Extraction. The approach achieves state-of-the-art results across All-in-One and generalization scenarios, and demonstrates robust transferability to different restoration backbones and real-world degradations. Overall, it provides a flexible, effective pathway for integrating quality-aware perception into diverse restoration networks, advancing practical All-in-One restoration.

Abstract

Existing All-in-One image restoration methods often fail to perceive degradation types and severity levels simultaneously, overlooking the importance of fine-grained quality perception. Moreover, these methods often utilize highly customized backbones, which hinder their adaptability and integration into more advanced restoration networks. To address these limitations, we propose Perceive-IR, a novel backbone-agnostic All-in-One image restoration framework designed for fine-grained quality control across various degradation types and severity levels. Its modular structure allows core components to function independently of specific backbones, enabling seamless integration into advanced restoration models without significant modifications. Specifically, Perceive-IR operates in two key stages: 1) a multi-level quality-driven prompt learning stage, where a fine-grained quality perceiver is meticulously trained to discern three-tier quality levels by optimizing the alignment between prompts and images within the CLIP perception space. This stage ensures a nuanced understanding of image quality, laying the groundwork for subsequent restoration; 2) a restoration stage, where the quality perceiver is seamlessly integrated with a difficulty-adaptive perceptual loss, forming a quality-aware learning strategy. This strategy not only dynamically differentiates sample learning difficulty but also achieves fine-grained quality control by driving the restored image toward the ground truth while pulling it away from both low- and medium-quality samples.
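
As a rough illustration of the first stage, the sketch below scores one image embedding against three prompt embeddings (one per quality tier) and computes a cross-entropy loss over the resulting softmax. It is a minimal toy under stated assumptions, not the authors' implementation: the embeddings are hand-written 3-dimensional vectors rather than CLIP outputs, and the function names, the tier ordering (low, medium, high), and the temperature value of 0.07 are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def tier_probabilities(image_emb, prompt_embs, temperature=0.07):
    """Softmax over image-prompt similarities (temperature is an assumed value)."""
    logits = [cosine(image_emb, p) / temperature for p in prompt_embs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def prompt_cross_entropy(image_emb, prompt_embs, tier_index, temperature=0.07):
    """Cross-entropy between the predicted tier distribution and the true tier."""
    probs = tier_probabilities(image_emb, prompt_embs, temperature)
    return -math.log(probs[tier_index])

# Toy stand-ins for prompt embeddings: index 0 = low, 1 = medium, 2 = high quality.
prompts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Toy stand-in for the embedding of a high-quality image.
image = [0.1, 0.2, 0.9]

probs = tier_probabilities(image, prompts)
loss_correct = prompt_cross_entropy(image, prompts, tier_index=2)
loss_wrong = prompt_cross_entropy(image, prompts, tier_index=0)
```

In this toy setup the loss is small when the image is labeled with the tier whose prompt it already resembles and large otherwise, which is the signal that would drive the prompts toward alignment with their tier's images.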

Paper Structure

This paper contains 37 sections, 13 equations, 11 figures, 15 tables.

Figures (11)

  • Figure 1: Mechanisms of existing All-in-One methods vs. our method. (a) Existing All-in-One methods are capable of recognizing degradation types (such as blur, haze, etc.) but struggle to perceive severity levels, often only distinguishing between light and heavy cases. Their reliance on customized backbones further limits transferability. (b) Legend for symbols. (c) Our method simultaneously perceives degradation types and severity levels, while being compatible with diverse restoration backbones, offering superior flexibility and versatility.
  • Figure 2: PSNR comparisons with state-of-the-art All-in-One methods across two common All-in-One image restoration scenarios. * denotes results obtained under All-in-One ("Noise+Haze+Rain+Blur+Low-light") training setting, while unmarked results are from All-in-One ("Noise+Haze+Rain") training setting. Our method's results are marked in red, while the best results are indicated in bold.
  • Figure 3: In the proposed multi-level quality-driven prompt learning stage, we initialize and train textual prompts using image-text pairs categorized into three tiers of quality. The medium-quality images are obtained by training the restoration model (e.g., Restormer) using a cross-validation strategy. Then, these image-text pairs are trained with cross-entropy loss in the CLIP model. Once trained, the learned prompts are fixed and used to guide the restoration of high-quality images during the subsequent restoration stage.
  • Figure 4: The proposed restoration stage consists of: (a) Restoration Branch (RB): A 4-level U-shaped encoder-decoder structure that incorporates the Transformer Block (TB) of Restormer in the encoder and the Enhanced Transformer Block (ETB) in the decoder. (b) Compact Feature Extraction (CFE): A module designed to generate a distinctive degradation representation. (c) Semantic Guidance Module (SGM): Comprising a pre-trained DINO-v2 and the Prompt Guidance Module (PGM) to produce feature representations enriched with semantic and degradation priors.
  • Figure 5: The proposed quality-aware learning strategy contains two components: (a) The CLIP-aware loss penalizes the dissimilarity between the restored image and the "excellent" prompt, guiding the restored image to better resemble the ground truth; (b) The difficulty-adaptive perceptual loss dynamically adjusts its behavior based on the difficulty level of the restoration process by distinguishing the learning difficulty of samples in feature space.
  • ...and 6 more figures
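
The abstract and the Figure 5 caption describe a loss that drives the restored image toward the ground truth while pushing it away from low- and medium-quality samples. The sketch below illustrates that pull-and-push idea with a simple ratio of distances. It is an assumed form chosen for illustration, not the paper's equation: the L1 distance on raw vectors, the unweighted sum of the two negative terms, the `eps` constant, and all function names are hypothetical, and the difficulty-adaptive weighting is not modeled.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def pull_push_loss(restored, ground_truth, low_q, medium_q, eps=1e-8):
    """Assumed ratio form: small when close to the ground truth
    and far from the low- and medium-quality samples."""
    positive = l1_distance(restored, ground_truth)  # pull term
    negative = l1_distance(restored, low_q) + l1_distance(restored, medium_q)  # push terms
    return positive / (negative + eps)

# Toy 4-element "features" standing in for real image features.
gt = [1.0, 1.0, 1.0, 1.0]
low = [0.0, 0.0, 0.0, 0.0]
medium = [0.5, 0.5, 0.5, 0.5]

near_gt = [0.9, 0.9, 0.9, 0.9]      # a good restoration
near_medium = [0.6, 0.6, 0.6, 0.6]  # a mediocre restoration

loss_good = pull_push_loss(near_gt, gt, low, medium)
loss_poor = pull_push_loss(near_medium, gt, low, medium)
```

With these toy values the restoration that sits near the ground truth receives a lower loss than the one that sits near the medium-quality sample, and a restoration identical to the ground truth receives a loss of zero.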