PromptCIR: Blind Compressed Image Restoration with Prompt Learning

Bingchen Li, Xin Li, Yiting Lu, Ruoyu Feng, Mengxi Guo, Shijie Zhao, Li Zhang, Zhibo Chen

TL;DR

PromptCIR addresses blind compressed image restoration by forgoing explicit quality-factor estimation and instead encoding compression information through lightweight, content-aware prompts that interact with image features. It builds a Restormer-based 4-stage U-shaped backbone augmented with a dynamic prompt block and a Residual Hybrid Attention Group to capture both local detail and global context, enabling effective artifact removal across unknown degradation levels. A two-stage training regime on large-scale data (DF2K and LSDIR) with 7 predefined quality factors, followed by online fine-tuning, enables strong generalization to unseen compression levels and datasets. Empirically, PromptCIR achieves state-of-the-art performance on blind and non-blind CIR benchmarks and won NTIRE 2024's blind compressed image enhancement track, highlighting the practical viability of prompt-guided restoration for real-world, unknown-degradation scenarios.

Abstract

Blind Compressed Image Restoration (CIR) has garnered significant attention due to its practical applications. It aims to mitigate compression artifacts caused by unknown quality factors, particularly with JPEG codecs. Existing works on blind CIR often rely on a quality factor prediction network to help the restoration network handle compressed images. However, the predicted numerical quality factor lacks spatial information, which prevents the network from adapting to image content. Recent studies in prompt-learning-based image restoration have showcased the potential of prompts to generalize across varied degradation types and degrees. This motivated us to design a prompt-learning-based compressed image restoration network, dubbed PromptCIR, which can effectively restore images from various compression levels. Specifically, PromptCIR exploits prompts to encode compression information implicitly, where prompts directly interact with soft weights generated from image features, thus providing dynamic content-aware and distortion-aware guidance for the restoration process. The lightweight prompts enable our method to adapt to different compression levels while introducing minimal parameter overhead. Overall, PromptCIR leverages a powerful transformer-based backbone with the dynamic prompt module to proficiently handle blind CIR tasks, winning first place in the blind compressed image enhancement track of the NTIRE 2024 challenge. Extensive experiments have validated the effectiveness of our proposed PromptCIR. The code is available at https://github.com/lbc12345/PromptCIR-NTIRE24.
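
The interaction between prompts and soft weights described in the abstract can be illustrated with a minimal sketch. This is not the released PromptCIR code: it assumes one weight per prompt for the whole image (global average pooling, a linear projection, then softmax), which is a simplification. The names `global_average`, `prompt_weights`, `mix_prompts`, and `proj` are hypothetical, and plain nested lists stand in for tensors.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def global_average(feature):
    # feature is indexed [C][H][W]; returns one mean value per channel.
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]

def prompt_weights(feature, proj):
    # Assumption: soft weights come from pooled features via a linear map
    # proj (indexed [N][C]) followed by softmax over the N prompts.
    pooled = global_average(feature)
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in proj]
    return softmax(logits)

def mix_prompts(prompts, weights):
    # Weighted sum of N prompt tensors, each indexed [C][H][W].
    C, H, W = len(prompts[0]), len(prompts[0][0]), len(prompts[0][0][0])
    return [[[sum(wt * p[c][y][x] for wt, p in zip(weights, prompts))
              for x in range(W)] for y in range(H)] for c in range(C)]

# Toy input: C=2 channels, 2x2 spatial size, N=2 prompts.
feature = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
proj = [[1.0, 0.0], [0.0, 1.0]]
prompts = [
    [[[2.0, 2.0], [2.0, 2.0]], [[2.0, 2.0], [2.0, 2.0]]],
    [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]],
]

weights = prompt_weights(feature, proj)
mixed = mix_prompts(prompts, weights)
print(weights, mixed[0][0][0])
```

Because the weights are computed from the input features, the mixed prompt changes with image content, whereas a single predicted quality factor would give the same guidance for any image at that compression level.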

Paper Structure (14 sections, 1 equation, 7 figures, 4 tables)

This paper contains 14 sections, 1 equation, 7 figures, 4 tables.

Figures (7)

  • Figure 1: The overall framework of our proposed PromptCIR. We introduce prompt learning for blind compressed image restoration. To preserve content-aware knowledge while efficiently encoding distortion-aware information, we utilize DPM from UCIP [li2024ucip] to provide implicit guidance for the restoration process. To enhance the representation ability of the network, we replace the transformer blocks in the first two stages with RHAG inherited from HAT [hat].
  • Figure 2: The structure of the transformer block [zamir2022restormer, promptir]. It is composed of two modules: the multi-Dconv head transposed attention module (MDTA) and the gated-Dconv feed-forward network (GDFN). Compared to a traditional self-attention block, the transposed attention mechanism provides more efficient information extraction with lower computational complexity.
  • Figure 3: The structure of PIM. Here a $1\times 1$ convolution is used to reduce the number of channels to match that of the input features.
  • Figure 4: (Upper) The structure of RHAG [hat], which combines several hybrid attention blocks (Bottom) with an overlapped cross-attention block (OCAB) followed by one convolution. This design enhances the representation capability of the network in both local and global information extraction, which suits the needs of blind CIR.
  • Figure 5: Qualitative comparisons between different methods on blind CIR. Zoom in for better views. (Upper: LIVE1_bikes. Bottom: DIV2K_0846)
  • ...and 2 more figures