
UCIP: A Universal Framework for Compressed Image Super-Resolution using Dynamic Prompt

Xin Li, Bingchen Li, Yeying Jin, Cuiling Lan, Hanxin Zhu, Yulin Ren, Zhibo Chen

TL;DR

This work tackles universal compressed image super-resolution (CSR) across diverse codecs by introducing UCIP, a framework built on a compact MLP-like backbone augmented with dynamic prompts. The Dynamic Prompt Generation Module (DPM) creates $D$ basic prompts and spatial coefficients to yield a content/spatial/task-adaptive prompt $\mathbf{P}$, which guides a Dynamic Prompt-guided Token Mixer Block (PTMB) to perform deformable, axis-wise token mixing with local refinement. The PTMB leverages offsets conditioned on the dynamic prompt and input features, fusing horizontal, vertical, and local information, and further modulates features with a SPADE block for degradation-aware restoration. A novel all-in-one UCSR dataset with six codecs (three traditional, three learning-based) and multiple quality points enables robust evaluation of universal CSR methods. Extensive experiments show UCIP achieves state-of-the-art results across codecs with better efficiency, and prompt-tuning experiments on unseen codecs demonstrate the framework’s adaptability for future CSR tasks.
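The prompt construction described above can be illustrated with a minimal NumPy sketch. It assumes that the $D$ basic prompts are channel vectors of spatial size 1x1 and that the spatial coefficients come from a linear projection of the input features followed by a softmax. The names `dynamic_prompt` and `w_coef`, and the use of softmax, are assumptions made for illustration and may differ from the authors' DPM.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dynamic_prompt(features, basic_prompts, w_coef):
    """Combine D basic prompts into a spatially varying prompt P.

    features:      (H, W, C) input feature map
    basic_prompts: (D, C)    D basic prompts, each of spatial size 1x1
    w_coef:        (C, D)    hypothetical projection that predicts coefficients
    returns:       (H, W, C) prompt P, one weighted prompt mixture per position
    """
    # Per-position coefficients over the D basic prompts (assumed softmax-normalized).
    coef = softmax(features @ w_coef, axis=-1)  # (H, W, D)
    # Weighted sum of the basic prompts at every spatial position.
    return coef @ basic_prompts  # (H, W, C)


rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 6, 8))    # H=4, W=6, C=8
prompts = rng.normal(size=(3, 8))     # D=3 basic prompts
w = rng.normal(size=(8, 3))
P = dynamic_prompt(feats, prompts, w)
print(P.shape)
```

Because the basic prompts carry no spatial extent, the same parameters apply to a feature map of any height and width; only the coefficient map changes size.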

Abstract

Compressed Image Super-resolution (CSR) aims to simultaneously super-resolve compressed images and tackle the challenging hybrid distortions caused by compression. However, existing works on CSR usually focus on a single compression codec, i.e., JPEG, ignoring the diverse traditional and learning-based codecs used in practical applications, e.g., HEVC, VVC, HIFIC, etc. In this work, we propose the first universal CSR framework, dubbed UCIP, with dynamic prompt learning, intended to jointly support the CSR distortions of any compression codec/mode. In particular, an efficient dynamic prompt strategy is proposed to mine content/spatial-aware, task-adaptive contextual information for the universal CSR task, using only a small number of prompts with spatial size $1\times 1$. To simplify contextual information mining, we introduce a novel MLP-like backbone for UCIP by adapting the Active Token Mixer (ATM) to CSR tasks for the first time, where global information modeling is performed only in the horizontal and vertical directions with offset prediction. We also build an all-in-one benchmark dataset for the CSR task by collecting datasets compressed with 6 popular traditional and learning-based codecs, including JPEG, HEVC, VVC, HIFIC, etc., resulting in 23 common degradations. Extensive experiments show the consistent and excellent performance of UCIP on universal CSR tasks. The project can be found at https://lixinustc.github.io/UCIP.github.io
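The axis-wise token mixing with offset prediction mentioned in the abstract can be sketched as follows. This is a simplified illustration, not the authors' implementation: offsets are passed in directly as integers (in UCIP they are predicted from the prompt and the input features), out-of-range positions are clamped to the border, and the horizontal, vertical, and identity branches are fused by a plain average. The names `axis_mix` and `token_mix` are hypothetical.

```python
import numpy as np


def axis_mix(x, offsets, axis):
    """Gather, for each token and channel, a token shifted along one axis.

    x:       (H, W, C) feature map
    offsets: (H, W, C) integer offsets, one per position and channel
    axis:    0 for vertical mixing, 1 for horizontal mixing
    """
    H, W, C = x.shape
    hh, ww, cc = np.meshgrid(
        np.arange(H), np.arange(W), np.arange(C), indexing="ij"
    )
    if axis == 1:
        # Shift the column index, clamping at the image border (assumption).
        ww = np.clip(ww + offsets, 0, W - 1)
    else:
        # Shift the row index, clamping at the image border (assumption).
        hh = np.clip(hh + offsets, 0, H - 1)
    return x[hh, ww, cc]


def token_mix(x, off_h, off_v):
    """Fuse horizontal, vertical, and identity branches by simple averaging."""
    horizontal = axis_mix(x, off_h, axis=1)
    vertical = axis_mix(x, off_v, axis=0)
    return (horizontal + vertical + x) / 3.0


x = np.arange(2 * 3 * 2, dtype=float).reshape(2, 3, 2)
zeros = np.zeros(x.shape, dtype=int)
ones = np.ones(x.shape, dtype=int)
mixed = token_mix(x, zeros, zeros)    # zero offsets leave the features unchanged
shifted = axis_mix(x, ones, axis=1)   # every token reads its right-hand neighbor
```

Restricting the gather to one axis at a time keeps each branch cheap while still letting a token reach distant positions in its row or column.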

Paper Structure (25 sections, 7 equations, 10 figures, 5 tables)

This paper contains 25 sections, 7 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Illustration of our proposed UCIP. From top to bottom: (a) The overall framework of UCIP. The LR input is first enhanced through several consecutive PTMBs, then upsampled by the HR reconstruction module. (b) The architecture of PTMB. Each PTMB uses the dynamic prompt generated by a DPM and several cascaded PTMMs to iteratively refine distorted inputs. (c) The architecture of PTMM. PTMM takes the prompt $\mathbf{P}$ along with the image feature $\textbf{F}_{{\text{X}}_i}$ as input to adaptively generate offsets, which enable the network to perform content/spatial-aware, task-adaptive contextual information extraction.
  • Figure 2: The architecture of DPM. To dynamically aggregate content/spatial-aware, task-adaptive contextual information, we introduce a small number of basic dynamic kernels into the generation process of our prompt. Moreover, our design maintains adaptability to arbitrary input resolutions.
  • Figure 3: Visual comparisons between UCIP and other state-of-the-art methods. To demonstrate the effectiveness of UCIP across different codecs, we display four rows of images, compressed with JPEG ($\mathcal{Q}=10$), HM ($\mathcal{Q}=32$), $\text{C}_{\text{PSNR}}$ ($\mathcal{Q}=4$), and HIFIC ($\mathcal{Q}=\text{'med'}$), respectively. More results are shown in Sec. \ref{more_sub}.
  • Figure 4: Visual comparisons between UCIP and other state-of-the-art methods under different compression qualities within the HIFIC \cite{mentzer2020hifi} codec. The qualities of HIFIC from top to bottom are 'low', 'medium', and 'high', respectively. More results are shown in Sec. \ref{more_sub}.
  • Figure 5: Histograms of learned offsets for the center token from different PTMMs. LR images are randomly sampled from Urban100 \cite{urban100} and compressed by the $\text{C}_{\text{PSNR}}$ \cite{cheng2020learned} and VTM \cite{VTM} codecs.
  • ...and 5 more figures