UCIP: A Universal Framework for Compressed Image Super-Resolution using Dynamic Prompt
Xin Li, Bingchen Li, Yeying Jin, Cuiling Lan, Hanxin Zhu, Yulin Ren, Zhibo Chen
TL;DR
This work tackles universal compressed image super-resolution (CSR) across diverse codecs by introducing UCIP, a framework built on a compact MLP-like backbone augmented with dynamic prompts. The Dynamic Prompt Generation Module (DPM) creates $D$ basic prompts and spatial coefficients to yield a content/spatial/task-adaptive prompt $\mathbf{P}$, which guides a Dynamic Prompt-guided Token Mixer Block (PTMB) to perform deformable, axis-wise token mixing with local refinement. The PTMB leverages offsets conditioned on the dynamic prompt and input features, fusing horizontal, vertical, and local information, and further modulates features with a SPADE block for degradation-aware restoration. A novel all-in-one UCSR dataset with six codecs (three traditional, three learning-based) and multiple quality points enables robust evaluation of universal CSR methods. Extensive experiments show UCIP achieves state-of-the-art results across codecs with better efficiency, and prompt-tuning experiments on unseen codecs demonstrate the framework’s adaptability for future CSR tasks.
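To make the prompt path concrete, here is a minimal NumPy sketch of the DPM idea: $D$ basic prompts of spatial size 1×1 are mixed with spatially varying coefficients to form a per-pixel prompt $\mathbf{P}$, which then drives a SPADE-style scale-and-shift of the features. Several details are assumptions, not UCIP's published implementation. The coefficients come from a softmax over a 1×1 (linear) projection of the features. Normalization is per channel over the spatial dimensions of a single image. The SPADE scale and shift maps are plain linear projections of $\mathbf{P}$. The names `dynamic_prompt` and `spade_modulate` and all weight shapes are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_prompt(x, basic_prompts, w_coef):
    """Hypothetical DPM-style prompt generation (assumption, not UCIP's code).

    x:             (H, W, C) input features
    basic_prompts: (D, Cp) learnable basic prompts, each of spatial size 1x1
    w_coef:        (C, D) 1x1 projection predicting per-pixel coefficients
    returns P:     (H, W, Cp), a per-pixel convex mixture of the D basic prompts
    """
    coef = softmax(x @ w_coef, axis=-1)        # (H, W, D); sums to 1 at each pixel
    return coef @ basic_prompts                # (H, W, Cp)

def spade_modulate(x, P, w_gamma, w_beta, eps=1e-5):
    """SPADE-style modulation: normalize x per channel, then scale/shift it with
    spatially varying maps predicted from P (linear projections are an assumption)."""
    mu = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    x_norm = (x - mu) / np.sqrt(var + eps)     # parameter-free normalization
    gamma = P @ w_gamma                        # (H, W, C) per-pixel scale
    beta = P @ w_beta                          # (H, W, C) per-pixel shift
    return x_norm * (1.0 + gamma) + beta

# Usage with random weights at toy sizes
rng = np.random.default_rng(0)
H, W, C, D, Cp = 8, 8, 16, 4, 16
x = rng.standard_normal((H, W, C))
basic_prompts = rng.standard_normal((D, Cp))
P = dynamic_prompt(x, basic_prompts, rng.standard_normal((C, D)))
y = spade_modulate(x, P, 0.1 * rng.standard_normal((Cp, C)),
                   0.1 * rng.standard_normal((Cp, C)))
print(P.shape, y.shape)  # (8, 8, 16) (8, 8, 16)
```

Because the coefficients sum to 1 at every pixel, only $D$ small vectors need to be stored, while $\mathbf{P}$ can still vary with content and position. This matches the "few 1×1 prompts" framing, under the assumptions above.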
Abstract
Compressed image super-resolution (CSR) aims to super-resolve compressed images while simultaneously removing the challenging hybrid distortions introduced by compression. However, existing CSR works usually focus on a single compression codec, i.e., JPEG, ignoring the diverse traditional and learning-based codecs used in practical applications, e.g., HEVC, VVC, and HIFIC. In this work, we propose UCIP, the first universal CSR framework, which uses dynamic prompt learning to jointly handle the CSR distortions of any compression codec or mode. In particular, we propose an efficient dynamic prompt strategy that mines content/spatial-aware, task-adaptive contextual information for the universal CSR task using only a small number of prompts with a spatial size of 1×1. To simplify contextual information mining, we introduce a novel MLP-like backbone for UCIP by adapting the Active Token Mixer (ATM) to CSR tasks for the first time, where global information is modeled only along the horizontal and vertical directions with offset prediction. We also build an all-in-one benchmark dataset for CSR by collecting data compressed with six popular traditional and learning-based codecs, including JPEG, HEVC, VVC, and HIFIC, resulting in 23 common degradations. Extensive experiments demonstrate the consistently strong performance of UCIP on universal CSR tasks. The project page is available at https://lixinustc.github.io/UCIP.github.io
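The axis-wise mixing with offset prediction can be illustrated with a small NumPy sketch in the spirit of ATM. Each channel at each pixel gathers one token along its row and one along its column, at fractional offsets resolved by linear interpolation. A local branch is added, and the three branches are fused with per-pixel softmax weights. The following choices are assumptions for illustration only: per-channel offsets from a tanh-bounded linear layer over the concatenated features and prompt, an identity local branch (a real block would more likely use a convolution for local refinement), and a linear fusion head. The names `sample_along_axis`, `predict_offsets`, and `axis_token_mix` are hypothetical.

```python
import numpy as np

def sample_along_axis(x, offsets, axis):
    """Per-channel linear interpolation of x at (index + offset) along one spatial axis.
    x, offsets: (H, W, C); axis=0 is vertical (height), axis=1 is horizontal (width).
    Sampling positions are clamped to the valid range (border replication)."""
    n = x.shape[axis]
    idx = np.arange(n, dtype=float)
    base = idx[:, None, None] if axis == 0 else idx[None, :, None]
    pos = np.clip(base + offsets, 0, n - 1)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n - 1)
    frac = pos - lo
    x_lo = np.take_along_axis(x, lo, axis=axis)
    x_hi = np.take_along_axis(x, hi, axis=axis)
    return (1.0 - frac) * x_lo + frac * x_hi

def predict_offsets(x, P, w_off, max_shift=3.0):
    """Hypothetical offset predictor conditioned on both features and prompt."""
    feat = np.concatenate([x, P], axis=-1)       # (H, W, C + Cp)
    return max_shift * np.tanh(feat @ w_off)     # (H, W, C), bounded to +/- max_shift

def axis_token_mix(x, off_h, off_v, w_fuse):
    """Fuse horizontal, vertical, and local branches with per-pixel softmax weights."""
    x_h = sample_along_axis(x, off_h, axis=1)    # horizontal branch (along width)
    x_v = sample_along_axis(x, off_v, axis=0)    # vertical branch (along height)
    x_l = x                                      # local branch: identity in this sketch
    logits = x @ w_fuse                          # (H, W, 3)
    a = np.exp(logits - logits.max(axis=-1, keepdims=True))
    a = a / a.sum(axis=-1, keepdims=True)
    return a[..., 0:1] * x_h + a[..., 1:2] * x_v + a[..., 2:3] * x_l

# Usage with random weights at toy sizes
rng = np.random.default_rng(0)
H, W, C, Cp = 6, 7, 8, 4
x = rng.standard_normal((H, W, C))
P = rng.standard_normal((H, W, Cp))
off_h = predict_offsets(x, P, 0.1 * rng.standard_normal((C + Cp, C)))
off_v = predict_offsets(x, P, 0.1 * rng.standard_normal((C + Cp, C)))
y = axis_token_mix(x, off_h, off_v, rng.standard_normal((C, 3)))
print(y.shape)  # (6, 7, 8)
```

Each branch reads a constant number of tokens per channel and pixel, so its cost grows linearly with H·W·C. Full self-attention over all pixels, by contrast, scales quadratically in H·W. This is one plausible reading of why restricting global modeling to the two axes keeps the backbone light.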
