Versatile Recompression-Aware Perceptual Image Super-Resolution
Mingwei He, Tongda Xu, Xingtong Ge, Ming Sun, Chao Zhou, Yan Wang
TL;DR
VRPSR addresses the practical challenge of recompression in perceptual image super-resolution by modeling the rate-preserving compression process with a diffusion-based codec simulator, framed as conditional text-to-image generation. The SR model and the simulator are trained with perceptual targets, use slightly compressed images as supervision, and are optimized in a two-stage process without straight-through estimation, which enables generalization across codecs and rates. Empirically, VRPSR yields over 10% bitrate savings while improving perceptual metrics (LPIPS, DISTS, FID) when applied to Real-ESRGAN and S3Diff under H.264/H.265/H.266 on Kodak and ImageNet, and it supports optional joint post-processing after recompression. The framework uses codec-conditioned embeddings and text prompts to generalize to unseen codec configurations, which makes it suited to efficient, high-quality image delivery.
Abstract
Perceptual image super-resolution (SR) methods restore degraded images and produce sharp outputs. In practice, those outputs are usually recompressed for storage and transmission. Ignoring recompression is suboptimal, as the downstream codec may introduce additional artifacts into the restored images. However, jointly optimizing SR and recompression is challenging, as codecs are not differentiable and vary in configuration. In this paper, we present Versatile Recompression-Aware Perceptual Super-Resolution (VRPSR), which makes existing perceptual SR aware of versatile compression. First, we formulate compression as conditional text-to-image generation and utilize a pre-trained diffusion model to build a generalizable codec simulator. Next, we propose a set of training techniques tailored for perceptual SR, including optimizing the simulator using perceptual targets and adopting slightly compressed images as the training target. Empirically, VRPSR saves more than 10% bitrate when built on Real-ESRGAN and S3Diff under H.264/H.265/H.266 compression. In addition, VRPSR facilitates joint optimization of the SR model and a post-processing model applied after recompression.
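
To illustrate why a non-differentiable codec blocks joint optimization, the following toy sketch compares the finite-difference gradient of a quantizer with that of a smooth surrogate. It is not the VRPSR implementation: `toy_codec`, `toy_simulator`, and `finite_difference` are hypothetical names, uniform scalar quantization stands in for a real codec such as H.264, and an identity function stands in for the diffusion-based simulator.

```python
def toy_codec(x: float, step: float = 0.5) -> float:
    """Hypothetical stand-in for a real codec: uniform scalar quantization."""
    return round(x / step) * step


def toy_simulator(x: float) -> float:
    """Hypothetical smooth surrogate (identity), assumed only for illustration."""
    return x


def finite_difference(f, x: float, eps: float = 1e-4) -> float:
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


# Away from a rounding boundary the quantizer is piecewise constant,
# so no gradient signal reaches anything upstream of it.
codec_grad = finite_difference(toy_codec, 0.3)

# The smooth surrogate passes a usable gradient through.
simulator_grad = finite_difference(toy_simulator, 0.3)

print(codec_grad, simulator_grad)
```

In this toy setting the quantizer's gradient is exactly zero while the surrogate's is close to one, which is the gap a learned, differentiable codec simulator is meant to close.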
