
Versatile Recompression-Aware Perceptual Image Super-Resolution

Mingwei He, Tongda Xu, Xingtong Ge, Ming Sun, Chao Zhou, Yan Wang

TL;DR

VRPSR addresses the practical challenge of recompression in perceptual image super-resolution by modeling the rate-targeted compression process with a diffusion-based codec simulator, framed as conditional text-to-image generation. The SR model and the simulator are trained with perceptual targets and slightly compressed supervision, and are optimized via a two-stage process without straight-through estimation, which enables generalization across codecs and rates. Empirically, VRPSR yields over 10% bitrate savings while improving perceptual metrics (LPIPS, DISTS, FID) for Real-ESRGAN and S3Diff under H.264/H.265/H.266 on Kodak and ImageNet, and it supports optional joint post-processing after recompression. The framework uses codec-conditioned embeddings and text prompts to generalize to unseen codec configurations, which matters for efficient, high-quality image delivery.
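To make the idea of a text-conditioned codec simulator concrete, the sketch below shows one way a codec configuration could be rendered as a text condition for a text-to-image model. The `CodecConfig` class, the `codec_prompt` function, and the prompt wording are all hypothetical; the paper's actual prompt template and embedding scheme are not given in this summary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CodecConfig:
    """Hypothetical description of one recompression setting."""
    codec: str                   # e.g. "H.264", "H.265", "H.266"
    qp: Optional[int] = None     # quantization parameter, if the rate is set by QP
    bpp: Optional[float] = None  # target bits per pixel, if the rate is targeted directly


def codec_prompt(cfg: CodecConfig) -> str:
    """Render a codec configuration as a text condition.

    The wording is an illustrative assumption, not the paper's template.
    """
    parts = [f"an image compressed with {cfg.codec}"]
    if cfg.qp is not None:
        parts.append(f"at QP {cfg.qp}")
    if cfg.bpp is not None:
        parts.append(f"at {cfg.bpp:.3f} bits per pixel")
    return " ".join(parts)


print(codec_prompt(CodecConfig("H.266", bpp=0.1)))
```

Because any setting, including one not seen during training (for example `CodecConfig("H.265", qp=41)`), can be written as text, the conditioning interface stays the same; whether the simulator remains accurate for such settings is an empirical matter.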

Abstract

Perceptual image super-resolution (SR) methods restore degraded images and produce sharp outputs. In practice, those outputs are usually recompressed for storage and transmission. Ignoring recompression is suboptimal, as the downstream codec may introduce additional artifacts into the restored images. However, jointly optimizing SR and recompression is challenging, because codecs are not differentiable and vary in configuration. In this paper, we present Versatile Recompression-Aware Perceptual Super-Resolution (VRPSR), which makes existing perceptual SR methods aware of versatile compression. First, we formulate compression as conditional text-to-image generation and use a pre-trained diffusion model to build a generalizable codec simulator. Next, we propose a set of training techniques tailored for perceptual SR, including optimizing the simulator with perceptual targets and adopting slightly compressed images as the training target. Empirically, VRPSR saves more than 10% bitrate on top of Real-ESRGAN and S3Diff under H.264/H.265/H.266 compression. In addition, VRPSR enables joint optimization of the SR model and a post-processing model applied after recompression.
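Bitrate savings of this kind are commonly summarized as a Bjøntegaard delta rate (BD-rate): the average relative bitrate difference between two rate-quality curves over their shared quality range. This summary does not spell out the exact evaluation protocol, so the sketch below assumes the classic cubic-polynomial BD-rate applied to a "higher is better" quality score; for a distortion metric such as LPIPS or DISTS, one would pass the negated value.

```python
import numpy as np


def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test):
    """Classic Bjontegaard delta rate (%) of `test` relative to `anchor`.

    Quality must be "higher is better" (e.g. PSNR, or -LPIPS).
    A negative result means the test method needs fewer bits on average.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    qa = np.asarray(quality_anchor, dtype=float)
    qt = np.asarray(quality_test, dtype=float)

    # Fit log-rate as a cubic polynomial of quality for each curve.
    pa = np.polyfit(qa, log_ra, 3)
    pt = np.polyfit(qt, log_rt, 3)

    # Integrate both fits over the overlapping quality interval.
    lo = max(qa.min(), qt.min())
    hi = min(qa.max(), qt.max())
    ia_poly, it_poly = np.polyint(pa), np.polyint(pt)
    ia = np.polyval(ia_poly, hi) - np.polyval(ia_poly, lo)
    it = np.polyval(it_poly, hi) - np.polyval(it_poly, lo)

    # Average log-rate gap, converted back to a percentage.
    avg_log_diff = (it - ia) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0


rates = [0.1, 0.2, 0.4, 0.8]
neg_lpips = [-0.40, -0.30, -0.22, -0.16]  # illustrative numbers, not results from the paper
print(bd_rate(rates, neg_lpips, [0.9 * r for r in rates], neg_lpips))
```

With these made-up curves, the test method reaches every quality level at 90% of the anchor's rate, so the function returns approximately -10.0, i.e. a 10% bitrate saving.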

Paper Structure

This paper contains 46 sections, 11 equations, 10 figures, 10 tables.

Figures (10)

  • Figure 1: Qualitative results of VRPSR on S3Diff and Real-ESRGAN under H.264, H.265, and H.266 compression at different bitrates. After compression, the SR methods equipped with VRPSR produce significantly better visual results at the same bitrate.
  • Figure 2: The training and inference pipeline of VRPSR. For training, we adopt a rate-target codec simulator and use a slightly compressed image for supervision. For inference, the degraded image $\tilde{X}$ is first restored by a perceptual super-resolution method to obtain $X'$, which the encoder compresses into $\hat{X}$ at bitrate $\bar{R}$. The compressed image is then transmitted from the sender to the receiver, where $\hat{X}$ is either used directly as the final output or optionally passed through a post-processing module.
  • Figure 3: Rate-distortion performance on the Kodak and ImageNet validation datasets. VRPSR improves the performance of Real-ESRGAN and S3Diff under multiple downstream codecs, including H.264, H.265, and H.266.
  • Figure 4: Qualitative results of optional post-processing. See the definitions of (a)-(f) in Table \ref{tab:sandwich}.
  • Figure 5: Qualitative results of VRPSR on S3Diff under H.264 recompression. The final output with the post-processing module effectively removes codec artifacts and restores perceptual details.
  • ...and 5 more figures
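The inference path described in Figure 2 (restore, encode at a target rate, transmit, optionally post-process) can be sketched as follows. Every component is a hypothetical toy stand-in chosen so the example runs with NumPy alone: `toy_sr` replaces a perceptual SR model such as Real-ESRGAN or S3Diff with nearest-neighbour upsampling, `toy_codec` replaces H.264/H.265/H.266 with uniform scalar quantization whose empirical symbol entropy serves as a rough bits-per-pixel proxy, and `encode_at_rate` bisects the quantizer step to stay within a rate budget. None of these names come from the paper.

```python
import numpy as np


def toy_sr(x_lr: np.ndarray, scale: int = 2) -> np.ndarray:
    """Hypothetical stand-in for a perceptual SR model: nearest-neighbour upsampling."""
    return np.repeat(np.repeat(x_lr, scale, axis=0), scale, axis=1)


def toy_codec(x: np.ndarray, step: float) -> tuple[np.ndarray, float]:
    """Hypothetical stand-in for a real codec: uniform scalar quantization.

    Returns the reconstruction and the empirical entropy of the quantized
    symbols in bits per pixel, used here as a rough rate proxy.
    """
    symbols = np.round(x / step).astype(np.int64)
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    bpp = float(-(p * np.log2(p)).sum())
    return symbols * step, bpp


def encode_at_rate(x: np.ndarray, target_bpp: float, iters: int = 40):
    """Bisect the quantizer step so the rate proxy stays within target_bpp."""
    lo = 1e-6
    # With step > 2 * max|x| every symbol rounds to 0, so `hi` starts within budget.
    hi = 2.0 * float(np.abs(x).max()) + 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        _, bpp = toy_codec(x, mid)
        if bpp > target_bpp:
            lo = mid  # too many bits: move toward a coarser step
        else:
            hi = mid  # within budget: try a finer step
    return toy_codec(x, hi)


def pipeline(x_degraded: np.ndarray, target_bpp: float, post=None):
    """Restore, encode under a rate budget, then optionally post-process at the receiver."""
    x_restored = toy_sr(x_degraded)                      # X' in Figure 2
    x_hat, bpp = encode_at_rate(x_restored, target_bpp)  # X-hat with rate proxy <= target
    if post is not None:
        x_hat = post(x_hat)                              # optional post-processing module
    return x_hat, bpp


rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(16, 16)).astype(np.float64)
x_hat, bpp = pipeline(x, target_bpp=2.0, post=lambda z: np.clip(z, 0, 255))
print(x_hat.shape, bpp)
```

The structure highlights why training is hard: the codec sits between the SR model and the receiver and is not differentiable, which is why VRPSR optimizes against a learned codec simulator rather than the real encoder.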