Generative Preprocessing for Image Compression with Pre-trained Diffusion Models
Mengxi Guo, Shijie Zhao, Junlin Li, Li Zhang
TL;DR
This work reframes image preprocessing for compression as a rate-perception problem by leveraging a large pre-trained diffusion model. It distills Stable Diffusion 2.1 into a compact one-step generator via Consistent Score Identity Distillation (CiD), then fine-tunes only the attention modules of the distilled model, guided by a rate-perception loss and a differentiable BPG surrogate. The approach achieves substantial BD-rate reductions (up to 30.13% in DISTS on Kodak) and superior perceptual quality across standard codecs, while remaining compatible with existing pipelines. The results indicate that generative priors can improve perceptual preprocessing for compression and can inform future rate-perception optimization strategies.
Abstract
Preprocessing is a well-established technique for optimizing compression, yet existing methods are predominantly Rate-Distortion (R-D) optimized and constrained by pixel-level fidelity. This work pioneers a shift towards Rate-Perception (R-P) optimization and is the first to adapt a large-scale pre-trained diffusion model for compression preprocessing. We propose a two-stage framework. First, we distill the multi-step Stable Diffusion 2.1 into a compact, one-step image-to-image model using Consistent Score Identity Distillation (CiD). Second, we apply parameter-efficient fine-tuning to the distilled model's attention modules, guided by a Rate-Perception loss and a differentiable codec surrogate. Our method integrates with standard codecs without any modification to them and leverages the model's powerful generative priors to enhance texture and mitigate artifacts. Experiments show substantial R-P gains, achieving up to a 30.13% BD-rate reduction in DISTS on the Kodak dataset and delivering superior subjective visual quality.
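
To make the reported metric concrete, the following is a minimal sketch of how a BD-rate figure with respect to a perceptual metric such as DISTS could be computed from two rate-quality curves. It is not the evaluation code behind the numbers above. The standard Bjøntegaard procedure fits a cubic polynomial to each curve; this sketch swaps in piecewise-linear interpolation of log-rate to stay dependency-free. The function names (`bd_rate`, `_mean_log_rate`) and all data points are hypothetical and chosen only for illustration.

```python
import math


def _interp_log_rate(q, qualities, log_rates):
    """Linearly interpolate log-rate at quality q (qualities sorted ascending)."""
    for i in range(len(qualities) - 1):
        q0, q1 = qualities[i], qualities[i + 1]
        if q0 <= q <= q1:
            t = (q - q0) / (q1 - q0)
            return log_rates[i] + t * (log_rates[i + 1] - log_rates[i])
    raise ValueError("quality value lies outside the curve")


def _mean_log_rate(points, lo, hi, steps=1000):
    """Average log-rate over [lo, hi] on the quality axis (trapezoidal rule)."""
    points = sorted(points, key=lambda p: p[1])  # sort by quality score
    qualities = [q for _, q in points]           # assumed distinct
    log_rates = [math.log(r) for r, _ in points]
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        q = min(lo + i * h, hi)                  # guard against rounding past hi
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * _interp_log_rate(q, qualities, log_rates)
    return total * h / (hi - lo)


def bd_rate(anchor, test):
    """Average bitrate change (%) of `test` relative to `anchor` at equal quality.

    Each curve is a list of (bitrate, quality) pairs. Negative values mean the
    test curve needs fewer bits for the same quality score. The comparison is
    made at equal metric values, so it applies to DISTS (lower is better) too.
    """
    lo = max(min(q for _, q in anchor), min(q for _, q in test))
    hi = min(max(q for _, q in anchor), max(q for _, q in test))
    if hi <= lo:
        raise ValueError("curves do not overlap on the quality axis")
    diff = _mean_log_rate(test, lo, hi) - _mean_log_rate(anchor, lo, hi)
    return (math.exp(diff) - 1.0) * 100.0


# Hypothetical (bpp, DISTS) points, NOT taken from the paper.
codec_only = [(0.2, 0.20), (0.4, 0.15), (0.8, 0.10), (1.6, 0.05)]
# Same DISTS scores reached with 20% fewer bits at every point.
with_preprocessing = [(0.8 * r, d) for r, d in codec_only]

print(round(bd_rate(codec_only, with_preprocessing), 2))  # -20.0
```

In this toy setup the second curve reaches each DISTS score with 20% fewer bits, so the sketch reports a BD-rate of about -20%. A reported reduction of 30.13% would correspond to a value of about -30.13 under the same sign convention.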
