OSCAR: One-Step Diffusion Codec Across Multiple Bit-rates
Jinpei Guo, Yifei Ji, Zheng Chen, Kai Liu, Min Liu, Wang Rao, Wenbo Li, Yong Guo, Yulun Zhang
TL;DR
OSCAR is a one-step diffusion codec that unifies multi-rate image compression. It treats quantized latents as pseudo diffusion states, maps each bit-rate to a pseudo timestep, and uses a single shared denoiser for all rates. Representation alignment is used to make the residuals Gaussian, which enables stable one-pass reconstruction, and a two-stage training scheme couples the hyper-encoders with the diffusion backbone. Combining the rate-to-timestep mapping with joint fine-tuning under perceptual and adversarial objectives, OSCAR outperforms traditional codecs and several diffusion-based baselines while maintaining high efficiency. It achieves strong quantitative and qualitative results on standard benchmarks and generalizes well to unseen bit-rates, which highlights its practical potential for scalable diffusion-based image compression.
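The rate-to-timestep idea can be made concrete with a small sketch. The TL;DR does not give OSCAR's actual mapping, so the following assumes a DDPM-style linear beta schedule and models the compressed latent as z_hat = z + e with Gaussian residual e of known variance. Dividing a diffusion state x_t = sqrt(a_t) z + sqrt(1 - a_t) eps by sqrt(a_t) gives noise variance (1 - a_t) / a_t, so a residual variance v matches the timestep where a_t is closest to 1 / (1 + v). The function names and residual variances below are hypothetical.

```python
from typing import List


def linear_alpha_bars(num_steps: int = 1000,
                      beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """Cumulative products alpha_bar_t for a DDPM-style linear beta schedule (assumed)."""
    alpha_bars = []
    prod = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def pseudo_timestep(residual_var: float, alpha_bars: List[float]) -> int:
    """Return the timestep whose rescaled noise level best matches residual_var.

    Hypothetical mapping: with z_hat = z + e, e ~ N(0, residual_var), the
    rescaled diffusion noise variance (1 - a_t) / a_t equals residual_var
    when a_t = 1 / (1 + residual_var).
    """
    target = 1.0 / (1.0 + residual_var)
    return min(range(len(alpha_bars)), key=lambda t: abs(alpha_bars[t] - target))


alpha_bars = linear_alpha_bars()
# Hypothetical residual variances, ordered from lowest to highest bit-rate.
for var in (0.5, 0.1, 0.01):
    print(f"residual variance {var}: pseudo timestep {pseudo_timestep(var, alpha_bars)}")
```

Under these assumptions, noisier residuals (lower bit-rates) land on later timesteps, so one denoiser conditioned on the timestep can serve every rate instead of training a separate model per rate.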
Abstract
Pretrained latent diffusion models have shown strong potential for lossy image compression, owing to their powerful generative priors. Most existing diffusion-based methods reconstruct images by iteratively denoising from random noise, guided by compressed latent representations. Although these approaches achieve high reconstruction quality, their multi-step sampling process incurs substantial computational overhead. Moreover, they typically require training a separate model for each compression bit-rate, which leads to significant training and storage costs. To address these challenges, we propose a one-step diffusion codec across multiple bit-rates, termed OSCAR. Specifically, our method views compressed latents as noisy variants of the original latents, where the level of distortion depends on the bit-rate. This perspective allows them to be modeled as intermediate states along a diffusion trajectory. By establishing a mapping from the compression bit-rate to a pseudo diffusion timestep, we condition a single generative model to support reconstruction at multiple bit-rates. We further argue that the compressed latents retain rich structural information, which makes one-step denoising feasible. OSCAR therefore replaces iterative sampling with a single denoising pass, significantly improving inference efficiency. Extensive experiments demonstrate that OSCAR achieves superior performance in both quantitative metrics and visual quality. The code and models are available at https://github.com/jp-guo/OSCAR.
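Why a single denoising pass can suffice is easiest to see through the standard epsilon-prediction identity used by DDPM-style models: given a state x_t and a predicted noise eps_hat, the clean latent is estimated directly as (x_t - sqrt(1 - a_t) * eps_hat) / sqrt(a_t), with no loop over timesteps. The sketch below is not OSCAR's released code. It assumes an epsilon-parameterized denoiser, forms the pseudo state by scaling the compressed latent by sqrt(a_t), and uses a hypothetical oracle in place of a trained network. All names and values are illustrative.

```python
import math
from typing import Callable, List

Vector = List[float]


def one_step_reconstruct(z_hat: Vector, t: int, alpha_bar: float,
                         eps_model: Callable[[Vector, int], Vector]) -> Vector:
    """Estimate the clean latent from a compressed latent with one denoiser call."""
    # Treat the compressed latent as the pseudo diffusion state x_t (assumed scaling).
    x_t = [math.sqrt(alpha_bar) * v for v in z_hat]
    eps_hat = eps_model(x_t, t)  # the only network evaluation
    # Epsilon-parameterization: z0 = (x_t - sqrt(1 - a) * eps) / sqrt(a).
    return [(x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for x, e in zip(x_t, eps_hat)]


# Toy example with hypothetical values.
z = [0.3, -1.2, 0.8]          # "original" latent
z_hat = [0.35, -1.22, 0.84]   # "compressed" latent = z + residual
alpha_bar = 0.9               # alpha_bar at the pseudo timestep (assumed)


def oracle_eps(x_t: Vector, t: int) -> Vector:
    # Hypothetical stand-in for a trained denoiser: returns the exact noise implied by z.
    return [(x - math.sqrt(alpha_bar) * c) / math.sqrt(1.0 - alpha_bar)
            for x, c in zip(x_t, z)]


z_rec = one_step_reconstruct(z_hat, t=250, alpha_bar=alpha_bar, eps_model=oracle_eps)
print([round(v, 6) for v in z_rec])
```

With the oracle the estimate recovers z exactly; a trained denoiser only approximates the noise, so reconstruction quality depends on how much structure the compressed latent retains. Iterative samplers, by contrast, evaluate the network once per step along the trajectory.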
