Table of Contents
Fetching ...

OSCAR: One-Step Diffusion Codec Across Multiple Bit-rates

Jinpei Guo, Yifei Ji, Zheng Chen, Kai Liu, Min Liu, Wang Rao, Wenbo Li, Yong Guo, Yulun Zhang

TL;DR

OSCAR introduces a one-step diffusion codec that unifies multi-rate image compression by treating quantized latents as pseudo-diffusion states, mapping each bit-rate to a pseudo timestep and using a single shared denoiser. The approach relies on representation alignment to make residuals Gaussian, enabling a stable one-pass reconstruction, and employs a two-stage training to couple hyper-encoders with the diffusion backbone. Through a mapping from rate to timestep and joint fine-tuning with perceptual and adversarial objectives, OSCAR outperforms traditional codecs and several diffusion-based baselines while maintaining high efficiency. The method demonstrates strong quantitative and qualitative results on standard benchmarks and shows good generalization to unseen bit-rates, highlighting its practical potential for scalable, diffusive image compression.

Abstract

Pretrained latent diffusion models have shown strong potential for lossy image compression, owing to their powerful generative priors. Most existing diffusion-based methods reconstruct images by iteratively denoising from random noise, guided by compressed latent representations. While these approaches have achieved high reconstruction quality, their multi-step sampling process incurs substantial computational overhead. Moreover, they typically require training separate models for different compression bit-rates, leading to significant training and storage costs. To address these challenges, we propose a one-step diffusion codec across multiple bit-rates. termed OSCAR. Specifically, our method views compressed latents as noisy variants of the original latents, where the level of distortion depends on the bit-rate. This perspective allows them to be modeled as intermediate states along a diffusion trajectory. By establishing a mapping from the compression bit-rate to a pseudo diffusion timestep, we condition a single generative model to support reconstructions at multiple bit-rates. Meanwhile, we argue that the compressed latents retain rich structural information, thereby making one-step denoising feasible. Thus, OSCAR replaces iterative sampling with a single denoising pass, significantly improving inference efficiency. Extensive experiments demonstrate that OSCAR achieves superior performance in both quantitative and visual quality metrics. The code and models are available at https://github.com/jp-guo/OSCAR.

OSCAR: One-Step Diffusion Codec Across Multiple Bit-rates

TL;DR

OSCAR introduces a one-step diffusion codec that unifies multi-rate image compression by treating quantized latents as pseudo-diffusion states, mapping each bit-rate to a pseudo timestep and using a single shared denoiser. The approach relies on representation alignment to make residuals Gaussian, enabling a stable one-pass reconstruction, and employs a two-stage training to couple hyper-encoders with the diffusion backbone. Through a mapping from rate to timestep and joint fine-tuning with perceptual and adversarial objectives, OSCAR outperforms traditional codecs and several diffusion-based baselines while maintaining high efficiency. The method demonstrates strong quantitative and qualitative results on standard benchmarks and shows good generalization to unseen bit-rates, highlighting its practical potential for scalable, diffusive image compression.

Abstract

Pretrained latent diffusion models have shown strong potential for lossy image compression, owing to their powerful generative priors. Most existing diffusion-based methods reconstruct images by iteratively denoising from random noise, guided by compressed latent representations. While these approaches have achieved high reconstruction quality, their multi-step sampling process incurs substantial computational overhead. Moreover, they typically require training separate models for different compression bit-rates, leading to significant training and storage costs. To address these challenges, we propose a one-step diffusion codec across multiple bit-rates. termed OSCAR. Specifically, our method views compressed latents as noisy variants of the original latents, where the level of distortion depends on the bit-rate. This perspective allows them to be modeled as intermediate states along a diffusion trajectory. By establishing a mapping from the compression bit-rate to a pseudo diffusion timestep, we condition a single generative model to support reconstructions at multiple bit-rates. Meanwhile, we argue that the compressed latents retain rich structural information, thereby making one-step denoising feasible. Thus, OSCAR replaces iterative sampling with a single denoising pass, significantly improving inference efficiency. Extensive experiments demonstrate that OSCAR achieves superior performance in both quantitative and visual quality metrics. The code and models are available at https://github.com/jp-guo/OSCAR.

Paper Structure

This paper contains 14 sections, 15 equations, 9 figures.

Figures (9)

  • Figure 1: Qualitative comparison. OSCAR is capable of reconstructing complex textures with high realism. OSCAR effectively reconstructs complex textures with high realism.
  • Figure 2: Left: Our method bridges quantization and diffusion by mapping bit-rates to pseudo diffusion timesteps. Right: BD-rates measured by DISTS ding2020image, with circle radius indicating multiply-accumulate operations (MACs). OSCAR achieves a favorable quality-efficiency tradeoff.
  • Figure 3: Overview of OSCAR. Given an input image, the VAE encoder first maps it to the latent space. The latent is then processed by a bit-rate-specific hyper-encoder and quantized via the corresponding codebook. During decompression, the bit-rate is mapped to a pseudo diffusion timestep, and the quantized features are denoised by a U-Net conditioned on this timestep to recover the latent. Finally, the VAE decoder reconstructs the image from the restored latent.
  • Figure 4: Distribution and QQ-plot of residual noise $\boldsymbol{\epsilon}$ in Eq. \ref{['eq:decompose']} before and after representation alignment. Top: red line is the standard Gaussian distribution; blue histogram is the measured distribution. Bottom: red line is the theoretical quantile line; blue dots are the measured values.
  • Figure 5: Evaluation of OSCAR and other compression codecs on Kodak, DIV2K-Val, and CLIC2020.
  • ...and 4 more figures