One-Step Diffusion-Based Image Compression with Semantic Distillation

Naifu Xue, Zhaoyang Jia, Jiahao Li, Bin Li, Yuan Zhang, Yan Lu

TL;DR

OneDC introduces a one-step diffusion-based image codec that pairs a latent compression module with a one-step diffusion generator, guided by a hyperprior-derived semantic signal instead of a text prompt. It strengthens this guidance through semantic distillation from a pretrained generative tokenizer and uses a two-stage training regime that combines pixel-domain supervision with latent-domain diffusion distillation. The method achieves state-of-the-art perceptual quality at ultra-low bitrates, with bitrate reductions of over 39% and more than 20× faster decoding than prior multi-step diffusion codecs, demonstrating practical gains in both efficiency and fidelity. The work highlights the potential of one-step diffusion for image compression and offers a scalable path toward better semantic conditioning and faster decoding, although real-time deployment remains challenging.
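To make the two headline numbers concrete, here is a short worked reading of them. It assumes the bitrate figure is a rate saving at matched perceptual quality and the speed figure is a ratio of decoding times; this summary does not say how the saving is averaged (e.g. as a BD-rate) or which codec is the baseline, and the 0.05 bpp value is a hypothetical example, not a result from the paper.

```latex
\begin{aligned}
R_{\mathrm{OneDC}} &< (1 - 0.39)\,R_{\mathrm{base}} = 0.61\,R_{\mathrm{base}},
  &&\text{e.g. } R_{\mathrm{base}} = 0.05\ \mathrm{bpp} \Rightarrow R_{\mathrm{OneDC}} < 0.0305\ \mathrm{bpp},\\
T_{\mathrm{OneDC}} &< \tfrac{1}{20}\,T_{\mathrm{base}}.
\end{aligned}
```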

Abstract

While recent diffusion-based generative image codecs have shown impressive performance, their iterative sampling process introduces undesirable latency. In this work, we revisit the design of a diffusion-based codec and argue that multi-step sampling is not necessary for generative compression. Based on this insight, we propose OneDC, a One-step Diffusion-based generative image Codec that integrates a latent compression module with a one-step diffusion generator. Recognizing the critical role of semantic guidance in one-step diffusion, we propose using the hyperprior as a semantic signal, overcoming the limitations of text prompts in representing complex visual content. To further enhance the semantic capability of the hyperprior, we introduce a semantic distillation mechanism that transfers knowledge from a pretrained generative tokenizer to the hyperprior codec. Additionally, we adopt a hybrid pixel- and latent-domain optimization to jointly enhance both reconstruction fidelity and perceptual realism. Extensive experiments demonstrate that OneDC achieves SOTA perceptual quality even with one-step generation, offering over 39% bitrate reduction and 20× faster decoding compared to prior multi-step diffusion-based codecs. Project: https://onedc-codec.github.io/
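The latency argument above rests on how many network evaluations the decoder performs. The sketch below is a generic illustration of that point, not OneDC's decoder: `CountingDenoiser`, `multi_step_decode`, and `one_step_decode` are hypothetical names, the toy network ignores its timestep, and the `cond` array merely stands in for the hyperprior-derived semantic signal. In this picture, replacing an N-step sampler with a single forward pass cuts network calls by a factor of N; the real end-to-end speedup also depends on the codec's other decoding stages.

```python
import numpy as np


class CountingDenoiser:
    """Hypothetical stand-in for a diffusion network; it counts its forward passes."""

    def __init__(self):
        self.calls = 0

    def __call__(self, z, t, cond):
        # Toy "clean latent" prediction: blend the input with the conditioning signal.
        # A real network would also use the timestep t; this toy ignores it.
        self.calls += 1
        return 0.5 * z + 0.5 * cond


def multi_step_decode(net, z_init, cond, num_steps):
    """Iterative sampling: one network evaluation per timestep."""
    z = z_init
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x0_hat = net(z, t_cur, cond)
        # Deterministic step: shrink the residual (z - x0_hat) by t_next / t_cur.
        z = x0_hat + (t_next / t_cur) * (z - x0_hat)
    return z


def one_step_decode(net, z_init, cond):
    """One-step generation: a single network evaluation produces the output latent."""
    return net(z_init, 1.0, cond)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cond = np.zeros((4, 8, 8))  # placeholder for features decoded from the hyperprior
    z_init = rng.standard_normal(cond.shape)
    net_multi, net_one = CountingDenoiser(), CountingDenoiser()
    multi_step_decode(net_multi, z_init, cond, num_steps=50)
    one_step_decode(net_one, z_init, cond)
    print(net_multi.calls, net_one.calls)  # 50 1
```

The toy ignores how OneDC actually forms the generator's input from the decoded bitstream; it only shows why iterative sampling multiplies decoding cost.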

Paper Structure

This paper contains 17 sections, 10 equations, 20 figures, 10 tables.

Figures (20)

  • Figure 1: Top: multi-step sampling is not essential for image compression; intermediate results are from DiffEIC li2024towards. Bottom: Visual comparisons of existing open-source multi-step diffusion codecs (careil2023towards, li2024towards, vonderfecht2025lossy) and our proposed one-step codec. Our method achieves the highest visual quality at the lowest bitrate while offering significantly faster decoding.
  • Figure 2: Reconstructions from different semantic guidance. (a) Text prompts (from GPT-4o openai2024gpt4o) struggle to capture complex visual semantics, resulting in severe distortions when using a pretrained text-to-image one-step diffusion model yin2024improved. (b) We finetune the model yin2024improved for hyperprior-to-image generation. Hyperprior guidance yields more faithful reconstructions. (c) Our proposed semantic distillation further improves object-level accuracy, particularly in the highlighted regions.
  • Figure 3: Overview of the OneDC framework. Q denotes scalar quantization, and FSQ stands for finite scalar quantization. AE and AD refer to the arithmetic encoder and decoder, respectively. $h_{ctx}$ and $h_{sem}$ represent the context and semantic decoders used in the hyperprior branch.
  • Figure 4: Two-stage training pipeline of OneDC. The codebook used in semantic distillation is initialized from the pretrained tokenizer, and the discriminator used in diffusion distillation is abbreviated as Disc.
  • Figure 5: Visual examples on the CLIC2020 test set. Zoom in for better view.
  • ...and 15 more figures
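Figure 3 names two quantizers: scalar quantization Q and finite scalar quantization (FSQ). As a rough illustration, assuming the standard FSQ recipe (bound each channel, round it to a small fixed number of levels, and read the per-channel integers as one implicit codebook index), the sketch below shows both. The function names and the restriction to odd level counts are assumptions of this sketch. The summary does not state OneDC's level configuration or which branch uses which quantizer. Training would additionally need a straight-through gradient for the rounding, which plain NumPy does not model.

```python
import numpy as np


def fsq_quantize(z, levels):
    """Finite scalar quantization: bound each channel, then round to a small fixed grid.

    z: array of shape (..., d); levels: d odd integers (number of levels per channel).
    Returns integer codes in [-(L-1)/2, (L-1)/2] for each channel.
    """
    levels = np.asarray(levels)
    if np.any(levels % 2 == 0):
        raise ValueError("this sketch only handles odd level counts")
    half = (levels - 1) / 2
    # tanh squashes to (-1, 1); scaling by `half` keeps rounded values in the valid range.
    return np.round(np.tanh(z) * half).astype(np.int64)


def fsq_to_index(codes, levels):
    """Map per-channel codes to a single implicit-codebook index (mixed-radix number)."""
    levels = np.asarray(levels)
    digits = codes + (levels - 1) // 2  # shift each channel to [0, L-1]
    radix = np.concatenate(([1], np.cumprod(levels[:-1])))
    return (digits * radix).sum(axis=-1)


def scalar_quantize(y):
    """Plain scalar quantization (the 'Q' in Figure 3, assumed here to be rounding)."""
    return np.round(y)
```

With `levels = [5, 5, 5]` the implicit codebook has 5 × 5 × 5 = 125 entries, and `fsq_to_index` maps each code vector to an index in [0, 124]. FSQ needs no learned codebook lookup, which is one common reason to prefer it over vector quantization.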