
Generative Image Compression by Estimating Gradients of the Rate-variable Feature Distribution

Minghao Han, Weiyi You, Jinhua Zhang, Leheng Zhang, Ce Zhu, Shuhang Gu

TL;DR

This paper addresses a limitation of conventional learned image compression (LIC), namely its tendency to produce overly smooth reconstructions, by integrating diffusion modeling into a rate-variable generative compression framework. It reinterprets forward compression as a rate-dependent diffusion process governed by an entropy model. It then trains a reverse neural network to reconstruct images from compressed latents in a small number of steps, using an SDE-based stochastic sampler and a designed randomness schedule. The key contributions are:

(1) a novel forward-backward diffusion formulation for rate-variable compression;
(2) a rate-variable entropy model and a reverse U-Net trained with a single loss in latent space;
(3) extensive experiments on DIV2K, Kodak, and CLIC2020 showing superior perceptual metrics (LPIPS, FID, KID, CLIPIQA) and competitive rate-distortion performance.

The approach enables smooth rate adjustment and high-fidelity, photo-realistic reconstructions with practical decoding latency, advancing diffusion-based generative compression toward real-world deployment.
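The TL;DR frames compression as a rate-dependent forward process governed by an entropy model. The sketch below is a generic learned-compression illustration built on our own assumptions. It is not the authors' implementation.

- The rate is controlled by a hypothetical scalar quantization step `q` applied to a toy latent. `q` is not claimed to be the paper's scale parameter $q_0$.
- The entropy model is a discretized Gaussian with fixed mean 0 and scale 1. A learned entropy model such as the paper's $E_\phi$ would estimate this distribution instead.
- Bits are estimated as $-\sum \log_2 p(\hat{y})$ and then divided by the image area to give bpp.

```python
import math
from typing import List


def gaussian_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def quantize(y: List[float], q: float) -> List[float]:
    """Uniform scalar quantization with step q; a larger q gives a coarser latent."""
    return [q * round(v / q) for v in y]


def estimate_bits(y_hat: List[float], q: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Bits for y_hat under a discretized Gaussian N(mu, sigma^2) with bin width q.

    Each symbol's probability is the Gaussian mass of its quantization bin.
    mu and sigma are fixed here (assumption); a learned entropy model would estimate them.
    """
    bits = 0.0
    for v in y_hat:
        upper = gaussian_cdf((v + 0.5 * q - mu) / sigma)
        lower = gaussian_cdf((v - 0.5 * q - mu) / sigma)
        p = max(upper - lower, 1e-12)  # guard against log2(0) in far tails
        bits -= math.log2(p)
    return bits


def bpp(y: List[float], q: float, height: int, width: int) -> float:
    """Estimated bits per pixel for an image of size height x width."""
    return estimate_bits(quantize(y, q), q) / (height * width)


if __name__ == "__main__":
    latent = [0.1, -0.7, 1.3, 2.2, -1.8, 0.4, 0.0, -0.3]  # toy latent, not real data
    for q in (0.25, 0.5, 1.0, 2.0):
        print(f"q={q}: {bpp(latent, q, height=2, width=4):.3f} bpp")
```

With this toy latent, the estimated bpp falls monotonically as `q` grows from 0.25 to 2.0. This matches the intuition of Figure 1 that lower rates retain fewer details.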

Abstract

While learned image compression (LIC) focuses on efficient data transmission, generative image compression (GIC) extends this framework by integrating generative modeling to produce photo-realistic reconstructed images. In this paper, we propose a novel diffusion-based generative modeling framework tailored for generative image compression. Unlike prior diffusion-based approaches that exploit diffusion modeling only indirectly, we reinterpret the compression process itself as a forward diffusion path governed by stochastic differential equations (SDEs). A reverse neural network is trained to reconstruct images by reversing the compression process directly, without requiring Gaussian noise initialization. This approach achieves smooth rate adjustment and photo-realistic reconstructions with only a small number of sampling steps. Extensive experiments on benchmark datasets demonstrate that our method outperforms existing generative image compression approaches across a range of metrics, including perceptual distortion, statistical fidelity, and no-reference quality assessments.
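The abstract states that reconstruction reverses the compression process directly. It starts from the compressed representation rather than from Gaussian noise and uses only a few sampling steps. A minimal Euler-Maruyama sketch of such a few-step reverse-time sampler follows. It is not the paper's sampler or its randomness schedule.

- It assumes a hypothetical `drift(x, t)` callable standing in for the trained reverse network.
- It uses a constant noise scale `beta`.
- The toy drift (`make_toy_drift`, also hypothetical) simply moves the state toward a known target over the remaining time. With `beta = 0`, the sampler therefore lands on the target up to floating-point error.

```python
import math
import random
from typing import Callable, List, Optional

Vector = List[float]
Drift = Callable[[Vector, float], Vector]


def reverse_sample(
    y_compressed: Vector,
    drift: Drift,
    t_start: float,
    num_steps: int,
    beta: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Vector:
    """Euler-Maruyama integration in reverse time, from t_start down to 0.

    Sampling starts from the compressed latent itself, not from Gaussian noise.
    `drift(x, t)` is a hypothetical stand-in for a trained reverse network and
    returns the update direction in reverse time. `beta` scales the injected
    Gaussian noise; beta = 0 reduces to a deterministic Euler (ODE-style) sampler.
    """
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    rng = rng if rng is not None else random.Random(0)
    dt = t_start / num_steps
    x = list(y_compressed)
    for k in range(num_steps):
        t = t_start - k * dt  # current time, decreasing toward 0
        d = drift(x, t)
        x = [
            xi + di * dt + beta * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            for xi, di in zip(x, d)
        ]
    return x


def make_toy_drift(target: Vector) -> Drift:
    """Toy drift that closes the gap to a known target over the remaining time t."""
    def drift(x: Vector, t: float) -> Vector:
        return [(ti - xi) / t for ti, xi in zip(target, x)]
    return drift


if __name__ == "__main__":
    target = [0.2, -0.5, 1.0]      # hypothetical "clean" latent
    y_c = [0.0, 0.0, 1.5]          # hypothetical compressed latent
    x0 = reverse_sample(y_c, make_toy_drift(target), t_start=1.0, num_steps=4)
    print(x0)  # numerically equal to target because beta == 0
```

Because `t_start` can be any intermediate time, the same loop could start from latents compressed at different rates. This loosely mirrors the rate-variable reversal described in Figure 1.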

Paper Structure

This paper contains 33 sections, 20 equations, 9 figures, 1 table, 1 algorithm.

Figures (9)

  • Figure 1: Left: the forward and reverse processes of legacy diffusion, for comparison. Legacy diffusion transforms data into a simple noise distribution and uses a reverse ODE to restore the original data. Right: the overview pipeline of our method. The forward process is defined as the entropy model compressing the data. As the bit rate decreases, the compressed images retain fewer details (please zoom in for better visualization). We can reverse this ODE from any intermediate time to recover the data at various compression rates. This makes full use of the benefits of diffusion modeling and integrates LIC and diffusion organically.
  • Figure 2: Comparison of methods across various distortion and statistical fidelity metrics on the DIV2K dataset. The continuous lines represent rate-variable methods (one model for all bit rates). Circular markers denote GAN-based methods and triangular markers denote diffusion-based methods (each marker corresponds to a separate model).
  • Figure 3: Evaluation of randomness injection schedules with the scale parameter $q_0$ set to $0.7$ (i.e., bpp $= 0.3024$), tested on DIV2K with LPIPS, FID, and CLIPIQA. The dashed red lines correspond to deterministic sampling, equivalent to setting $\beta = 0$. The blue, orange, and green curves correspond to drawing noise from a standard normal distribution $\mathcal{N}$, a uniform distribution $\mathcal{U}$, and a probability distribution $p_{E_\phi}$ estimated by the entropy model $E_\phi$, respectively. Note that the latter two distributions are normalized by dividing by their standard deviations $\sigma$. The dots indicate the best observed results.
  • Figure 4: Visualization of the reconstructed images (top to bottom: 0824, 0812, 0807, 0841, and 0846) from the DIV2K dataset. The titles under the sub-figures take the form "method [bpp]".
  • Figure 5: Visualization of the reconstructed image (0854) from the DIV2K dataset. The titles under the sub-figures take the form "method [bpp]". Please zoom in for better visualization.
  • ...and 4 more figures
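Figure 3 compares three sources for the injected noise: $\mathcal{N}$, $\mathcal{U}$, and the entropy model's $p_{E_\phi}$. The latter two are rescaled by their standard deviation, and $\beta = 0$ corresponds to deterministic sampling. The sketch below illustrates only the first two options, under these assumptions:

- $\mathcal{U}$ is taken to be $\mathcal{U}(-0.5, 0.5)$, the usual additive quantization-noise proxy in learned compression. The caption does not specify the support. This distribution has standard deviation $1/\sqrt{12}$.
- Noise is added as `x + beta * noise`.
- Sampling from $p_{E_\phi}$ is omitted because it requires the trained entropy model.

The helper names `draw_noise` and `inject` are hypothetical.

```python
import math
import random
from typing import List

# Standard deviation of U(-0.5, 0.5) (assumed support for the uniform option).
UNIFORM_STD = 1.0 / math.sqrt(12.0)


def draw_noise(kind: str, n: int, rng: random.Random) -> List[float]:
    """Draw n unit-variance noise samples of the requested kind.

    'normal'  : standard normal N(0, 1)
    'uniform' : U(-0.5, 0.5) divided by its standard deviation 1/sqrt(12)
    """
    if kind == "normal":
        return [rng.gauss(0.0, 1.0) for _ in range(n)]
    if kind == "uniform":
        return [rng.uniform(-0.5, 0.5) / UNIFORM_STD for _ in range(n)]
    raise ValueError(f"unsupported noise kind: {kind!r}")


def inject(x: List[float], beta: float, kind: str, rng: random.Random) -> List[float]:
    """Add beta-scaled, normalized noise; beta = 0 returns x unchanged (deterministic)."""
    noise = draw_noise(kind, len(x), rng)
    return [xi + beta * ni for xi, ni in zip(x, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(inject([0.0, 1.0, 2.0], beta=0.3, kind="uniform", rng=rng))
```

Dividing by $\sigma$ puts both noise types on the same scale. A given $\beta$ therefore injects the same variance whichever distribution is chosen, so the curves in Figure 3 differ only in the shape of the noise.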