Generative Image Compression by Estimating Gradients of the Rate-variable Feature Distribution
Minghao Han, Weiyi You, Jinhua Zhang, Leheng Zhang, Ce Zhu, Shuhang Gu
TL;DR
This paper addresses the tendency of conventional learned image compression (LIC) to produce overly smooth reconstructions by integrating diffusion modeling into a rate-variable generative compression framework. It reinterprets forward compression as a rate-dependent diffusion process governed by an entropy model, and trains a reverse neural network to reconstruct images from compressed latents in a small number of steps, using an SDE-based stochastic sampler with a designed randomness schedule. The key contributions are: (1) a novel forward-backward diffusion formulation for rate-variable compression, (2) a rate-variable entropy model and a reverse U-Net trained with a single loss in latent space, and (3) extensive experiments on DIV2K, Kodak, and CLIC2020 showing superior perceptual metrics (LPIPS, FID, KID, CLIPIQA) and competitive rate-distortion performance. The approach enables smooth rate adjustment and high-fidelity, photo-realistic reconstructions at practical decoding latency, which makes diffusion-based generative compression more suitable for real-world deployment.
Abstract
While learned image compression (LIC) focuses on efficient data transmission, generative image compression (GIC) extends this framework by integrating generative modeling to produce photo-realistic reconstructed images. In this paper, we propose a novel diffusion-based generative modeling framework tailored for generative image compression. Unlike prior diffusion-based approaches that indirectly exploit diffusion modeling, we reinterpret the compression process itself as a forward diffusion path governed by stochastic differential equations (SDEs). A reverse neural network is trained to reconstruct images by reversing the compression process directly, without requiring Gaussian noise initialization. This approach achieves smooth rate adjustment and photo-realistic reconstructions with only a minimal number of sampling steps. Extensive experiments on benchmark datasets demonstrate that our method outperforms existing generative image compression approaches across a range of metrics, including perceptual distortion, statistical fidelity, and no-reference quality assessments.
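To make the idea of reversing the compression process concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes the forward path can be imitated by uniform quantization whose step size grows as the rate drops, and it uses a hypothetical `denoiser` callable in place of the trained reverse network. The function names, the linear time grid, and the `noise_scale` parameter are all illustrative assumptions; the paper's actual SDE, entropy model, and randomness schedule are not specified in this summary.

```python
import numpy as np


def forward_compress(y, step):
    """Toy forward process (assumption): a coarser quantization step
    stands in for a lower rate and therefore a more degraded latent."""
    return np.round(y / step) * step


def toy_denoiser(x, t):
    """Hypothetical placeholder for the trained reverse network.
    It should return an estimate of the clean latent; here it is the identity."""
    return x


def reverse_sample(y_t, t_start, num_steps, denoiser, noise_scale, rng):
    """Few-step stochastic reverse pass that starts from the compressed
    latent y_t instead of from Gaussian noise.

    The time grid runs linearly from t_start down to 0. At each step the
    sample is pulled toward the denoiser's clean estimate, and a small
    amount of noise, which vanishes at t = 0, is injected (illustrative
    randomness schedule, not the one from the paper).
    """
    ts = np.linspace(t_start, 0.0, num_steps + 1)
    x = np.array(y_t, dtype=float)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x0_hat = denoiser(x, t_cur)           # current clean-latent estimate
        ratio = t_next / t_cur                # fraction of degradation kept
        x = x0_hat + ratio * (x - x0_hat)     # deterministic move toward x0_hat
        x = x + noise_scale * t_next * rng.standard_normal(x.shape)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.array([0.26, -1.1, 2.4])           # pretend clean latent
    y_t = forward_compress(y, step=0.5)       # low-rate, degraded latent
    x = reverse_sample(y_t, 1.0, 4, toy_denoiser, 0.1, rng)
    print(y_t, x)
```

In this sketch the number of reverse steps is a free parameter, which mirrors the claim that only a few sampling steps are needed, and the starting time `t_start` plays the role of the rate level. A real decoder would replace `toy_denoiser` with the learned network and would decode `y_t` from the bitstream.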
