Progressive Compression with Universally Quantized Diffusion Models
Yibo Yang, Justus C. Will, Stephan Mandt
TL;DR
The paper tackles progressive neural data compression with diffusion models by introducing Universally Quantized Diffusion Models (UQDM), which replace the Gaussian noise in the forward process with uniform noise so that latents can be coded efficiently with universal quantization. The model is trained by minimizing the negative ELBO $\mathcal{L}(\mathbf{x})$, which corresponds to the end-to-end compression cost, and uses a derived backward density with learned variance. The resulting codec produces progressively better reconstructions from partial bitstreams while preserving realism. On Swirl, CIFAR-10, and ImageNet 64×64, a single UQDM achieves competitive rate-distortion and rate-realism results, outperforming traditional codecs and rivaling recent neural progressive codecs. The approach avoids the computational bottleneck of relative entropy coding (REC) for Gaussian channels and brings neural progressive coding closer to practical deployment. Future work targets computational efficiency and further gains in realism and rate.
Abstract
Diffusion probabilistic models have achieved mainstream success in many generative modeling tasks, from image generation to inverse problem solving. A distinct feature of these models is that they correspond to deep hierarchical latent variable models optimizing a variational evidence lower bound (ELBO) on the data likelihood. Drawing on a basic connection between likelihood modeling and compression, we explore the potential of diffusion models for progressive coding, resulting in a sequence of bits that can be incrementally transmitted and decoded with progressively improving reconstruction quality. Unlike prior work based on Gaussian diffusion or conditional diffusion models, we propose a new form of diffusion model with uniform noise in the forward process, whose negative ELBO corresponds to the end-to-end compression cost using universal quantization. We obtain promising first results on image compression, achieving competitive rate-distortion and rate-realism results on a wide range of bit-rates with a single model, bringing neural codecs a step closer to practical deployment.
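To make the link between uniform noise and universal quantization concrete, here is a minimal scalar sketch in NumPy. It is not the paper's codec: the function names `encode` and `decode`, the step size `delta`, and the shared `seed` are assumptions made for illustration. The encoder subtracts a shared uniform dither before rounding, and the decoder adds the same dither back, so the reconstruction error is uniform on $[-\Delta/2, \Delta/2]$ and behaves like additive uniform noise that does not depend on the input.

```python
import numpy as np

def encode(x, delta, seed):
    """Hypothetical encoder: dithered rounding to an integer grid."""
    # Dither shared with the decoder through a common seed (an assumption here).
    u = np.random.default_rng(seed).uniform(-0.5, 0.5, size=x.shape)
    # These integers are what an entropy coder would transmit.
    return np.round(x / delta - u).astype(np.int64)

def decode(k, delta, seed):
    """Hypothetical decoder: regenerate the same dither and add it back."""
    u = np.random.default_rng(seed).uniform(-0.5, 0.5, size=k.shape)
    return (k + u) * delta

# Illustrative data and step size, not values from the paper.
x = np.random.default_rng(0).normal(size=100_000)
delta = 0.5
seed = 1234

k = encode(x, delta, seed)
y = decode(k, delta, seed)

# Reconstruction error: bounded by delta / 2, with mean near 0 and
# variance near delta**2 / 12, as expected for uniform noise.
noise = y - x
print(np.abs(noise).max(), noise.mean(), noise.var(), delta**2 / 12)
```

Because the decoder only needs the integers `k` and the shared seed, simulating the uniform noise channel costs no more than ordinary quantization followed by entropy coding, which is the property the uniform forward process is designed to exploit.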
