Progressive Compression with Universally Quantized Diffusion Models

Yibo Yang, Justus C. Will, Stephan Mandt

TL;DR

The paper tackles progressive neural data compression with diffusion models by introducing Universally Quantized Diffusion Models (UQDM), which replace the Gaussian noise of the forward process with uniform noise so that the latents can be coded efficiently with universal quantization. By optimizing the end-to-end negative ELBO $\mathcal{L}(\mathbf{x})$ and deriving a backward-density formulation with learned variance, the method yields progressive reconstructions from partial bitstreams while preserving realism. On Swirl, CIFAR-10, and ImageNet-64×64, UQDM achieves competitive rate-distortion and rate-realism performance with a single model, outperforming traditional codecs and rivaling recent neural progressive codecs. The approach reduces the computational bottleneck of relative entropy coding (REC) over Gaussian channels and brings neural progressive coding closer to practical deployment; future work targets efficiency and further gains in realism and rate performance.

Abstract

Diffusion probabilistic models have achieved mainstream success in many generative modeling tasks, from image generation to inverse problem solving. A distinct feature of these models is that they correspond to deep hierarchical latent variable models optimizing a variational evidence lower bound (ELBO) on the data likelihood. Drawing on a basic connection between likelihood modeling and compression, we explore the potential of diffusion models for progressive coding, resulting in a sequence of bits that can be incrementally transmitted and decoded with progressively improving reconstruction quality. Unlike prior work based on Gaussian diffusion or conditional diffusion models, we propose a new form of diffusion model with uniform noise in the forward process, whose negative ELBO corresponds to the end-to-end compression cost using universal quantization. We obtain promising first results on image compression, achieving competitive rate-distortion and rate-realism results on a wide range of bit-rates with a single model, bringing neural codecs a step closer to practical deployment.
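
The link between uniform noise and universal quantization mentioned above can be illustrated with a minimal scalar sketch. This is not the paper's codec: the function names `uq_encode` and `uq_decode`, the step size `delta`, and the Gaussian test data are assumptions made for illustration. Sender and receiver share a dither `u` (for example via a common random seed); the sender transmits only integer indices, yet the receiver's reconstruction is distributed as the input plus uniform noise.

```python
import numpy as np

def uq_encode(x, u, delta=1.0):
    """Sender side (hypothetical helper): add the shared dither, then round.

    Returns integer indices, which are what an entropy coder would transmit.
    """
    return np.round((x + u) / delta).astype(np.int64)

def uq_decode(k, u, delta=1.0):
    """Receiver side (hypothetical helper): undo the scaling, subtract the dither."""
    return k * delta - u

rng = np.random.default_rng(0)
delta = 0.5                                  # quantization step (assumed value)
x = rng.normal(size=100_000)                 # toy data, not from the paper
u = rng.uniform(-delta / 2, delta / 2, size=x.shape)  # dither shared by both sides

k = uq_encode(x, u, delta)
y = uq_decode(k, u, delta)
noise = y - x                                # behaves like U(-delta/2, delta/2)

print("max |noise|     :", np.abs(noise).max())
print("noise variance  :", noise.var(), "vs delta^2/12 =", delta**2 / 12)
print("corr(x, noise)  :", np.corrcoef(x, noise)[0, 1])
```

In this sketch the reconstruction error stays within half a step, its variance is close to `delta**2 / 12`, and it is nearly uncorrelated with the input, which is the behaviour of an additive uniform noise channel.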

Paper Structure

This paper contains 28 sections, 2 theorems, 47 equations, 9 figures, 2 algorithms.

Key Result

Theorem A.1

For every fixed $\rho := \frac{t}{T} \in [0, 1)$, $q({\mathbf{z}}_t | {\mathbf{z}}_T, {\mathbf{x}}) \xrightarrow{\;d\;} \mathcal{N}(b_{T|t} \, {\mathbf{z}}_T + c_{T|t} \, {\mathbf{x}}, \beta^2_{T|t} \, \mathbf{I})$ as $T \to \infty$.

Figures (9)

  • Figure 1: Example reconstructions from several traditional and neural codecs, chosen at roughly similar bitrates. At high bitrates, our UQDM method preserves details (e.g. shape and color pattern of the spider, or sharpness of the calligraphy) better than other neural codecs. Note that among the methods considered here, only ours and CTC [jeon2023context] implement progressive coding.
  • Figure 2: Results on swirl data. The VDM curves correspond to the hypothetical performance of REC, which remains computationally intractable. Left: Lossless compression rates vs. the choice of $T$, for UQDM with/without learned reverse-process variance (blue/orange) and VDM (green). For UQDM, learning the reverse-process variance significantly improves the NELBO, with an optimum at $T\approx 5$. Middle, Right: Progressive lossy compression performance for VDM and UQDM, measured in fidelity (PSNR) vs. bit-rate (middle), or realism (sliced Wasserstein distance) vs. bit-rate (right).
  • Figure 3: Progressive lossy compression performance of UQDM on the CIFAR-10 dataset, plotting fidelity (PSNR) and realism (FID) against bit-rate in bits per pixel (bpp), using either ancestral sampling or denoised prediction to obtain progressive reconstructions as indicated. The VDM curve corresponds to the hypothetical performance of REC, which is computationally intractable. We achieve better fidelity and realism than JPEG and JPEG2000 at all bit-rates, and better than BPG in the high bit-rate regime.
  • Figure 4: Progressive lossy compression performance of UQDM on the ImageNet 64×64 dataset, plotting fidelity (PSNR) and realism (FID) against bit-rate in bits per pixel (bpp), using either ancestral sampling or denoised prediction to obtain progressive reconstructions as indicated. The VDM curve corresponds to the hypothetical performance of REC, which remains computationally intractable. While the reconstruction quality of other codecs such as CDC or BPG plateaus at higher bit-rates, our method continues to improve gradually in fidelity and realism, and at the highest bit-rates it outperforms every baseline. We beat the compression performance of JPEG, JPEG2000, and CTC at all bit-rates. Note that only UQDM, CTC, and JPEG2000 implement progressive coding.
  • Figure 5: Example progressive reconstructions from UQDM trained with $T=4$, obtained with denoised prediction (left) or ancestral sampling (right). The latter avoids blurriness but introduces graininess at low bit-rates, likely because the UQDM is unable to completely capture the data distribution and achieve perfect realism (perfect realism is also difficult to achieve for Gaussian diffusion, as seen in the rate-realism plot of [theis2022lossy]). Flow-based reconstructions are qualitatively similar to the denoising-based reconstructions and can be found in \ref{fig:imagenet-more}.
  • ...and 4 more figures
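
The figures above report fidelity as PSNR against bit-rate in bits per pixel. As a reminder of how these two quantities are conventionally computed, here is a minimal sketch. The helper names `psnr` and `bits_per_pixel`, the 8-bit peak value of 255, and the convention of dividing by height × width (not by the number of color channels) are assumptions for illustration, not details taken from the paper's evaluation code.

```python
import numpy as np

def psnr(original, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    diff = original.astype(np.float64) - reconstruction.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

def bits_per_pixel(num_bits, height, width):
    """Bit-rate normalized by the number of pixels (assumed convention)."""
    return num_bits / (height * width)

# Toy 64x64 RGB image whose reconstruction is off by exactly one gray level,
# so the MSE is exactly 1 and the PSNR equals 20 * log10(255).
original = np.zeros((64, 64, 3), dtype=np.uint8)
reconstruction = np.ones((64, 64, 3), dtype=np.uint8)

quality_db = psnr(original, reconstruction)
rate_bpp = bits_per_pixel(num_bits=8192, height=64, width=64)
print(quality_db, rate_bpp)
```

For a progressive codec, each prefix of the bitstream gives one (bpp, PSNR) point, so a single encoded image traces out a whole rate-distortion curve.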

Theorems & Definitions (4)

  • Theorem A.1
  • proof
  • Corollary A.1.1
  • proof