Learned Compression of Encoding Distributions

Mateen Ulhaq, Ivan V. Bajić

TL;DR

This work tackles the amortization gap in learned image compression by introducing input-specific encoding distributions that are compressed and transmitted as side information. The method uses kernel-density-estimation-based histogram estimation to derive target per-channel distributions, and lightweight neural modules to compress them and reconstruct adaptive encoding distributions at the decoder, keeping the side-information overhead low while improving rate-distortion performance. On Kodak, the method achieves a BD-rate of $-7.10\%$ (i.e., a 7.10% bitrate saving at equal quality) for the standard fully-factorized model, with substantial reductions in both model size and computation compared to scale hyperprior approaches. The approach offers a practical, low-overhead way to improve entropy models without extensive architectural changes.

Abstract

The entropy bottleneck introduced by Ballé et al. is a common component used in many learned compression models. It encodes a transformed latent representation using a static distribution whose parameters are learned during training. However, the actual distribution of the latent data may vary wildly across different inputs. The static distribution attempts to encompass all possible input distributions, thus fitting none of them particularly well. This unfortunate phenomenon, sometimes known as the amortization gap, results in suboptimal compression. To address this issue, we propose a method that dynamically adapts the encoding distribution to match the latent data distribution for a specific input. First, our model estimates a better encoding distribution for a given input. This distribution is then compressed and transmitted as an additional side-information bitstream. Finally, the decoder reconstructs the encoding distribution and uses it to decompress the corresponding latent data. Our method achieves a Bjøntegaard-Delta (BD)-rate gain of -7.10% on the Kodak test dataset when applied to the standard fully-factorized architecture. Furthermore, considering computational complexity, the transform used by our method is an order of magnitude cheaper in terms of Multiply-Accumulate (MAC) operations compared to related side-information methods such as the scale hyperprior.
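
To make the amortization gap concrete, the following is a minimal numerical sketch and not the authors' code. It assumes two hypothetical inputs whose latent symbols in a single channel follow the toy distributions `p_a` and `p_b` (made-up values), and compares the expected rate when both are coded with one shared static distribution against the rate when each is coded with its own distribution. The shared distribution is taken to be the mean of the two, which is the static choice that minimizes the average cross-entropy.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy of p in bits per symbol."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def cross_entropy_bits(p, q):
    """Expected bits per symbol when symbols drawn from p are coded using q."""
    mask = p > 0
    return float(-(p[mask] * np.log2(q[mask])).sum())

# Toy per-input symbol distributions (illustrative values, not from the paper).
p_a = np.array([0.70, 0.20, 0.05, 0.05])
p_b = np.array([0.05, 0.05, 0.20, 0.70])

# The best single static distribution, on average, is the mean of the inputs'.
q_static = 0.5 * (p_a + p_b)

# Average rate with the shared static distribution.
static_rate = 0.5 * (cross_entropy_bits(p_a, q_static)
                     + cross_entropy_bits(p_b, q_static))

# Average rate if each input could use its own distribution.
adaptive_rate = 0.5 * (entropy_bits(p_a) + entropy_bits(p_b))

# The difference is the average KL divergence from each p to q_static.
gap = static_rate - adaptive_rate
print(f"static: {static_rate:.4f}  adaptive: {adaptive_rate:.4f}  gap: {gap:.4f}")
```

In this toy setting the gap is the most that adaptation can save per symbol. An adaptive scheme yields a net gain only if the side information needed to describe the per-input distribution costs less than that saving.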

Paper Structure

This paper contains 14 sections, 18 equations, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Visualization of the suboptimality of using a single static encoding distribution. This distribution is optimal, on average, among all static distributions, but it is suboptimal for any specific data instance.
  • Figure 2: Visualization of target (${\boldsymbol{p}}$) and reconstructed (${\boldsymbol{\hat{p}}}$) encoding distributions. Our proposed method reconstructs ${\boldsymbol{\hat{p}}}$, which is then used by the fully-factorized entropy model to encode the latent derived from a given input image. Each collection of distributions is visualized as a color plot, with channels varying along the $x$-axis, bins varying along the $y$-axis, and negative log-likelihoods represented by the $z$-axis (i.e., color).
  • Figure 3: Adaptive encoding distribution architecture.
  • Figure 4: Architecture layer diagram for $h_{a,q}$ and $h_{s,q}$ transforms. $k$ denotes kernel size, $g$ denotes number of channel groups, and $\downarrow, \uparrow$ denote stride.
  • Figure 5: RD curves for the Kodak dataset.
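
The per-channel target distributions of the kind visualized in Figure 2 can be estimated with a kernel-density-style soft histogram. The sketch below is an illustration under stated assumptions, not the paper's implementation: the function name `soft_histogram`, the triangular kernel, the unit bandwidth, the integer bin centers, and the synthetic Gaussian latents are all choices made here for the example.

```python
import numpy as np

def soft_histogram(y, bin_centers, bandwidth=1.0, eps=1e-9):
    """Per-channel soft histogram using a triangular kernel (assumed choice).

    y:           (C, N) array of latent values, one row per channel.
    bin_centers: (B,) array of bin centers.
    Returns a (C, B) array of probabilities; each row sums to 1.
    """
    # Distance from every sample to every bin center: shape (C, N, B).
    dist = np.abs(y[:, :, None] - bin_centers[None, None, :])
    # Triangular kernel: weight decays linearly to zero at `bandwidth`.
    weights = np.clip(1.0 - dist / bandwidth, 0.0, None)
    # Accumulate over samples; eps keeps empty bins at nonzero probability.
    hist = weights.sum(axis=1) + eps
    return hist / hist.sum(axis=1, keepdims=True)

# Synthetic latents: three channels with different spreads (hypothetical data).
rng = np.random.default_rng(0)
scales = np.array([0.5, 2.0, 4.0])[:, None]
y = rng.normal(0.0, 1.0, size=(3, 4096)) * scales

bin_centers = np.arange(-16, 17, dtype=np.float64)
p = soft_histogram(y, bin_centers)

# Negative log-likelihood per channel and bin, the quantity Figure 2 maps to color.
nll_bits = -np.log2(p)
print(p.shape, nll_bits.shape)
```

Because the kernel is piecewise linear rather than a hard bin assignment, the histogram varies smoothly with the latent values almost everywhere, which is the usual reason for preferring a soft estimate when gradients are needed.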