An Information-Theoretic Regularizer for Lossy Neural Image Compression

Yingwen Zhang, Meng Wang, Xihua Sheng, Peilin Chen, Junru Li, Li Zhang, Shiqi Wang

TL;DR

This paper addresses the challenge of training lossy neural image compression networks with quantized latents by introducing an information-theoretic regularizer. The key idea is that minimizing the latent entropy $H(U)$ is closely tied to maximizing the conditional source entropy $H(X|\hat{X})$, a relationship that holds in both the direct and the transform coding settings under appropriate conditions. The authors propose a practical regularizer that adds to the training loss a term approximating $-H(X|\hat{X})$ through a learned model $q_{\theta}(X|\hat{X})$; combined with a GAN-style alternating training strategy, it improves both in-domain efficiency and cross-domain generalization. Extensive experiments on classic and modern compression backbones show consistent BD-Rate improvements and better out-of-domain performance, with the regularizer most effective when the source entropy model is aligned with the latent entropy model. The approach imposes no inference overhead and offers a modular, plug-and-play enhancement for neural image compression pipelines.
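The alternating scheme can be illustrated on a toy scalar problem. This sketch is not the authors' code and rests on several assumptions: the "codec" is a uniform scalar quantizer whose step size stands in for the trainable parameters, the rate is the empirical entropy of the quantized symbols, and $q_\theta(x|\hat{x})$ is a Gaussian $\mathcal{N}(x; \hat{x}, \sigma^2)$ whose only parameter is $\sigma$. The source-model step fits $\sigma$ by maximum likelihood, minimizing the cross-entropy that upper-bounds $H(X|\hat{X})$; the codec step, with $q_\theta$ frozen, minimizes $R + \lambda D - \alpha \cdot \mathrm{CE}$, so the regularizer rewards a larger conditional source entropy. An exhaustive search over candidate step sizes replaces gradient descent, the source is continuous (so the cross-entropy is a differential-entropy estimate), and all function names, $\lambda$, and the candidate steps are hypothetical.

```python
import math
import random
from collections import Counter


def quantize(xs, step):
    """Toy deterministic quantizer Q and dequantizer Q^{-1} (uniform scalar, hypothetical)."""
    us = [round(x / step) for x in xs]
    return us, [u * step for u in us]


def rate_bits(us):
    """Empirical entropy of the quantized symbols in bits/sample (stand-in for the latent entropy model)."""
    n = len(us)
    return -sum(c / n * math.log2(c / n) for c in Counter(us).values())


def fit_source_model(xs, x_hats):
    """Source-model update: ML sigma of q_theta(x | x_hat) = N(x; x_hat, sigma^2), i.e. minimum cross-entropy."""
    mse = sum((x - xh) ** 2 for x, xh in zip(xs, x_hats)) / len(xs)
    return math.sqrt(mse)


def cross_entropy_bits(xs, x_hats, sigma):
    """Mean -log2 q_theta(x | x_hat): the model's (differential) estimate of H(X | X_hat) in bits/sample."""
    nll = sum(0.5 * math.log(2 * math.pi * sigma ** 2) + (x - xh) ** 2 / (2 * sigma ** 2)
              for x, xh in zip(xs, x_hats))
    return nll / len(xs) / math.log(2)


def codec_loss(xs, step, sigma, lam, alpha):
    """Regularized rate-distortion loss R + lam * D - alpha * CE, evaluated with q_theta frozen."""
    us, x_hats = quantize(xs, step)
    dist = sum((x - xh) ** 2 for x, xh in zip(xs, x_hats)) / len(xs)
    return rate_bits(us) + lam * dist - alpha * cross_entropy_bits(xs, x_hats, sigma)


def alternate(xs, steps, lam=5.0, alpha=0.1, rounds=3):
    """Alternate codec and source-model updates; a search over `steps` stands in for gradient descent."""
    step = steps[0]
    sigma = fit_source_model(xs, quantize(xs, step)[1])
    for _ in range(rounds):
        # Codec update: minimize the regularized loss with the source model frozen.
        step = min(steps, key=lambda s: codec_loss(xs, s, sigma, lam, alpha))
        # Source-model update: refit q_theta to the new reconstructions with the codec frozen.
        sigma = fit_source_model(xs, quantize(xs, step)[1])
    return step, sigma


if __name__ == "__main__":
    random.seed(0)
    samples = [random.gauss(0.0, 2.0) for _ in range(2000)]
    print(alternate(samples, steps=[0.25, 0.5, 1.0, 2.0]))
```

In a real network both updates would be gradient steps on separate optimizers, mirroring GAN training; the toy keeps only the min/max structure, with $q_\theta$ tightening its estimate and the codec pushing that estimate up.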

Abstract

Lossy image compression networks aim to minimize the latent entropy of images while adhering to specific distortion constraints. However, optimizing such networks is challenging because they must learn quantized latent representations. In this paper, our key finding is that minimizing the latent entropy is, to some extent, equivalent to maximizing the conditional source entropy, an insight that is deeply rooted in information-theoretic equalities. Building on this insight, we propose a novel structural regularization method for the neural image compression task that incorporates the negative conditional source entropy into the training objective, so as to improve both the optimization efficacy and the generalization ability of the model. The proposed information-theoretic regularizer is interpretable, plug-and-play, and imposes no inference overhead. Extensive experiments demonstrate its superiority in regularizing the models and further squeezing bits from the latent representation across various compression structures and unseen domains.

Paper Structure

This paper contains 25 sections, 4 theorems, 25 equations, 8 figures, 3 tables, 1 algorithm.

Key Result

Lemma 1

Considering a deterministic quantization process $Q(\cdot)$ and a deterministic dequantization process $Q^{-1}(\cdot)$ for the direct coding model illustrated in Figure 1(a), the following equalities hold automatically:

$$H(\bm{X}) = H(\bm{X}|\bm{\hat{X}}) + H(\bm{U}), \qquad H(\bm{U}) = H(\bm{X}) - H(\bm{X}|\bm{\hat{X}}) = I(\bm{X};\bm{\hat{X}}).$$
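The relation $H(\bm{X}) = H(\bm{X}|\bm{\hat{X}}) + H(\bm{U})$ from the Figure 1 caption can be checked numerically on a toy discrete source. This sketch assumes an integer-valued $X$, a hypothetical quantizer $Q(x) = \lfloor x / \text{step} \rfloor$, and a hypothetical injective dequantizer $Q^{-1}(u) = \text{step}\cdot(u + 0.5)$; it computes $H(X)$, $H(U)$, and $H(X|\hat{X})$ directly from the probabilities. Because $H(X)$ is fixed for a known source, any reduction of $H(U)$ must appear as an equal increase of $H(X|\hat{X})$.

```python
import math
from collections import Counter, defaultdict


def entropy_bits(probs):
    """Shannon entropy (bits) of a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def direct_coding_entropies(p_x, step=4, offset=0.5):
    """Return (H(X), H(U), H(X|X_hat)) for a toy direct coding model.

    Hypothetical choices: Q(x) = x // step and Q^{-1}(u) = step * (u + offset),
    both deterministic, with Q^{-1} injective.
    """
    p_u = Counter()
    groups = defaultdict(list)  # x_hat -> probabilities of the source symbols mapped to it
    for x, p in p_x.items():
        u = x // step                # quantization
        x_hat = step * (u + offset)  # dequantization
        p_u[u] += p
        groups[x_hat].append(p)
    h_x = entropy_bits(p_x.values())
    h_u = entropy_bits(p_u.values())
    # H(X|X_hat) = sum over x_hat of p(x_hat) * H(X | X_hat = x_hat)
    h_cond = 0.0
    for ps in groups.values():
        p_xhat = sum(ps)
        h_cond += p_xhat * entropy_bits(p / p_xhat for p in ps)
    return h_x, h_u, h_cond


uniform = {x: 1 / 16 for x in range(16)}
print(direct_coding_entropies(uniform))  # → (4.0, 2.0, 2.0)
```

With 16 equiprobable symbols and 4 quantization bins, the 4 bits of source entropy split into 2 bits carried by $U$ and 2 bits left in $X$ given $\hat{X}$.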

Figures (8)

  • Figure 1: (a) Direct coding model; (b) Information diagram for the direct coding, wherein $H(\bm{X})=H(\bm{X}|\bm{\hat{X}})+H(\bm{U})$ and $H(\bm{X})$ is fixed for any known source; (c) Transform coding model (with possible side information $\bm{\hat{Z}}$ [balle2018variational]).
  • Figure 2: An illustration of the proposed regularization method, wherein an additional source entropy model $q_\theta$ is introduced. $\mathrm{AE}$ and $\mathrm{AD}$ denote arithmetic encoding and arithmetic decoding, respectively. The side information branch (i.e., the $\bm{\hat{Z}}$ branch) is considered part of the latent entropy model and therefore consumes transmitted bits.
  • Figure 3: Conditional source entropy modeling for (a) hyperprior [balle2018variational], (b) autoregressive [minnen2018joint] and attention [cheng2020learned], (c) ELIC [he2022elic], and (d) MLIC++ [jiang2023mlic]. The architecture and module designs are aligned with those of the latent entropy model. Additional details regarding the specific modules are provided in the Supplementary Material.
  • Figure 4: Performance of the proposed regularization method on the hyperprior [balle2018variational], autoregressive [minnen2018joint], attention [cheng2020learned], ELIC [he2022elic], and MLIC++ [jiang2023mlic] models. The anchor is trained with the vanilla rate-distortion loss for an equal number of training steps. $\alpha$ denotes the regularization factor in the proposed loss.
  • Figure 5: Compression performance on Kodak. Our regularizer achieves -1.13%, -1.57%, -0.84%, -1.24%, and -1.82% BD-Rates for the hyperprior, autoregressive, attention models, ELIC, and MLIC++, respectively.
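Figure 5 reports gains as BD-Rate, the average bitrate difference relative to the anchor at equal quality (negative values mean bit savings). This page does not state which evaluation script the paper uses, so the sketch below assumes the common cubic-polynomial variant of Bjøntegaard's method: fit log-rate as a cubic in PSNR for each curve, integrate both fits over the overlapping PSNR range, and convert the mean log difference to a percentage. The helper names are hypothetical, and each curve needs at least four rate-distortion points.

```python
import math


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def _polyfit3(xs, ys):
    """Least-squares cubic fit via the normal equations; returns [c0, c1, c2, c3]."""
    ata = [[sum(x ** (i + j) for x in xs) for j in range(4)] for i in range(4)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(4)]
    return _solve(ata, aty)


def _integral(coeffs, lo, hi):
    """Definite integral of sum_k coeffs[k] * t**k over [lo, hi]."""
    def antiderivative(t):
        return sum(c * t ** (k + 1) / (k + 1) for k, c in enumerate(coeffs))
    return antiderivative(hi) - antiderivative(lo)


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate in percent (negative = test needs fewer bits than the anchor)."""
    # Center PSNR values on a shared offset to keep the normal equations well conditioned.
    center = (sum(psnr_anchor) + sum(psnr_test)) / (len(psnr_anchor) + len(psnr_test))
    ta = [p - center for p in psnr_anchor]
    tt = [p - center for p in psnr_test]
    ca = _polyfit3(ta, [math.log(r) for r in rate_anchor])
    ct = _polyfit3(tt, [math.log(r) for r in rate_test])
    lo, hi = max(min(ta), min(tt)), min(max(ta), max(tt))
    if hi <= lo:
        raise ValueError("PSNR ranges of the two curves do not overlap")
    avg_log_diff = (_integral(ct, lo, hi) - _integral(ca, lo, hi)) / (hi - lo)
    return (math.exp(avg_log_diff) - 1) * 100
```

For example, a test curve that needs 10% fewer bits than the anchor at every PSNR point yields a BD-Rate of about -10%.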

Theorems & Definitions (8)

  • Lemma 1
  • proof
  • Theorem 1
  • proof
  • Lemma 2
  • proof
  • Theorem 2
  • proof