An Information-Theoretic Regularizer for Lossy Neural Image Compression
Yingwen Zhang, Meng Wang, Xihua Sheng, Peilin Chen, Junru Li, Li Zhang, Shiqi Wang
TL;DR
This paper addresses the difficulty of training lossy neural image compression networks with quantized latents by introducing an information-theoretic regularizer. The key observation is that minimizing the latent entropy $H(U)$ is closely tied to maximizing the conditional source entropy $H(X|\hat{X})$, a relationship that holds in both direct and transform coding settings under appropriate conditions. The authors propose a practical regularizer that adds a term approximating $-H(X|\hat{X})$ via a learned model $q_{\theta}(X|\hat{X})$, trained with a GAN-style alternating strategy, to improve both in-domain efficiency and cross-domain generalization. Experiments across classic and modern compression backbones show consistent BD-Rate improvements and better out-of-domain performance, with the regularizer most effective when entropy models are aligned. The approach adds no inference overhead and serves as a modular, plug-and-play enhancement for neural image compression pipelines.
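One way to see why the two entropies are linked is sketched below. It is an illustration under stated assumptions and not necessarily the paper's own derivation: $X$ is taken to be discrete (for example, 8-bit pixels), the latent $U$ is a deterministic function of $X$ (analysis transform followed by quantization), and $\hat{X}$ is a deterministic function of $U$ (synthesis transform).

```latex
\begin{align}
H(U) &= I(X;U) + H(U \mid X) = I(X;U)
      && \text{since } H(U \mid X) = 0 \\
     &= H(X) - H(X \mid U) \\
     &\ge H(X) - H(X \mid \hat{X})
      && \text{since } \hat{X} \text{ is a function of } U,
\end{align}
% Equality holds in the last step when the decoder U -> \hat{X} is injective.
% Because H(X) is fixed by the source, lowering H(U) then corresponds to raising H(X | \hat{X}).

% The conditional entropy itself is bounded by a cross-entropy under any model q_\theta:
\begin{equation}
H(X \mid \hat{X}) \le \mathbb{E}_{p(x,\hat{x})}\!\left[-\log q_{\theta}(x \mid \hat{x})\right],
\end{equation}
% with equality when q_\theta(x | \hat{x}) = p(x | \hat{x}).
```

The second inequality (Gibbs' inequality) suggests why an alternating scheme is natural: fitting $q_{\theta}$ tightens the estimate of $H(X|\hat{X})$, while the codec is updated against the current estimate.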
Abstract
Lossy image compression networks aim to minimize the latent entropy of images subject to specific distortion constraints. However, optimizing such networks is challenging because they must learn quantized latent representations. In this paper, our key finding is that minimizing the latent entropy is, to some extent, equivalent to maximizing the conditional source entropy, an insight rooted in information-theoretic equalities. Building on this insight, we propose a novel structural regularization method for neural image compression that incorporates the negative conditional source entropy into the training objective, improving both optimization efficacy and the model's generalization ability. The proposed information-theoretic regularizer is interpretable, plug-and-play, and imposes no inference overhead. Extensive experiments demonstrate that it regularizes models effectively and squeezes further bits from the latent representation across various compression structures and unseen domains.
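The entropy relationship described in the abstract can be checked numerically on a toy example. The sketch below is only an illustration and is not the paper's method: the 8-symbol source, the uniform scalar quantizer, and all function names (`entropy`, `pushforward`, `conditional_entropy`, `make_codec`) are assumptions introduced here. It shows that, for a deterministic codec, $H(\hat{X}) + H(X|\hat{X}) = H(X)$, so a coarser quantizer lowers the coded entropy exactly as much as it raises the conditional source entropy.

```python
import math
from collections import defaultdict


def entropy(pmf):
    """Shannon entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def pushforward(pmf_x, f):
    """Distribution of f(X) when f is a deterministic map."""
    out = defaultdict(float)
    for x, p in pmf_x.items():
        out[f(x)] += p
    return dict(out)


def conditional_entropy(pmf_x, f):
    """H(X | f(X)) in bits, computed directly from the definition."""
    pmf_y = pushforward(pmf_x, f)
    h = 0.0
    for x, p in pmf_x.items():
        if p > 0:
            # p(x | y) = p(x) / p(y) because y = f(x) is determined by x
            h -= p * math.log2(p / pmf_y[f(x)])
    return h


def make_codec(step):
    """Hypothetical toy codec: quantize to a bin index, decode to the bin's lower edge."""
    return lambda x: (x // step) * step


# Hypothetical non-uniform source over the symbols 0..7.
weights = [8, 4, 4, 2, 2, 2, 1, 1]
total = sum(weights)
pmf_x = {x: w / total for x, w in enumerate(weights)}

h_x = entropy(pmf_x)
results = {}
for step in (1, 2, 4):
    codec = make_codec(step)
    h_xhat = entropy(pushforward(pmf_x, codec))          # entropy of the reconstruction
    h_x_given_xhat = conditional_entropy(pmf_x, codec)   # conditional source entropy
    results[step] = (h_xhat, h_x_given_xhat)
    print(f"step={step}: H(Xhat)={h_xhat:.4f}  H(X|Xhat)={h_x_given_xhat:.4f}  "
          f"sum={h_xhat + h_x_given_xhat:.4f}  H(X)={h_x:.4f}")
```

In this toy setting the decoder is injective in the bin index, so the entropy of the reconstruction equals the entropy of the quantized latent. A learned codec would replace the exact conditional entropy with a model-based estimate.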
