Generalized Gaussian Model for Learned Image Compression
Haotian Zhang, Li Li, Dong Liu
TL;DR
This work introduces a Generalized Gaussian Model (GGM) for learned image compression that captures latent-variable distributions more accurately with only one extra shape parameter, $\beta$. It presents three variants, model-wise (GGM-m), channel-wise (GGM-c), and element-wise (GGM-e), and proposes training techniques including a $\beta$-dependent lower bound on the scale parameter and gradient rectification, plus zero-center quantization and LUT-based entropy coding. Across multiple state-of-the-art compression backbones, the GGM variants consistently improve rate-distortion performance over Gaussian and Gaussian mixture models, with GGM-e achieving the strongest gains while keeping coding-time overhead modest. The results suggest that flexible tail modeling of latent distributions, together with targeted training strategies and efficient entropy coding, yields meaningful gains in practical learned image compression systems.
Abstract
In learned image compression, probabilistic models play an essential role in characterizing the distribution of latent variables. The Gaussian model with mean and scale parameters has been widely used for its simplicity and effectiveness. Probabilistic models with more parameters, such as Gaussian mixture models, can fit the distribution of latent variables more precisely, but at higher complexity. To balance compression performance and complexity, we extend the Gaussian model to the generalized Gaussian family for more flexible latent distribution modeling, introducing only one additional parameter, the shape parameter $\beta$, compared with the Gaussian model. To enhance the performance of the generalized Gaussian model by alleviating the train-test mismatch, we propose improved training methods, including $\beta$-dependent lower bounds for scale parameters and gradient rectification. Our proposed generalized Gaussian model, coupled with the improved training methods, is demonstrated to outperform the Gaussian and Gaussian mixture models on a variety of learned image compression networks.
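To make the role of the shape parameter concrete, the following is a minimal sketch of the standard generalized Gaussian density and the probability mass it assigns to an integer-quantized latent. It is an illustration under stated assumptions, not the authors' implementation: the function names, the symbol `alpha` for the scale, the unit-width quantization bins, and the use of numerical integration are all choices made here for clarity (the paper itself describes LUT-based entropy coding).

```python
import math


def ggm_pdf(x, mu=0.0, alpha=1.0, beta=2.0):
    """Generalized Gaussian density with location mu, scale alpha, shape beta.

    beta = 2 gives a Gaussian (with alpha = sqrt(2) * sigma) and
    beta = 1 gives a Laplacian (with alpha equal to the Laplace scale).
    """
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-((abs(x - mu) / alpha) ** beta))


def ggm_bin_prob(y, mu=0.0, alpha=1.0, beta=2.0, steps=200):
    """Probability mass on [y - 0.5, y + 0.5] via composite Simpson's rule.

    Assumption for illustration: latents are rounded to integers, so each
    symbol owns a unit-width bin. `steps` must be even.
    """
    lo, hi = y - 0.5, y + 0.5
    h = (hi - lo) / steps
    total = ggm_pdf(lo, mu, alpha, beta) + ggm_pdf(hi, mu, alpha, beta)
    for i in range(1, steps):
        weight = 4.0 if i % 2 else 2.0
        total += weight * ggm_pdf(lo + i * h, mu, alpha, beta)
    return total * h / 3.0


def ggm_bits(y, mu=0.0, alpha=1.0, beta=2.0):
    """Ideal code length in bits for symbol y under the discretized model."""
    return -math.log2(ggm_bin_prob(y, mu, alpha, beta))


if __name__ == "__main__":
    # Compare the estimated cost of the same symbol under different shapes.
    for beta in (1.0, 1.5, 2.0):
        print(beta, ggm_bits(3, mu=0.0, alpha=1.0, beta=beta))
```

With `beta=2` and `alpha=math.sqrt(2)` the bin probabilities match those of a unit-variance Gaussian, and with `beta=1` they match a Laplacian, which is why a single extra parameter is enough to interpolate between light-tailed and heavy-tailed latent distributions.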
