Generalized Gaussian Model for Learned Image Compression

Haotian Zhang, Li Li, Dong Liu

TL;DR

This work introduces a Generalized Gaussian Model (GGM) for learned image compression to better capture latent-variable distributions with only one extra shape parameter $\beta$. It presents three variants: model-wise (GGM-m), channel-wise (GGM-c), and element-wise (GGM-e). It also introduces training techniques, including a $\beta$-dependent lower bound on the scale parameter and gradient rectification, plus zero-center quantization and LUT-based entropy coding. Across multiple state-of-the-art compression backbones, GGM variants consistently improve rate-distortion performance over Gaussian and Gaussian mixture models, with GGM-e achieving the strongest gains while keeping practical coding-time overhead modest. The results suggest that flexible tail modeling of latent distributions, together with targeted training strategies and efficient entropy coding, yields meaningful gains in practical learned image compression systems.

Abstract

In learned image compression, probabilistic models play an essential role in characterizing the distribution of latent variables. The Gaussian model with mean and scale parameters has been widely used for its simplicity and effectiveness. Probabilistic models with more parameters, such as Gaussian mixture models, can fit the distribution of latent variables more precisely, but at higher complexity. To balance compression performance and complexity, we extend the Gaussian model to the generalized Gaussian family for more flexible latent distribution modeling, which adds only one shape parameter, $\beta$, to the Gaussian model. To enhance the performance of the generalized Gaussian model by alleviating the train-test mismatch, we propose improved training methods, including $\beta$-dependent lower bounds for scale parameters and gradient rectification. Our proposed generalized Gaussian model, coupled with the improved training methods, is demonstrated to outperform the Gaussian and Gaussian mixture models on a variety of learned image compression networks.
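
As a rough illustration of what the extra shape parameter changes, the sketch below evaluates a generalized Gaussian PDF and CDF and the bit cost of a rounded latent value. It assumes the standard parameterization $f(x)=\frac{\beta}{2\alpha\Gamma(1/\beta)}e^{-(|x-\mu|/\alpha)^{\beta}}$, which may differ in detail from the paper's own equations. The function names (`reg_lower_gamma`, `ggm_pdf`, `ggm_cdf`, `bits`), the unit-width integration interval, and the probability floor are hypothetical choices made for this example, not the authors' implementation.

```python
import math


def reg_lower_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x), computed by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a  # n = 0 term of sum_n x^n / (a (a+1) ... (a+n))
    total = term
    n = 1
    while abs(term) > 1e-16 * abs(total) and n < 10_000:
        term *= x / (a + n)
        total += term
        n += 1
    # gamma(a, x) = x^a e^{-x} * total; dividing by Gamma(a) regularizes it.
    return math.exp(a * math.log(x) - x - math.lgamma(a)) * total


def ggm_pdf(x: float, mu: float, alpha: float, beta: float) -> float:
    """Generalized Gaussian density with mean mu, scale alpha, shape beta."""
    z = abs(x - mu) / alpha
    return beta / (2.0 * alpha * math.gamma(1.0 / beta)) * math.exp(-(z ** beta))


def ggm_cdf(x: float, mu: float, alpha: float, beta: float) -> float:
    """Generalized Gaussian CDF: 1/2 + sgn(x - mu)/2 * P(1/beta, z^beta)."""
    z = abs(x - mu) / alpha
    half = 0.5 * min(1.0, reg_lower_gamma(1.0 / beta, z ** beta))
    return 0.5 + half if x >= mu else 0.5 - half


def bits(y_hat: float, mu: float, alpha: float, beta: float,
         floor: float = 1e-9) -> float:
    """Bits for a rounded latent: -log2 of the mass on [y_hat-0.5, y_hat+0.5].

    The floor on the probability is an assumed safeguard against log(0).
    """
    p = ggm_cdf(y_hat + 0.5, mu, alpha, beta) - ggm_cdf(y_hat - 0.5, mu, alpha, beta)
    return -math.log2(max(p, floor))


if __name__ == "__main__":
    # Compare the cost of the same symbols under three shapes (alpha fixed).
    for beta in (1.0, 1.5, 2.0):
        print(beta, bits(0.0, 0.0, 1.0, beta), bits(3.0, 0.0, 1.0, beta))
```

With $\beta=2$ this reduces to a Gaussian whose standard deviation is $\alpha/\sqrt{2}$, and with $\beta=1$ to a Laplacian, so the single extra parameter interpolates between (and beyond) the two familiar cases.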

Paper Structure

This paper contains 40 sections, 29 equations, 16 figures, 11 tables.

Figures (16)

  • Figure 1: Shape of the Probability Density Function (PDF), as formulated by Eq. (\ref{eq:pdf_cdf}), of the Generalized Gaussian Model (GGM) with various shape parameters $\beta$. The mean and scale parameters are fixed as $\mu=0,\alpha=1$.
  • Figure 2: Bits estimated by GM and GGM for latent variables in a mean-scale hyperprior model trained with GM. We train the mean-scale hyperprior model \cite{minnen2018joint} with the Gaussian model and then collect latent variables (with the mean subtracted) that have similar estimated scale parameters (Left: 66491 samples with $\alpha\in [0.42,0.425]$; Right: 12899 samples with $\alpha\in [5.6,5.7]$) from the Tecnick dataset. We then calculate the average bits of these latent variables under GGM with various $\beta$ and $\alpha$ parameters. The original parameters estimated by the Gaussian model are marked as $\square$, and the optimal parameters estimated by GGM are marked as $\color{red}\circ$. The visualization shows that even when the analysis transform is constrained by the Gaussian model, the actual distribution is better estimated by GGM.
  • Figure 3: Visualization of the typical PDF of the latent variables estimated by the Gaussian Mixture Model (GMM) and the corresponding fits with GGM and the Gaussian model (GM). The results are collected from the hyperprior model with GMM as the probabilistic model. The plot of each Gaussian component in GMM (with label GGM-$i$, $i=1,2,3$) is weighted. $r^2$ measures the goodness of the curve fit; it ranges from 0 to 1, where 1 indicates a perfect fit. The $r^2$ values of the GM and GGM fits (with the GGM $\beta$ parameter) are shown at the upper right of each subplot. The visualizations show that GGM can fit the distributions of latent variables estimated by GMM well.
  • Figure 4: Visualization of the information for the channel with the highest entropy trained with GM and GGM (element-wise shape parameters) using image kodim19 in the Kodak dataset. The hyperprior bitrates, i.e., the sum of the hyper entropy, are 0.028 bpp for GM and 0.027 bpp for GGM. The visualizations show that, compared to GM, GGM reduces the prediction error, requires smaller scale parameters, and removes more structure from the normalized latent while using a smaller hyperprior bitrate, which directly translates to a lower bitrate. The backbone model is the mean-scale hyperprior model \cite{minnen2018joint}. For visualization, the normalized latent variables are first mapped to a uniform distribution and then to a Gaussian, $\hat{y}_{\text{norm}}=c_{\beta=2}^{-1}(c_{\beta}(\frac{\hat{y}-\mu}{\alpha}))$, where $c_{\beta}$ is the CDF of GGM as formulated in Eq. (\ref{eq:pdf_cdf}), and $c^{-1}_{\beta}$ is the inverse function of $c_{\beta}$.
  • Figure 5: Illustration of the $\beta$-dependent lower bound for the scale parameter. (a) visualizes $\Delta R=(R(\tilde{Y})-R(\lfloor Y\rceil))/R(\lfloor Y\rceil)$ for various GGM distributions, as formulated by Eq. (\ref{eq:rate_estimation}); values of $\Delta R$ greater than 1 are clipped. (b) shows the distribution of shape and scale parameters trained with GGM-e on the mean-scale hyperprior model \cite{minnen2018joint}, collected from the Kodak dataset with 7077888 samples. (c) visualizes the $\beta$-dependent lower bound for the scale parameter and the corresponding rate with $\mu=0$, estimated with rounding.
  • ...and 11 more figures
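
The normalization used for the Figure 4 visualization can be unpacked as a short worked derivation. It assumes the standard generalized Gaussian parameterization, which may differ in detail from the paper's own PDF/CDF equation; $P(\cdot,\cdot)$ (the regularized lower incomplete gamma function) and $z$ are notation introduced only for this sketch.

```latex
\begin{align}
  % Assumed standard form of the GGM density and CDF
  f_\beta(x) &= \frac{\beta}{2\alpha\,\Gamma(1/\beta)}
      \exp\!\left(-\left(\frac{|x-\mu|}{\alpha}\right)^{\beta}\right), \\
  c_\beta(x) &= \frac{1}{2} + \frac{\operatorname{sgn}(x-\mu)}{2}\,
      P\!\left(\frac{1}{\beta},\ \left(\frac{|x-\mu|}{\alpha}\right)^{\beta}\right). \\
  % Standardized latent, so c_beta is evaluated with mu = 0, alpha = 1
  z &= \frac{\hat{y}-\mu}{\alpha}, \qquad
  c_\beta(z) = \frac{1}{2} + \frac{\operatorname{sgn}(z)}{2}\,
      P\!\left(\frac{1}{\beta},\ |z|^{\beta}\right). \\
  % Special case beta = 2, using P(1/2, z^2) = erf(|z|)
  c_{2}(z) &= \frac{1 + \operatorname{erf}(z)}{2}, \qquad
  c_{2}^{-1}(u) = \operatorname{erf}^{-1}(2u - 1). \\
  % Composition used for the visualization
  \hat{y}_{\text{norm}} &= c_{2}^{-1}\bigl(c_\beta(z)\bigr)
      = \operatorname{erf}^{-1}\!\left(\operatorname{sgn}(z)\,
        P\!\left(\frac{1}{\beta},\ |z|^{\beta}\right)\right).
\end{align}
```

Under these assumptions the map is the identity when $\beta=2$, and the resulting Gaussian has standard deviation $1/\sqrt{2}$ rather than 1, because $\alpha=1$ in this parameterization corresponds to variance $1/2$.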