
GeoIB: Geometry-Aware Information Bottleneck via Statistical-Manifold Compression

Weiqi Wang, Zhiyi Tian, Chenhan Zhang, Shui Yu

TL;DR

GeoIB addresses the instability of traditional information-bottleneck (IB) optimization, which stems from biases in mutual information (MI) estimation, by embedding IB in an information-geometric framework. It expresses $I(X;Z)$ and $I(Z;Y)$ exactly as KL projections of the joint distributions onto independence manifolds. It then introduces two geometry-aware penalties, a distribution-level Fisher–Rao discrepancy and a geometry-level Jacobian–Frobenius term, and optimizes them with a natural-gradient strategy. Compared with standard IB baselines, the method yields better accuracy–compression trade-offs on MNIST, CIFAR-10, and CelebA, greater robustness under strong compression, and reduced leakage. This geometry-centric approach provides a principled, reparameterization-invariant mechanism for regulating compression. It has practical implications for stable deep representation learning and potential extensions to privacy-preserving and federated settings.
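The projection claim can be checked numerically on a small discrete example. The sketch below is illustrative only: it uses a random 4 x 3 joint table, not the paper's continuous encoder setting. It computes $I(X;Z)=\mathrm{KL}(p(x,z)\,\|\,p(x)p(z))$. It then verifies that for any other product distribution $q(x)r(z)$ on the independence manifold, the divergence decomposes as $I(X;Z)+\mathrm{KL}(p(x)\|q)+\mathrm{KL}(p(z)\|r)$, so the product of marginals is the minimizer. The helper `kl` and all array names are hypothetical.

```python
import numpy as np

def kl(p, q):
    # KL(p || q) for probability arrays of equal shape (assumes q > 0 wherever p > 0)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

rng = np.random.default_rng(3)
p_xz = rng.random((4, 3))
p_xz /= p_xz.sum()                          # joint p(x, z) on a 4 x 3 grid
p_x, p_z = p_xz.sum(axis=1), p_xz.sum(axis=0)

# I(X;Z) as the KL divergence to the product of marginals
mi = kl(p_xz, np.outer(p_x, p_z))

# An arbitrary other point q(x) r(z) on the independence manifold
q = rng.random(4)
q /= q.sum()
r = rng.random(3)
r /= r.sum()

lhs = kl(p_xz, np.outer(q, r))
rhs = mi + kl(p_x, q) + kl(p_z, r)          # Pythagorean-type decomposition
print(f"I(X;Z)={mi:.6f}  KL to q*r={lhs:.6f}  decomposition={rhs:.6f}")
```

Because the two extra KL terms are non-negative and vanish only at $q=p(x)$, $r=p(z)$, the minimum over the product manifold is exactly $I(X;Z)$.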

Abstract

Information Bottleneck (IB) is widely used, but in deep learning it is usually implemented through tractable surrogates, such as variational bounds or neural mutual information (MI) estimators, rather than by directly controlling the MI $I(X;Z)$ itself. Because these bounds are loose and the estimators biased, IB "compression" is controlled only indirectly and optimization can be fragile. We revisit the IB problem through the lens of information geometry and propose a Geometric Information Bottleneck (GeoIB) that dispenses with MI estimation. We show that $I(X;Z)$ and $I(Z;Y)$ admit exact projection forms as minimal Kullback–Leibler (KL) divergences from the joint distributions to their respective independence manifolds. Guided by this view, GeoIB controls information compression with two complementary terms. The first is a distribution-level Fisher–Rao (FR) discrepancy, which matches KL to second order and is reparameterization-invariant. The second is a geometry-level Jacobian–Frobenius (JF) term, which provides a local capacity-type upper bound on $I(Z;X)$ by penalizing pullback volume expansion of the encoder. We further derive a natural-gradient optimizer consistent with the FR metric and prove that the standard additive natural-gradient step is first-order equivalent to the geodesic update. Extensive experiments on popular datasets show that GeoIB achieves a better trade-off between prediction accuracy and compression ratio in the information plane than mainstream IB baselines. By unifying distributional and geometric regularization under a single bottleneck multiplier, GeoIB improves invariance and optimization stability. The source code of GeoIB is released at "https://anonymous.4open.science/r/G-IB-0569".
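To illustrate the claim that a Fisher–Rao quadratic form matches KL to second order, the sketch below uses a diagonal Gaussian $\mathcal{N}(\mu,\mathrm{diag}(\sigma^2))$ in $(\mu,\log\sigma)$ coordinates. In these coordinates the Fisher matrix is $\mathrm{diag}(1/\sigma^2, 2)$ per dimension. The sketch compares the exact $\mathrm{KL}(p_\phi\|p_{\phi+\delta})$ with $\tfrac12\delta^\top F_\phi\delta$ for shrinking perturbations $\delta$. This is a generic check of the second-order relation under these assumptions, not the paper's $\mathcal{L}_{\mathrm{FR}}$ implementation; the function names are hypothetical.

```python
import numpy as np

def kl_diag_gauss(mu1, log_sigma1, mu2, log_sigma2):
    """Exact KL(N(mu1, diag(sigma1^2)) || N(mu2, diag(sigma2^2))), summed over dimensions."""
    var1, var2 = np.exp(2 * log_sigma1), np.exp(2 * log_sigma2)
    return np.sum(log_sigma2 - log_sigma1 + (var1 + (mu1 - mu2) ** 2) / (2 * var2) - 0.5)

def fisher_quadratic(log_sigma, d_mu, d_log_sigma):
    """0.5 * delta^T F delta with F = diag(1/sigma^2, 2) per dimension,
    evaluated at the base point in (mu, log sigma) coordinates."""
    var = np.exp(2 * log_sigma)
    return 0.5 * np.sum(d_mu ** 2 / var + 2 * d_log_sigma ** 2)

rng = np.random.default_rng(0)
mu, log_sigma = rng.normal(size=4), rng.normal(scale=0.3, size=4)
dir_mu, dir_ls = rng.normal(size=4), rng.normal(size=4)   # fixed perturbation direction

ratios = []
for eps in [1e-1, 1e-2, 1e-3]:
    d_mu, d_ls = eps * dir_mu, eps * dir_ls
    kl = kl_diag_gauss(mu, log_sigma, mu + d_mu, log_sigma + d_ls)
    fr = fisher_quadratic(log_sigma, d_mu, d_ls)
    ratios.append(kl / fr)
    print(f"eps={eps:.0e}  KL={kl:.3e}  FR quadratic={fr:.3e}  ratio={ratios[-1]:.4f}")
```

The ratio approaches 1 as the perturbation shrinks. The remaining deviation scales roughly linearly with the step size, as expected from the cubic remainder of the Taylor expansion of KL.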

Paper Structure

This paper contains 22 sections, 4 theorems, 45 equations, 7 figures, 2 tables, 1 algorithm.

Key Result

Proposition 1

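Let $\mathcal{M}=\{p_\phi:\phi\in\Theta\subset\mathbb R^d\}$ be a regular statistical manifold endowed with the Fisher–Rao metric $g_\phi(u,v):=u^\top F_\phi v$, where $F_\phi=\mathbb{E}_{p(x)}\mathbb{E}_{q_\phi(z\mid x)}[\nabla_\phi\log q_\phi(z\mid x)\,\nabla_\phi\log q_\phi(z\mid x)^\top]$. For a scalar objective $\mathcal{J}:\Theta\to\mathbb{R}$, the natural gradient $\widetilde{\nabla}_\phi\mathcal{J}:=F_\phi^{-1}\nabla_\phi\mathcal{J}$ equals the Riemannian gradient of $\mathcal{J}$ under $g$. That is, it is the unique vector $\operatorname{grad}_g\mathcal{J}(\phi)$ satisfying $g_\phi(\operatorname{grad}_g\mathcal{J}(\phi),v)=\nabla_\phi\mathcal{J}^\top v$ for all $v\in\mathbb{R}^d$.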
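The following is a small numerical check of Proposition 1, assuming a hypothetical positive-definite $F_\phi$ and a gradient drawn at random rather than taken from a trained encoder. It confirms two things. First, the natural gradient $F_\phi^{-1}\nabla_\phi\mathcal{J}$ satisfies the defining property of the Riemannian gradient, $g_\phi(\widetilde\nabla_\phi\mathcal{J},v)=\nabla_\phi\mathcal{J}^\top v$. Second, under a linear reparameterization $\phi=A\psi$, the natural gradient computed in $\psi$ coordinates maps back to the same vector in $\phi$ coordinates, which is the sense in which the update is reparameterization-invariant.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3

# Hypothetical symmetric positive-definite Fisher matrix at the current point.
M = rng.normal(size=(d, d))
F_phi = M @ M.T + d * np.eye(d)
grad_phi = rng.normal(size=d)          # Euclidean gradient of a scalar objective J

def natural_gradient(F, grad):
    # F^{-1} grad via a linear solve (avoids forming the explicit inverse)
    return np.linalg.solve(F, grad)

ng_phi = natural_gradient(F_phi, grad_phi)

# Defining property of the Riemannian gradient: g(ng, v) = dJ[v] for a tangent vector v.
v = rng.normal(size=d)
print(ng_phi @ F_phi @ v, grad_phi @ v)

# Linear reparameterization phi = A psi: the gradient becomes A^T grad and the
# Fisher matrix becomes A^T F A in psi coordinates.
A = rng.normal(size=(d, d)) + d * np.eye(d)
F_psi = A.T @ F_phi @ A
grad_psi = A.T @ grad_phi
ng_psi = natural_gradient(F_psi, grad_psi)

# Pushing the psi-coordinate natural gradient forward with A recovers ng_phi.
print(np.allclose(A @ ng_psi, ng_phi))
```

A plain Euclidean gradient fails the second check in general, because $A\,A^\top\nabla_\phi\mathcal{J}\neq\nabla_\phi\mathcal{J}$ unless $A$ is orthogonal.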

Figures (7)

  • Figure 1: Comparison of VIB and GeoIB. Both models parametrize the encoder as $q_\phi(z\mid x)=\mathcal{N}(\mu(x),\mathrm{diag}(\sigma^2(x)))$ with a network $f_{\phi}$ and use a task decoder $p_{\theta}(y \mid z)$, parametrized by a network $f_{\theta}$, to increase $I(Z;Y)$. (a) VIB: compression is enforced by a variational upper bound. (b) GeoIB: explicit MI estimation is replaced with two geometry-aware penalties computed deterministically on statistical manifolds, a Fisher–Rao quadratic proxy $\mathcal{L}_{\mathrm{FR}}$ and a Jacobian–Frobenius term $\mathcal{L}_{\mathrm{JF}}$. Solid arrows denote deterministic mappings; dashed arrows indicate reparameterized sampling $z=\mu+\sigma\odot\varepsilon$.
  • Figure 2: Information-geometric view of the Information Bottleneck objective. The ambient statistical manifold $\mathcal{P}$ contains the joint distributions $p_\phi(x,z)$ and $p_\phi(y,z)$. The blue and pink planes represent the e-flat independence manifolds $\mathcal{I}_{XZ}$ and $\mathcal{I}_{YZ}$. The arrows indicate KL I-projections onto these product manifolds, whose minimizers are $p(x)p_\phi(z)$ and $p(y)p_\phi(z)$. The corresponding KL divergences equal $I_\phi(Z;X)$ and $I_\phi(Z;Y)$ via the information-geometric Pythagorean identity. Minimizing $\mathcal{L}_{\mathrm{IB}}(\phi)=\beta\,I_\phi(Z;X)-I_\phi(Z;Y)$ therefore pushes $p_\phi(x,z)$ towards $\mathcal{I}_{XZ}$ while pulling $p_\phi(y,z)$ away from $\mathcal{I}_{YZ}$.
  • Figure 3: Evaluation of compression ratio and prediction accuracy in the information plane.
  • Figure 4: Evaluation of the impact of the bottleneck multiplier $\beta$.
  • Figure 5: Visualizing representation embeddings of posterior means $\mu(x)$ for 10,000 test images in two dimensions on MNIST ($K=128$). Colors denote true labels. From left to right: $\beta=10^{-4}$, $10^{0}$, and $10^{1}$; the corresponding test accuracies are shown below each panel. As $\beta$ increases, within-class dispersion shrinks and clusters move toward class-wise prototypes, indicating stronger compression; accuracy decreases accordingly.
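Figure 1 describes a diagonal Gaussian encoder with reparameterized sampling $z=\mu+\sigma\odot\varepsilon$ and a Jacobian–Frobenius penalty. The sketch below makes several assumptions not stated in the source. It uses a hypothetical one-layer encoder, $\mu(x)=\tanh(Wx+b)$ and $\log\sigma(x)=Vx+c$. It also takes the JF term to be $\|\partial\mu/\partial x\|_F^2$, the trace of the pullback metric $J^\top J$. The paper's exact JF definition (for example, whether it also involves $\sigma$ or a normalization) is not given here. The sketch estimates the Jacobian by central finite differences and checks it against the analytic Jacobian $\mathrm{diag}(1-\mu^2)\,W$.

```python
import numpy as np

rng = np.random.default_rng(2)
D, K = 5, 3   # input and latent dimensions (illustrative)

# Hypothetical encoder parameters: mu(x) = tanh(W x + b), log sigma(x) = V x + c
W, b = 0.5 * rng.normal(size=(K, D)), np.zeros(K)
V, c = 0.1 * rng.normal(size=(K, D)), np.zeros(K)

def encoder(x):
    mu = np.tanh(W @ x + b)
    log_sigma = V @ x + c
    return mu, log_sigma

def sample_z(x, rng):
    # Reparameterized sample z = mu + sigma * eps with eps ~ N(0, I)
    mu, log_sigma = encoder(x)
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(log_sigma) * eps

def jacobian_frobenius_sq(x, h=1e-5):
    # Assumed JF term: squared Frobenius norm of d mu / d x (central differences)
    J = np.empty((K, D))
    for j in range(D):
        e = np.zeros(D)
        e[j] = h
        J[:, j] = (encoder(x + e)[0] - encoder(x - e)[0]) / (2 * h)
    return np.sum(J ** 2)

x = rng.normal(size=D)
z = sample_z(x, rng)

# Analytic Jacobian of tanh(W x + b) is diag(1 - mu^2) W
mu, _ = encoder(x)
J_exact = (1 - mu ** 2)[:, None] * W
print(jacobian_frobenius_sq(x), np.sum(J_exact ** 2))
```

In a training loop the Jacobian would normally come from automatic differentiation rather than finite differences. The finite-difference version only keeps this sketch dependency-free.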

Theorems & Definitions (5)

  • Proposition 1: Natural gradient equals the Riemannian gradient
  • Proposition 2: Steepest descent under the Fisher–Rao metric
  • Theorem 1: Geodesic update via the exponential map
  • Corollary 1: First-order equivalence to the additive update
  • Remark 1: Information-geometric Pythagorean relation
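Theorem 1 and Corollary 1 can be illustrated in one dimension. For the family $\mathcal{N}(0,\sigma^2)$ with coordinate $\sigma$, the Fisher information is $2/\sigma^2$. In $u=\log\sigma$ the metric is constant, so geodesics are straight lines in $u$ and the exponential map is $\mathrm{Exp}_\sigma(v)=\sigma\,e^{v/\sigma}$. The sketch below is a toy example with a hypothetical gradient value, not the paper's encoder manifold. It compares the geodesic step $\mathrm{Exp}_{\sigma_0}(-\eta F^{-1}\partial_\sigma\mathcal{J})$ with the additive step $\sigma_0-\eta F^{-1}\partial_\sigma\mathcal{J}$.

```python
import numpy as np

# Family N(0, sigma^2) in the coordinate sigma > 0; Fisher information F(sigma) = 2 / sigma^2.
def fisher(sigma):
    return 2.0 / sigma ** 2

def exp_map(sigma, v):
    # Geodesic from sigma with initial velocity v (in sigma coordinates), evaluated at t = 1
    return sigma * np.exp(v / sigma)

sigma0 = 1.5
dJ = 0.8                          # hypothetical Euclidean gradient dJ/dsigma at sigma0
nat_grad = dJ / fisher(sigma0)    # F^{-1} dJ

gaps = []
for eta in [1e-1, 1e-2, 1e-3]:
    v = -eta * nat_grad
    geodesic = exp_map(sigma0, v)
    additive = sigma0 + v
    gaps.append(abs(geodesic - additive))
    print(f"eta={eta:.0e}  geodesic={geodesic:.8f}  additive={additive:.8f}  gap={gaps[-1]:.2e}")
```

The gap shrinks by roughly a factor of 100 for each tenfold reduction in $\eta$, i.e., it is $O(\eta^2)$, which is what first-order equivalence of the two updates predicts.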