Variational Geometric Information Bottleneck: Learning the Shape of Understanding

Ronald Katende

TL;DR

The paper proposes Variational Geometric Information Bottleneck (V-GIB), a framework that couples mutual-information compression with explicit curvature and intrinsic-dimension regularization to produce interpretable, data-efficient representations. Under standard manifold assumptions, it derives non-asymptotic generalization bounds where the intrinsic dimension governs sample complexity and curvature governs stability, and it provides a practical estimator combining variational MI surrogates with Hutchinson-based curvature proxies. Through synthetic manifold recovery, few-shot benchmarks, and real data like Fashion-MNIST and CIFAR-10, V-GIB reveals an information–geometry Pareto frontier and demonstrates robust estimator stability and substantial interpretive efficiency gains, even under data scarcity. The work also introduces human-alignment diagnostics and reproducible protocols, showing that geometry-aware representations not only perform well but also align with human-understandable structure, enabling more reliable and scalable learning systems.

Abstract

We propose a unified information-geometric framework that formalizes understanding in learning as a trade-off between informativeness and geometric simplicity. An encoder $\phi$ is evaluated by $U(\phi) = I(\phi(X); Y) - \beta\, C(\phi)$, where $C(\phi)$ penalizes curvature and intrinsic dimensionality, enforcing smooth, low-complexity manifolds. Under mild manifold and regularity assumptions, we derive non-asymptotic bounds showing that generalization error scales with intrinsic dimension while curvature controls approximation stability, directly linking geometry to sample efficiency. To operationalize this theory, we introduce the Variational Geometric Information Bottleneck (V-GIB), a variational estimator that unifies mutual-information compression and curvature regularization through tractable geometric proxies such as the Hutchinson trace, Jacobian norms, and local PCA. Experiments across synthetic manifolds, few-shot settings, and real-world datasets (Fashion-MNIST, CIFAR-10) reveal a robust information-geometry Pareto frontier, stable estimators, and substantial gains in interpretive efficiency. Fractional-data experiments on CIFAR-10 confirm that curvature-aware encoders maintain predictive power under data scarcity, validating the predicted efficiency-curvature law. Overall, V-GIB provides a principled and measurable route to representations that are geometrically coherent, data-efficient, and aligned with human-understandable structure.
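The abstract names the Hutchinson trace as one of the geometric proxies. As a rough illustration only, the generic Hutchinson estimator can be sketched as below. This excerpt does not specify which operator V-GIB applies it to, so the example assumes the squared Frobenius norm of a Jacobian, $\|J\|_F^2 = \mathrm{tr}(J^\top J)$. The function name `hutchinson_trace`, the toy matrix `J`, and the probe counts are all hypothetical choices for this sketch, not taken from the paper.

```python
import numpy as np


def hutchinson_trace(matvec, dim, num_probes=1000, seed=None):
    """Estimate tr(A) using only matrix-vector products v -> A @ v.

    Uses Rademacher probes (entries +1 or -1), for which E[v^T A v] = tr(A).
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_probes):
        v = rng.choice([-1.0, 1.0], size=dim)  # random sign vector
        total += float(v @ matvec(v))          # one sample of v^T A v
    return total / num_probes


# Toy stand-in for an encoder Jacobian (assumption; not from the paper).
J = np.random.default_rng(0).standard_normal((3, 5))

# ||J||_F^2 = tr(J^T J), so the operator passed in is v -> J^T (J v).
estimate = hutchinson_trace(lambda v: J.T @ (J @ v), dim=5, num_probes=5000, seed=1)
exact = float(np.sum(J ** 2))
print(f"Hutchinson estimate: {estimate:.3f}, exact: {exact:.3f}")
```

For a neural encoder, the explicit matrix would typically be replaced by Jacobian-vector products from an autodiff library, so the Jacobian never has to be materialized. That substitution is the usual reason for choosing this estimator, though the paper's exact setup may differ.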

Paper Structure

This paper contains 88 sections, 54 equations, 7 figures, 9 tables, 1 algorithm.

Figures (7)

  • Figure 1: Empirical characterization of V-GIB. (a) The energy landscape exhibits a positive coupling between curvature and information energy ($\rho_{KL,\kappa}=0.67$). (b) Curvature regularization improves effective sample efficiency across noise levels. (c) Structured encoders outperform random baselines, confirming that geometric regularization, not model size, drives performance. (d) The information–curvature Pareto frontier shows the monotonic trade-off between information retention and manifold smoothness.
  • Figure 2: CIFAR-10 learning dynamics. Accuracy increases (blue) as alignment MI decreases (red), with equilibrium near epoch 60. Shaded regions indicate $\pm1\sigma$ over seeds.
  • Figure 3: V-GIB on Fashion-MNIST. (a) Top-1 accuracy per epoch; (b) Hutchinson curvature proxy; (c) information–geometry trade-off (KL vs. accuracy, bubble size = curvature); (d) correlation heatmap. Empirical coupling is summarized in Table \ref{tab:fashion_corr}.
  • Figure 4: CIFAR-10 fractional validation. Per-epoch dynamics of accuracy (blue) and alignment mutual information (red) for each data fraction, and aggregated efficiency/correlation trends (bottom row). As fraction increases, accuracy improves while alignment MI declines, indicating progressively tighter, lower-curvature manifolds. Bottom-left: mean and max interpretive efficiency ($E(\phi;N)$) rise monotonically with data availability. Bottom-right: correlation between accuracy and alignment MI remains strongly negative, confirming stable geometric–information coupling across scales.
  • Figure 5: Estimator stability. MI and curvature trajectories over epochs (mean $\pm$ std over 3 seeds).
  • ...and 2 more figures

Theorems & Definitions (15)

  • proof : Sketch
  • proof : Sketch
  • proof : Sketch
  • proof
  • proof
  • proof
  • proof
  • proof
  • proof
  • proof
  • ...and 5 more