Variational Geometric Information Bottleneck: Learning the Shape of Understanding
Ronald Katende
TL;DR
The paper proposes the Variational Geometric Information Bottleneck (V-GIB), a framework that couples mutual-information compression with explicit curvature and intrinsic-dimension regularization to produce interpretable, data-efficient representations. Under standard manifold assumptions, it derives non-asymptotic generalization bounds in which intrinsic dimension governs sample complexity and curvature governs stability, and it provides a practical estimator that combines variational MI surrogates with Hutchinson-based curvature proxies. Experiments on synthetic manifold recovery, few-shot benchmarks, and real datasets (Fashion-MNIST, CIFAR-10) reveal an information-geometry Pareto frontier and show stable estimators and substantial gains in interpretive efficiency, including under data scarcity. The work also introduces human-alignment diagnostics and reproducible protocols, and argues that geometry-aware representations both perform well and align with human-understandable structure.
Abstract
We propose a unified information-geometric framework that formalizes understanding in learning as a trade-off between informativeness and geometric simplicity. An encoder $\phi$ is evaluated by $U(\phi) = I(\phi(X); Y) - \beta\, C(\phi)$, where $C(\phi)$ penalizes curvature and intrinsic dimensionality, enforcing smooth, low-complexity manifolds. Under mild manifold and regularity assumptions, we derive non-asymptotic bounds showing that generalization error scales with intrinsic dimension while curvature controls approximation stability, directly linking geometry to sample efficiency. To operationalize this theory, we introduce the Variational Geometric Information Bottleneck (V-GIB), a variational estimator that unifies mutual-information compression and curvature regularization through tractable geometric proxies such as the Hutchinson trace, Jacobian norms, and local PCA. Experiments across synthetic manifolds, few-shot settings, and real-world datasets (Fashion-MNIST, CIFAR-10) reveal a robust information-geometry Pareto frontier, stable estimators, and substantial gains in interpretive efficiency. Fractional-data experiments on CIFAR-10 confirm that curvature-aware encoders maintain predictive power under data scarcity, validating the predicted efficiency-curvature law. Overall, V-GIB provides a principled and measurable route to representations that are geometrically coherent, data-efficient, and aligned with human-understandable structure.
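To make the geometric proxies named in the abstract concrete, the following is a minimal NumPy sketch of a Hutchinson trace estimator and of a Jacobian-norm proxy built on it. It is an illustration under stated assumptions, not the paper's implementation: the function names `hutchinson_trace` and `jacobian_frobenius_sq`, the choice of Rademacher probes, and the use of central finite differences for Jacobian-vector products are all hypothetical choices made here for a self-contained example.

```python
import numpy as np


def hutchinson_trace(matvec, dim, num_probes=64, rng=None):
    """Estimate tr(A) using only matrix-vector products v -> A v.

    Uses the identity tr(A) = E[v^T A v] for probes v with E[v v^T] = I.
    Rademacher probes (entries +/-1) are an assumption of this sketch.
    """
    rng = np.random.default_rng(rng)
    total = 0.0
    for _ in range(num_probes):
        v = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector
        total += v @ matvec(v)
    return total / num_probes


def jacobian_frobenius_sq(f, x, num_probes=64, eps=1e-5, rng=None):
    """Estimate ||J_f(x)||_F^2 = tr(J^T J) = E[||J v||^2].

    J v is approximated by a central finite difference, so only
    evaluations of f are needed (no explicit Jacobian is formed).
    """
    rng = np.random.default_rng(rng)
    total = 0.0
    for _ in range(num_probes):
        v = rng.choice([-1.0, 1.0], size=x.shape[0])
        jv = (f(x + eps * v) - f(x - eps * v)) / (2.0 * eps)  # approx. J v
        total += jv @ jv
    return total / num_probes


if __name__ == "__main__":
    # Toy check on a diagonal matrix, where Rademacher probes are exact.
    A = np.diag([1.0, 2.0, 3.0])
    print(hutchinson_trace(lambda v: A @ v, dim=3, num_probes=8, rng=0))

    # Toy encoder: a linear map whose Jacobian is W itself.
    W = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
    print(jacobian_frobenius_sq(lambda z: W @ z, np.zeros(3), rng=0))
```

In a deep encoder one would typically replace the finite difference with automatic-differentiation Jacobian-vector products; the estimator itself is unchanged. The variance of the estimate decreases as `num_probes` grows, which is the usual trade-off between cost and accuracy for this kind of proxy.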
