Redundancy as a Structural Information Principle for Learning and Generalization
Yuda Bi, Ying Zhu, Vince D Calhoun
TL;DR
Redundancy is reframed as a structural information principle that unifies several measures of dependence under an $f$-divergence geometry. The master definition $\mathcal{R}_f(X)=D_f(P_X\|\Pi_X)$ reveals that mutual information, covariance proxies, and spectral redundancy are projections of the same redundancy geometry. The paper proves the existence of an interior redundancy equilibrium $R^{*}$ that balances over-compression and over-coupling, and shows that learning systems self-organize toward this balance, with masked autoencoder (MAE) experiments illustrating peak generalization near $R^{*}$. This framework connects information theory and finite-regime learning, offering a tunable quantity to guide the design of robust, generalizable representations and foundation models.
Abstract
We present a theoretical framework that extends classical information theory to finite and structured systems by redefining redundancy as a fundamental property of information organization rather than as an inefficiency. In this framework, redundancy is expressed as a general family of informational divergences that unifies multiple classical measures, such as mutual information, chi-squared dependence, and spectral redundancy, under a single geometric principle. This reveals that these traditional quantities are not isolated heuristics but projections of a shared redundancy geometry. The theory further predicts that redundancy is bounded both above and below, giving rise to an optimal equilibrium that balances over-compression (loss of structure) and over-coupling (collapse). While classical communication theory favors minimal redundancy for transmission efficiency, finite and structured systems, such as those underlying real-world learning, achieve maximal stability and generalization near this equilibrium. Experiments with masked autoencoders illustrate and verify this principle: the model exhibits a stable redundancy level at which generalization peaks. Together, these results establish redundancy as a measurable and tunable quantity that bridges the asymptotic world of communication and the finite world of learning.
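To make the master definition $\mathcal{R}_f(X)=D_f(P_X\|\Pi_X)$ concrete, the following is a minimal sketch for a discrete two-variable case. It assumes that $\Pi_X$ is the product of the marginals of $P_X$, which this section does not state explicitly. Under that assumption, the generator $f(t)=t\log t$ gives mutual information and $f(t)=(t-1)^2$ gives a chi-squared dependence measure. The function names `f_redundancy`, `f_kl`, and `f_chi2` are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import numpy as np

def f_redundancy(joint, f):
    """Sketch of R_f = D_f(P || Pi) for a 2-D joint pmf.

    Assumption: Pi is the product of the marginals of P.
    """
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()              # normalize to a valid pmf
    px = joint.sum(axis=1, keepdims=True)    # marginal of the row variable
    py = joint.sum(axis=0, keepdims=True)    # marginal of the column variable
    prod = px * py                           # reference distribution Pi
    mask = prod > 0                          # skip cells with zero reference mass
    ratio = joint[mask] / prod[mask]         # density ratio dP/dPi
    return float(np.sum(prod[mask] * f(ratio)))

def f_kl(t):
    """Generator f(t) = t log t, with the convention 0 log 0 = 0."""
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = t[pos] * np.log(t[pos])
    return out

def f_chi2(t):
    """Generator f(t) = (t - 1)^2 for the chi-squared divergence."""
    return (t - 1.0) ** 2

# Two toy joint distributions: one independent, one positively coupled.
independent = np.outer([0.5, 0.5], [0.3, 0.7])
coupled = np.array([[0.4, 0.1],
                    [0.1, 0.4]])

mi_independent = f_redundancy(independent, f_kl)
mi_coupled = f_redundancy(coupled, f_kl)        # mutual information in nats
chi2_independent = f_redundancy(independent, f_chi2)
chi2_coupled = f_redundancy(coupled, f_chi2)
```

Both generators return zero redundancy for the independent table and a strictly positive value for the coupled one, which is the sense in which the two classical measures can be read as different views of one divergence between the joint distribution and its factorized reference.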
