Redundancy as a Structural Information Principle for Learning and Generalization

Yuda Bi; Ying Zhu; Vince D Calhoun

TL;DR

Redundancy is reframed as a structural information principle that unifies several measures of dependence under an $f$-divergence geometry. The master definition $\mathcal{R}_f(X)=D_f(P_X\|\Pi_X)$ shows that mutual information, covariance proxies, and spectral redundancy are projections of the same redundancy geometry. The paper proves the existence of an interior redundancy equilibrium $R^{*}$ that balances over-compression against over-coupling, and shows that learning systems self-organize toward this balance; masked autoencoder (MAE) experiments illustrate that generalization peaks near $R^{*}$. The framework connects information theory with finite-regime learning and offers a tunable quantity to guide the design of robust, generalizable representations and foundation models.
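As a concrete illustration of the master definition, the following minimal sketch evaluates $D_f(P_X\|\Pi_X)$ for a small discrete joint distribution. It assumes that $\Pi_X$ denotes the product of the marginals of $P_X$ (this summary does not state that explicitly) and that the variables are discrete with a known probability table. The function names `product_of_marginals`, `f_redundancy`, `f_kl`, and `f_chi2` are hypothetical and are not taken from the paper. With $f(t)=t\log t$ the quantity reduces to mutual information for two variables, and with $f(t)=(t-1)^2$ it gives a chi-squared dependence.

```python
import numpy as np


def product_of_marginals(joint):
    """Product of the one-dimensional marginals, same shape as `joint`."""
    joint = np.asarray(joint, dtype=float)
    prod = np.ones_like(joint)
    for axis in range(joint.ndim):
        # Sum out every other axis to get the marginal along `axis`.
        other = tuple(a for a in range(joint.ndim) if a != axis)
        marginal = joint.sum(axis=other)
        shape = [1] * joint.ndim
        shape[axis] = joint.shape[axis]
        prod = prod * marginal.reshape(shape)
    return prod


def f_redundancy(joint, f):
    """D_f(P || Pi) = sum_x Pi(x) * f(P(x) / Pi(x)), over cells with Pi(x) > 0."""
    joint = np.asarray(joint, dtype=float)
    pi = product_of_marginals(joint)
    mask = pi > 0  # Pi(x) = 0 implies P(x) = 0, so these cells contribute nothing
    ratio = joint[mask] / pi[mask]
    return float(np.sum(pi[mask] * f(ratio)))


def f_kl(t):
    """Generator f(t) = t log t (natural log), with the convention 0 log 0 = 0."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = t[pos] * np.log(t[pos])
    return out


def f_chi2(t):
    """Generator f(t) = (t - 1)^2 for the chi-squared divergence."""
    return (np.asarray(t, dtype=float) - 1.0) ** 2


# Two binary variables that tend to agree (assumed toy example).
coupled = np.array([[0.4, 0.1],
                    [0.1, 0.4]])
# An exactly independent table built from its marginals.
independent = np.outer([0.3, 0.7], [0.6, 0.4])

print("KL redundancy, coupled:     ", f_redundancy(coupled, f_kl))
print("chi2 redundancy, coupled:   ", f_redundancy(coupled, f_chi2))
print("KL redundancy, independent: ", f_redundancy(independent, f_kl))
```

Both generators vanish on the independent table and are strictly positive on the coupled one, which is the sense in which different choices of $f$ measure the same departure from independence on different scales.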

Abstract

We present a theoretical framework that extends classical information theory to finite and structured systems by redefining redundancy as a fundamental property of information organization rather than inefficiency. In this framework, redundancy is expressed as a general family of informational divergences that unifies multiple classical measures, such as mutual information, chi-squared dependence, and spectral redundancy, under a single geometric principle. This reveals that these traditional quantities are not isolated heuristics but projections of a shared redundancy geometry. The theory further predicts that redundancy is bounded both above and below, giving rise to an optimal equilibrium that balances over-compression (loss of structure) and over-coupling (collapse). While classical communication theory favors minimal redundancy for transmission efficiency, finite and structured systems, such as those underlying real-world learning, achieve maximal stability and generalization near this equilibrium. Experiments with masked autoencoders are used to illustrate and verify this principle: the model exhibits a stable redundancy level where generalization peaks. Together, these results establish redundancy as a measurable and tunable quantity that bridges the asymptotic world of communication and the finite world of learning.
