Shannon invariants: A scalable approach to information decomposition
Aaron J. Gutknecht, Fernando E. Rosas, David A. Ehrlich, Abdullah Makkeh, Pedro A. M. Mediano, Michael Wibral
TL;DR
The paper addresses the analysis of high-order information processing in large distributed systems, where traditional partial information decomposition (PID) scales poorly and is hard to interpret. It introduces Shannon invariants: entropy-based aggregates, such as the average degree of redundancy $\bar{r}$ and the average degree of vulnerability $\bar{v}$, that can be computed efficiently and that relate to existing metrics such as RSI and to a new dual metric, DRSI. By linking the source-level perspective (redundancy vs. synergy) with the robustness-vulnerability perspective, the authors provide a principled, scalable framework, built on the redundancy lattice and information-atom accounting, for interpreting multivariate information. They apply the framework to deep learning models (a feedforward MNIST classifier and a face autoencoder) and report layer-wise information signatures and training dynamics that bear on interpretability and robustness.
Abstract
Distributed systems, such as biological and artificial neural networks, process information via complex interactions engaging multiple subsystems, resulting in high-order patterns with distinct properties across scales. Investigating how these systems process information remains challenging due to difficulties in defining appropriate multivariate metrics and ensuring their scalability to large systems. To address these challenges, we introduce a novel framework based on what we call "Shannon invariants" -- quantities that capture essential properties of high-order information processing in a way that depends only on the definition of entropy and can be efficiently calculated for large systems. Our theoretical results demonstrate how Shannon invariants can be used to resolve long-standing ambiguities regarding the interpretation of widely used multivariate information-theoretic measures. Moreover, our practical results reveal distinctive information-processing signatures of various deep learning architectures across layers, which lead to new insights into how these systems process information and how this evolves during training. Overall, our framework resolves fundamental limitations in analyzing high-order phenomena and offers broad opportunities for theoretical developments and empirical analyses.
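To make "depends only on the definition of entropy" concrete, the following is a minimal sketch of one entropy-only quantity, the redundancy-synergy index (RSI) named in the TL;DR. It is not the authors' implementation and does not reproduce their definitions of $\bar{r}$ or $\bar{v}$. The sign convention (sum of single-source informations minus joint information, so positive values indicate redundancy-dominated systems) is an assumption, as are the function names and the two toy distributions.

```python
from collections import defaultdict
from itertools import product
from math import log2


def entropy(joint, idx):
    """Shannon entropy (bits) of the marginal over the variable positions in idx."""
    marginal = defaultdict(float)
    for outcome, p in joint.items():
        marginal[tuple(outcome[i] for i in idx)] += p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)


def mutual_information(joint, a, b):
    """I(A;B) = H(A) + H(B) - H(A,B), built from entropies only."""
    return entropy(joint, a) + entropy(joint, b) - entropy(joint, a + b)


def rsi(joint, sources, target):
    """Assumed convention: sum_i I(S_i;T) - I(S_1,...,S_n;T)."""
    individual = sum(mutual_information(joint, (s,), (target,)) for s in sources)
    return individual - mutual_information(joint, tuple(sources), (target,))


# Hypothetical toy systems over (S1, S2, T); keys are outcomes, values are probabilities.
xor_gate = {(a, b, a ^ b): 0.25 for a, b in product((0, 1), repeat=2)}
copy_gate = {(a, a, a): 0.5 for a in (0, 1)}

rsi_xor = rsi(xor_gate, sources=(0, 1), target=2)    # -1.0 bit: synergy-dominated
rsi_copy = rsi(copy_gate, sources=(0, 1), target=2)  # +1.0 bit: redundancy-dominated
```

Every term is a Shannon entropy of a marginal, so no particular PID redundancy measure has to be chosen. The XOR and copy gates are the standard extreme cases of pure synergy and pure redundancy, which is why they land at opposite signs.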
