Shannon invariants: A scalable approach to information decomposition

Aaron J. Gutknecht, Fernando E. Rosas, David A. Ehrlich, Abdullah Makkeh, Pedro A. M. Mediano, Michael Wibral

TL;DR

The paper tackles the challenge of analyzing high-order information processing in large distributed systems where traditional partial information decomposition (PID) struggles with scalability and interpretation. It introduces Shannon invariants—entropy-based aggregates such as the average degree of redundancy $\bar{r}$ and the average degree of vulnerability $\bar{v}$—that can be computed efficiently and relate to existing metrics like RSI and a new dual metric, DRSI. By linking source-level (redundancy vs. synergy) and robustness-vulnerability perspectives, the authors provide a principled, scalable framework (including the redundancy lattice and information-atom accounting) for interpreting multivariate information. They demonstrate practical utility by applying the framework to deep learning models (a feedforward MNIST classifier and a face autoencoder), revealing layer-wise information signatures and training dynamics that inform interpretability and robustness considerations.

Abstract

Distributed systems, such as biological and artificial neural networks, process information via complex interactions engaging multiple subsystems, resulting in high-order patterns with distinct properties across scales. Investigating how these systems process information remains challenging due to difficulties in defining appropriate multivariate metrics and ensuring their scalability to large systems. To address these challenges, we introduce a novel framework based on what we call "Shannon invariants" -- quantities that capture essential properties of high-order information processing in a way that depends only on the definition of entropy and can be efficiently calculated for large systems. Our theoretical results demonstrate how Shannon invariants can be used to resolve long-standing ambiguities regarding the interpretation of widely used multivariate information-theoretic measures. Moreover, our practical results reveal distinctive information-processing signatures of various deep learning architectures across layers, which lead to new insights into how these systems process information and how this evolves during training. Overall, our framework resolves fundamental limitations in analyzing high-order phenomena and offers broad opportunities for theoretical developments and empirical analyses.
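As a small illustration of what "depends only on the definition of entropy" can mean in practice, the sketch below computes ordinary Shannon mutual information between each source and a target, and between all sources jointly and the target, for two toy systems. It does not reproduce the paper's formulas for the Shannon invariants. The toy distributions (`xor`, `copy`) and the helper names (`entropy`, `marginal`, `mutual_information`, `summarize`) are assumptions introduced here for illustration only.

```python
import math
from collections import defaultdict


def entropy(pmf):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(joint, idx):
    """Marginalize a joint pmf over tuples onto the coordinates listed in idx."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idx)] += p
    return dict(out)


def mutual_information(joint, a, b):
    """I(A;B) = H(A) + H(B) - H(A,B) for coordinate index lists a and b."""
    return (entropy(marginal(joint, a))
            + entropy(marginal(joint, b))
            - entropy(marginal(joint, a + b)))


# Hypothetical toy systems over outcomes (x1, x2, y).
# xor:  y = x1 XOR x2 with independent uniform inputs (purely synergistic).
# copy: x1 = x2 = y with a uniform bit (purely redundant).
xor = {(x1, x2, x1 ^ x2): 0.25 for x1 in (0, 1) for x2 in (0, 1)}
copy = {(x, x, x): 0.5 for x in (0, 1)}


def summarize(joint):
    """Return ([I(X1;Y), I(X2;Y)], I(X1,X2;Y)) in bits."""
    single = [mutual_information(joint, [i], [2]) for i in (0, 1)]
    joint_mi = mutual_information(joint, [0, 1], [2])
    return single, joint_mi


if __name__ == "__main__":
    for name, dist in (("xor", xor), ("copy", copy)):
        single, joint_mi = summarize(dist)
        print(name, "single-source MI:", single, "joint MI:", joint_mi)
```

In the XOR system neither source alone carries information about the target although both together determine it, whereas in the copy system each source alone already carries the full bit. Only entropies of marginals are needed for either calculation, which is the sense in which such quantities avoid committing to a particular PID measure.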

Paper Structure

This paper contains 17 sections, 11 theorems, 43 equations, 4 figures, 3 tables.

Key Result

Proposition 1

The average degree of redundancy is a Shannon-invariant of the distribution $p_{\bm X,Y}$, and its value can be calculated as

Figures (4)

  • Figure 1: Illustration for $n=3$ of the grouping of atoms in terms of their a) degree of redundancy and b) degree of vulnerability.
  • Figure 2: In a deep classification network, the degree of redundancy of the layer's activity about the label increases throughout the hidden layers, while the degree of vulnerability decreases throughout the hidden layers and over training. a) The architecture of the MNIST classification network comprises five fully-connected hidden layers with activation values quantized to eight levels, three of which have the same size of $5$ neurons. b) After $10^4$ training epochs, the classifier reaches an average train set accuracy of $99.89(3)\%$ and a test set accuracy of $95.5(3)\%$. In all plots, lines represent the median, shaded regions the maximum and minimum of $10$ runs with random weight initializations. c), d) Degree of redundancy and degree of vulnerability computed on the training set for the three equal-sized hidden layers.
  • Figure 3: In a convolutional autoencoder, the degree of redundancy about the input is larger in decoder layers than in the size-matched encoder layers. Furthermore, the degree of redundancy increases with bottleneck size while the degree of vulnerability decreases. a) The architecture of the face image autoencoder comprises a three-layer convolutional encoder, a fully-connected bottleneck layer with a varying number of neurons $n_\mathrm{b}$, and a three-layer convolutional decoder. Numbers below the layers reflect the size of the activation matrices as well as the number of channels. b) For a bottleneck size of $n_\mathrm{b}=128$, the mean square error loss on the test set converges to $9.5(1)\times 10^{-3}$ after $10^3$ epochs. Five original images (upper row) and their reconstructions (lower row) are shown for the converged networks. In all plots, lines represent the median, shaded regions the maximum and minimum of $10$ runs with random weight initializations. c) Degree of redundancy computed on the training set; all activations from all convolutional filters have been treated as individual source variables. The degree of vulnerability (not shown) is equal to zero up to numerical error for all layers for a bottleneck layer of width $n_\mathrm{b}=128$. d), e) Degree of redundancy and degree of vulnerability of the bottleneck layer for varying small bottleneck sizes $n_\mathrm{b}$. The inset in e) shows an enlarged view of the degree of vulnerability.
  • Figure 4: Lattice of PID atoms for $n=3$ source variables (left). Corresponding representation of PID atoms in information diagrams (right).

Theorems & Definitions (20)

  • Definition 1
  • Definition 2
  • Proposition 1
  • proof
  • Definition 3
  • Proposition 2
  • proof
  • Proposition 3
  • proof
  • Corollary 1
  • ...and 10 more