Surveying the space of descriptions of a composite system with machine learning

Kieran A. Murphy, Yujing Zhang, Dani S. Bassett

TL;DR

The paper reframes multivariate information theory by exploring the continuous space of per-component descriptions of a composite system, treating each description as a channel $U_i$ that encodes information about $X_i$ and optimizing these channels to expose global structure via $TC$ and $\Omega$. It develops a neural-network framework that constrains the total transmitted information $\sum_i I(U_i;X_i)$ while using InfoNCE to approximate remaining mutual-information terms, and employs an adversarial setup to minimize or maximize $I(\mathbf{U};\mathbf{X})$ as required. Through case studies on a 5-spin Ising system, a 4x4 Sudoku board, and 4-gram language statistics, the method identifies extremal descriptions that reveal how system-wide variation arises from component-level variation and can be interpreted via hardened, discrete descriptions. The approach scales to real-world data, remains flexible for continuous variables and diverse information-theoretic targets, and provides a practical toolkit for probing the structure of complex systems by navigating the space of descriptions, with potential to extend beyond $TC$ and $\Omega$ to other information-theoretic quantities.
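For readers unfamiliar with $TC$ and $\Omega$, the sketch below computes both for a small discrete joint distribution using their standard entropy-based definitions: $TC = \sum_i H(X_i) - H(\mathbf{X})$ and $\Omega = (n-2)H(\mathbf{X}) + \sum_i [H(X_i) - H(\mathbf{X}_{-i})]$. It is an illustration only, not the paper's neural-network estimator. The helper names (`entropy`, `marginal`, `total_correlation`, `o_information`) and the two toy distributions are assumptions introduced here.

```python
import numpy as np


def entropy(p):
    """Shannon entropy in bits of a probability array (any shape)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def marginal(joint, keep):
    """Marginalize a joint pmf (one axis per variable) onto the axes in `keep`."""
    drop = tuple(ax for ax in range(joint.ndim) if ax not in keep)
    return joint.sum(axis=drop) if drop else joint


def total_correlation(joint):
    """TC = sum_i H(X_i) - H(X)."""
    n = joint.ndim
    singles = sum(entropy(marginal(joint, (i,))) for i in range(n))
    return singles - entropy(joint)


def o_information(joint):
    """Omega = (n - 2) H(X) + sum_i [H(X_i) - H(X_{-i})]."""
    n = joint.ndim
    total = (n - 2) * entropy(joint)
    for i in range(n):
        rest = tuple(j for j in range(n) if j != i)
        total += entropy(marginal(joint, (i,))) - entropy(marginal(joint, rest))
    return total


# Toy system 1 (hypothetical): X3 = X1 XOR X2 with X1, X2 fair coins.
xor = np.zeros((2, 2, 2))
for a in range(2):
    for b in range(2):
        xor[a, b, a ^ b] = 0.25

# Toy system 2 (hypothetical): three identical copies of one fair coin.
copy = np.zeros((2, 2, 2))
copy[0, 0, 0] = copy[1, 1, 1] = 0.5

print("XOR : TC =", total_correlation(xor), " Omega =", o_information(xor))
print("copy: TC =", total_correlation(copy), " Omega =", o_information(copy))
```

Under the usual sign convention, a negative $\Omega$ (the XOR system) indicates synergy-dominated structure and a positive $\Omega$ (the copied coin) indicates redundancy-dominated structure. Exact enumeration like this is only feasible for very small systems, which is part of the motivation for the learned estimators described above.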

Abstract

Multivariate information theory provides a general and principled framework for understanding how the components of a complex system are connected. Existing analyses are coarse in nature -- built up from characterizations of discrete subsystems -- and can be computationally prohibitive. In this work, we propose to study the continuous space of possible descriptions of a composite system as a window into its organizational structure. A description consists of specific information conveyed about each of the components, and the space of possible descriptions is equivalent to the space of lossy compression schemes of the components. We introduce a machine learning framework to optimize descriptions that extremize key information theoretic quantities used to characterize organization, such as total correlation and O-information. Through case studies on spin systems, sudoku boards, and letter sequences from natural language, we identify extremal descriptions that reveal how system-wide variation emerges from individual components. By integrating machine learning into a fine-grained information theoretic analysis of composite random variables, our framework opens new avenues for probing the structure of real-world complex systems.
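To make "a description as a lossy compression scheme" concrete, the sketch below treats a single component $X_i$ with four equally likely values (as in one square of a 4x4 sudoku board) and measures how much information a channel $p(u \mid x)$ transmits about it. The function name `mutual_information`, the particular grouping, and the noise level `eps` are assumptions chosen for illustration; the paper learns such channels with neural networks rather than specifying them by hand.

```python
import numpy as np


def mutual_information(p_x, channel):
    """I(X;U) in bits, given p(x) and a row-stochastic channel p(u|x)."""
    p_x = np.asarray(p_x, dtype=float)
    channel = np.asarray(channel, dtype=float)
    joint = p_x[:, None] * channel           # p(x, u)
    p_u = joint.sum(axis=0)                  # p(u)
    indep = p_x[:, None] * p_u[None, :]      # p(x) p(u)
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / indep[mask])).sum())


p_x = np.full(4, 0.25)  # four equally likely values

# Hard description: value 0 is distinguishable, values 1-3 are lumped together.
hard = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [0.0, 1.0],
                 [0.0, 1.0]])

# Soft description: the same grouping blurred with a hypothetical noise level.
eps = 0.3
soft = (1 - eps) * hard + eps * 0.5

identity = np.eye(4)              # lossless description
constant = np.ones((4, 1))        # description that conveys nothing

for name, ch in [("hard", hard), ("soft", soft),
                 ("identity", identity), ("constant", constant)]:
    print(f"{name:9s} I(X;U) = {mutual_information(p_x, ch):.4f} bits")
```

Summing this quantity over components gives the total component information $\sum_i I(X_i;U_i)$; sweeping the channels between the constant and identity extremes traces out the continuum of descriptions that the framework optimizes over.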

Paper Structure

This paper contains 1 section, 8 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Descriptions of a system. (a) The state $\boldsymbol{X}$ of a system of five interacting spins with ferromagnetic and antiferromagnetic couplings (straight and zigzag connectors, respectively) can be described by communicating information about each spin $X_i$. (b) The space of descriptions charted in terms of the total component information, $\sum_i I(X_i;U_i)$, and the system information, $I(\boldsymbol{X};\boldsymbol{U})$. Possible descriptions include the accounting of discrete subsystems, i.e., subsets of components (black circles), as well as a continuum of compression schemes for each component, which we randomly sample (gray dots) and optimize over (blue trace, standard error visualized). The space of possible descriptions can also be characterized by the total correlation (c), O-information (d), or other quantities from multivariate information theory. (e) Description space in terms of total correlation for the system in panel a at different temperatures. (f) Description space in O-information for various five-spin systems at $k_\text{B}T=0.625$, with the blue trace corresponding to the system in panel a.
  • Figure 2: The space of descriptions of a 4x4 sudoku board. (a) Discrete subsets of squares (black circles) and machine learning-optimized boundaries (blue circles), in terms of O-information. Optimized soft compression schemes are converted to hard compression schemes (black stars) and visualized according to the corresponding Roman numerals. The hard compression scheme for each square in a board is displayed by coloring numbers according to groupings. For example, if one number in a square is blue and the rest are white, the blue number is distinguishable from the remaining three, and the three are indistinguishable from each other. (b) We randomly sampled $10^6$ hard descriptions within the information range at the top of each plot. The optimized descriptions have O-information values (blue vertical lines) far from the distribution of randomly sampled schemes (gray).
  • Figure 3: Statistical structure in 4-letter sequences. The space of descriptions for 4-grams taken from 4- and 8-letter words, plotted in terms of (a) total correlation and (b) O-information. (c) The hardened descriptions for the maximal total correlation points marked as stars in panel (a), where groupings of letters are separated by vertical bars, and the group with all remaining letters of the alphabet is represented by an asterisk (*). The top contributions $\Delta\text{TC}$ to total correlation are shown at right, with the most probable n-grams inside each grouping shown. Letters are bolded to highlight recognizable letter patterns central to each grouping.
  • Figure S1: The space of descriptions of a 4x4 sudoku board in terms of total correlation. Discrete subsets of squares (black circles) and machine learning-optimized boundaries (blue curves), in terms of total correlation. Optimized (soft) compression schemes are converted to hard compression schemes (black stars) and visualized according to the corresponding Roman numerals. The hard compression scheme for each square in a board is displayed by coloring numbers according to groupings. For example, if one number in a square is blue and the rest are white, the blue number is distinguishable from the remaining three, and the three are indistinguishable from each other.
  • Figure S2: Structure of 4-gram statistics, continued. We reproduce the total correlation (a) and O-information (b) description spaces from Fig. 3 in the main text, now with the discrete subsystems for all three datasets (squares), and with hardened descriptions for all extremized quantities at around seven bits of total information (stars). (c) For the 4-letter words, we hardened the descriptions that minimize and maximize total correlation and O-information, and show the top contributing codes.
  • ...and 1 more figure