Information Theory for Complex Systems Scientists

Thomas F. Varley

TL;DR

This review argues that complex systems, with their nonlinear, multiscale interactions, benefit from information-theoretic tools as a universal language for describing uncertainty, dependencies, and computation. It surveys core measures (entropy, relative entropy, mutual information) and extends them to multivariate and dynamic contexts (total and dual total correlation, co-information, PID, PED, GID, and $\Phi$ID) to dissect redundant, unique, and synergistic information. It then connects these measures to network inference, highlighting functional and effective connectivity, higher-order frameworks (hypergraphs, simplicial complexes), and complexity concepts (TSE complexity, O- and S-information, integrated information). The article also covers practical estimation methods for discrete and continuous data, software tools, and the limitations of applying information theory to real-world data, ultimately advocating a problem-driven use of complexity measures and a broad, integrative information-theoretic toolkit for future complex-systems science.

Abstract

In the 21st century, many of the crucial scientific and technical issues facing humanity can be understood as problems associated with understanding, modelling, and ultimately controlling complex systems: systems comprised of a large number of non-trivially interacting components whose collective behaviour can be difficult to predict. Information theory, a branch of mathematics historically associated with questions about encoding and decoding messages, has emerged as something of a lingua franca for those studying complex systems, far exceeding its original narrow domain of communication systems engineering. In the context of complexity science, information theory provides a set of tools that allow researchers to uncover the statistical and effective dependencies between interacting components, the relationships between systems and their environment, and mereological whole-part relationships, while remaining sensitive to non-linearities missed by commonly used parametric statistical models. In this review, we aim to provide an accessible introduction to the core of modern information theory, written specifically for aspiring (and established) complex systems scientists. We begin with standard measures, such as Shannon entropy, relative entropy, and mutual information, before building to more advanced topics, including information dynamics, measures of statistical complexity, information decomposition, and effective network inference. In addition to detailing the formal definitions, we make an effort to discuss how information theory can be interpreted and to develop the intuition behind abstract concepts like "entropy," in the hope that this will enable interested readers to understand what information is, and how it is used, at a more fundamental level.

Paper Structure (63 sections, 119 equations, 4 figures, 3 tables)

This paper contains 63 sections, 119 equations, 4 figures, 3 tables.

What Are Complex Systems?
What Is Information?
Entropy
Entropy as Expected Surprise
Entropy as Required Information
Joint Entropy
Conditional Entropy
Relative Entropy
Local Relative Entropy
Mutual Information
Local Mutual Information
Conditional Mutual Information
Multivariate Generalization of Mutual Information
Total Correlation
Dual Total Correlation
...and 48 more sections

Figures (4)

Figure 1: An entropy diagram showing how the marginal, conditional, and joint entropies are related to the mutual information for two interacting variables $X$ and $Y$. The area of each circle corresponds to the amount of uncertainty we have about the state of each variable, not the amount of information we have (this is a common misinterpretation). Each circle corresponds to our total uncertainty about $X$ or $Y$ respectively. The intersection of the Venn diagram is the uncertainty that is common to both variables. If we were to resolve all of our uncertainty about $Y$, we would also be resolving some uncertainty about $X$: this overlap is the mutual information $I(X;Y)$. We can also see that the conditional entropy is the uncertainty specific to one variable that is left over after resolving the uncertainty specific to the other. Finally, the joint entropy is given by the union of both marginal entropies. From this diagram we can get a visual intuition for the various definitions of mutual information: it's clear that $I(X;Y) = H(X) + H(Y) - H(X,Y) = H(X,Y) - H(Y|X) - H(X|Y)$.
Figure 2: This Venn diagram generalizes the relationships introduced in Figure 1 to three variables. However, care should be taken when considering entropy diagrams for more than two variables: the innermost intersection ($I(X_1;X_2;X_3)$, corresponding to the co-information or interaction information) can be negative [matsuda_physical_2000]. The sign indicates whether the trivariate relationship is redundancy- or synergy-dominated [williams_nonnegative_2010], discussed below.
Figure 3: A Venn diagram showing how the various components of partial information (redundant, unique, and synergistic) are related to the joint and marginal mutual information terms for two source variables $X_1$ and $X_2$, and a target variable $Y$. The two circles correspond to the mutual information between each source and the target, while the large ellipse gives the joint mutual information between both sources and the target. These diagrams highlight the difference between the marginal mutual information and the unique information: notice that the marginal mutual informations overlap, each one counting the redundant (shared) information towards its own marginal mutual information. We can also see that $I(X_1,X_2;Y) > I(X_1;Y)\cup I(X_2;Y)$: the difference is the synergistic information, which cannot be attributed to either marginal mutual information. Finally, the Venn diagram highlights how the partial information terms relate to mutual information terms, for example: $Syn(X_1,X_2;Y) = I(X_1,X_2;Y) - I(X_1;Y) - I(X_2;Y) + Red(X_1,X_2;Y)$. We have to add a redundancy term back in because it is "double counted" when subtracting off the marginal mutual informations.
Figure 4: Examples of redundancy lattices for the two simplest possible systems. Left: The redundancy lattice for a set of two sources $X_1$ and $X_2$ synapsing onto a single target. This is a simplified visualization of the Venn diagram seen in Fig. 3: $\{1\}\{2\}$ corresponds to the information redundantly shared between $X_1$ and $X_2$, while $\{12\}$ corresponds to the synergistic information and the single elements $\{1\}$ and $\{2\}$ indicate the unique information in each element. Right: The redundancy lattice for three sources synapsing onto a single target. The three-element lattice makes it clear that, as the number of sources grows, the clean distinctions between "redundancy", "unique information" and "synergy" break down as more complex combinations of sources contribute information about the target. The top and bottom of the lattice can be thought of as "purely synergistic" and "purely redundant" respectively; however, between the two extremes, the "PI-atoms" can be thought of as information that is redundantly shared over higher-order combinations of sources: for example, $\{1\}\{23\}$ gives the partial information that is redundantly present in both $X_1$ and the joint states of $X_2$ and $X_3$ together.
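
The identities quoted in the captions of Figures 1 to 3 can be checked numerically on small discrete distributions. The following is a minimal sketch, not the paper's own code: the helper names (`marginal`, `H`, `mutual_information`, `co_information`), the example distributions, and the choice of the minimum marginal mutual information as a stand-in redundancy function are all assumptions made for illustration. The captions do not commit to any particular redundancy function, and other choices give different decompositions.

```python
from collections import defaultdict
from math import log2

def marginal(joint, idx):
    """Marginalize {outcome tuple: probability} onto the positions in idx."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idx)] += p
    return dict(out)

def H(joint, idx):
    """Shannon entropy (in bits) of the variables at positions idx."""
    return -sum(p * log2(p) for p in marginal(joint, idx).values() if p > 0)

def mutual_information(joint, a, b):
    """I(A;B) = H(A) + H(B) - H(A,B), with a and b tuples of positions."""
    return H(joint, a) + H(joint, b) - H(joint, a + b)

def co_information(joint):
    """Three-variable co-information via inclusion-exclusion on entropies."""
    singles = H(joint, (0,)) + H(joint, (1,)) + H(joint, (2,))
    pairs = H(joint, (0, 1)) + H(joint, (0, 2)) + H(joint, (1, 2))
    return singles - pairs + H(joint, (0, 1, 2))

# Figure 1: two correlated binary variables (illustrative probabilities).
pair = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
h_xy = H(pair, (0, 1))
h_x_given_y = h_xy - H(pair, (1,))  # H(X|Y) = H(X,Y) - H(Y)
h_y_given_x = h_xy - H(pair, (0,))  # H(Y|X) = H(X,Y) - H(X)
mi_first = mutual_information(pair, (0,), (1,))
mi_second = h_xy - h_y_given_x - h_x_given_y

# Figures 2 and 3: Y = X1 XOR X2 with independent, uniform sources.
xor = {(x1, x2, x1 ^ x2): 0.25 for x1 in (0, 1) for x2 in (0, 1)}
xor_coinfo = co_information(xor)

i_joint = mutual_information(xor, (0, 1), (2,))  # I(X1,X2;Y)
i_1 = mutual_information(xor, (0,), (2,))        # I(X1;Y)
i_2 = mutual_information(xor, (1,), (2,))        # I(X2;Y)
red = min(i_1, i_2)  # ASSUMPTION: minimum-mutual-information redundancy
syn = i_joint - i_1 - i_2 + red

print("I(X;Y), first form :", mi_first)
print("I(X;Y), second form:", mi_second)
print("XOR co-information :", xor_coinfo)
print("XOR synergy        :", syn)
```

For the XOR gate, neither source alone says anything about the target, so both marginal mutual informations are zero while the joint mutual information is one bit. The co-information comes out negative (synergy-dominated, in the sense of Figure 2), and under the assumed redundancy function the entire bit is assigned to synergy.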
