Information-Theoretic Measures on Lattices for High-Order Interactions

Zhaolu Liu, Mauricio Barahona, Robert L. Peach

TL;DR

This work addresses the challenge of measuring higher-order interactions in multivariate data, where pairwise measures and measures built on a restricted set of factorisations fall short for $d>3$ variables. It develops a lattice-theoretic framework that unifies existing measures and introduces the Streitberg Information (SI), which is defined on the full partition lattice and uses the Tsallis-alpha divergence $D_{\alpha}$ with $\alpha\in(0,1)$ to preserve complex interaction structure; a $k$-nearest-neighbour ($k$NN) estimator provides a consistent, nonparametric way to estimate it. The paper shows that SI vanishes exactly when the joint distribution factorises through lower-order terms, validates SI on synthetic datasets including multivariate Gaussians (MVGs), XOR gates, and COPY gates, and demonstrates its applicability to stock market interactions, neural decoding, and feature selection. Overall, SI offers a scalable, permutation-invariant tool for detecting and quantifying higher-order statistical dependencies, with potential implications for causality and model interpretation.
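
To make the role of $D_{\alpha}$ concrete, the following is a minimal sketch of a Tsallis-alpha divergence for discrete distributions. It assumes the common convention $D_{\alpha}(p\,\|\,q) = \frac{1}{\alpha-1}\left(\sum_i p_i^{\alpha} q_i^{1-\alpha} - 1\right)$, which may differ from the paper's normalisation; the function names and the example distributions are hypothetical and chosen only for illustration. It is not the paper's $k$NN estimator, which works on samples rather than on known probabilities.

```python
from math import log

def tsallis_alpha_divergence(p, q, alpha=0.5):
    """Tsallis-alpha divergence between two discrete distributions.

    Assumed convention (not taken from the paper):
        D_alpha(p || q) = (sum_i p_i^alpha * q_i^(1 - alpha) - 1) / (alpha - 1)
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    overlap = sum(pi**alpha * qi**(1.0 - alpha) for pi, qi in zip(p, q))
    return (overlap - 1.0) / (alpha - 1.0)

def kl_divergence(p, q):
    """Kullback-Leibler divergence in nats, for comparison."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example distributions on three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

d_same = tsallis_alpha_divergence(p, p)          # identical distributions
d_half = tsallis_alpha_divergence(p, q, 0.5)     # alpha = 1/2
d_near_one = tsallis_alpha_divergence(p, q, 0.999)
d_kl = kl_divergence(p, q)
```

Under this convention the divergence is zero for identical distributions, positive otherwise, and approaches the KL divergence as $\alpha \to 1$, so `d_near_one` and `d_kl` should nearly coincide.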

Abstract

Traditional measures based solely on pairwise associations often fail to capture the complex statistical structure of multivariate data. Existing approaches for identifying information shared among $d>3$ variables are frequently computationally intractable, asymmetric with respect to a target variable, or unable to account for all the ways in which the joint probability distribution can be factorised. Here we present a systematic framework based on lattice theory to derive higher-order information-theoretic measures for multivariate data. Our construction uses lattice and operator function pairs, whereby an operator function is applied over a lattice that represents the algebraic relationships among variables. We show that many commonly used measures can be derived within this framework, yet they fail to capture all interactions for $d>3$, either because they are defined on restricted sublattices, or because the use of the KL divergence as an operator function, a typical choice, leads to undesired disregard of groups of interactions. To fully characterise all interactions among $d$ variables, we introduce the Streitberg Information, which is defined over the full partition lattice and uses generalised divergences (beyond KL) as operator functions. We validate the Streitberg Information on synthetic data, and illustrate its application in detecting complex interactions among stocks, decoding neural signals, and performing feature selection in machine learning.
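
The claim that pairwise associations can miss multivariate structure can be checked on the XOR gate, one of the synthetic systems mentioned in the TL;DR. The sketch below computes exact entropies from the joint probability table of $(X, Y, Z)$ with $Z = X \oplus Y$. It uses mutual information and total correlation only as familiar reference measures, not the Streitberg Information, and all helper names are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import log2

# Joint pmf of (X, Y, Z): X and Y are independent fair bits, Z = X XOR Y.
joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

def marginal(pmf, idx):
    """Marginal pmf over the variables at positions `idx`."""
    out = defaultdict(float)
    for outcome, prob in pmf.items():
        out[tuple(outcome[i] for i in idx)] += prob
    return dict(out)

def entropy(pmf):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def mutual_information(pmf, i, j):
    """I(X_i; X_j) = H(X_i) + H(X_j) - H(X_i, X_j)."""
    return (entropy(marginal(pmf, (i,)))
            + entropy(marginal(pmf, (j,)))
            - entropy(marginal(pmf, (i, j))))

def total_correlation(pmf, d):
    """Sum of the d marginal entropies minus the joint entropy."""
    return sum(entropy(marginal(pmf, (i,))) for i in range(d)) - entropy(pmf)

pairwise = {pair: mutual_information(joint, *pair)
            for pair in [(0, 1), (0, 2), (1, 2)]}
tc = total_correlation(joint, 3)

print(pairwise)  # every pairwise mutual information is 0 bits
print(tc)        # total correlation is 1 bit
```

Every pair of variables is independent, yet the three variables together carry one bit of shared structure, which no pairwise measure can see.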

Paper Structure

This paper contains 34 sections, 4 theorems, 27 equations, 13 figures, 3 tables.

Key Result

Lemma 1

$L(d)$ is isomorphic to $B(d)$ with the exclusion of singleton elements (deatomisation).
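
A quick counting check can make Lemma 1 tangible. The sketch below assumes that deatomising $B(d)$ removes exactly the $d$ singleton subsets, leaving $2^d - d$ elements, and that $L(d)$ corresponds to the partitions of $\{1,\dots,d\}$ with at most one non-singleton block. Both readings are assumptions made for illustration, not statements taken from the paper, and the function names are hypothetical.

```python
def set_partitions(items):
    """Yield every partition of `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i, block in enumerate(partition):
            yield partition[:i] + [[first] + block] + partition[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + partition

def deatomised_boolean_size(d):
    """Number of subsets of {1..d} once the d singletons are removed (assumed reading)."""
    return 2**d - d

def single_block_partitions(d):
    """Partitions of {1..d} with at most one block of size greater than one."""
    return [p for p in set_partitions(list(range(1, d + 1)))
            if sum(len(block) > 1 for block in p) <= 1]

# For each d: (size of deatomised B(d), size of assumed L(d), size of P(d)).
sizes = {}
for d in (3, 4, 5):
    n_partitions = sum(1 for _ in set_partitions(list(range(1, d + 1))))
    sizes[d] = (deatomised_boolean_size(d),
                len(single_block_partitions(d)),
                n_partitions)
    print(d, sizes[d])
```

Under these assumptions the counts agree with the structure described in Figure 1: for $d=3$ all three lattices have 5 elements, while for $d=4$ the first two have 12 elements and $P(4)$ has 15. The three partitions missing for $d=4$ are those with two blocks of size two, such as $12|34$.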

Figures (13)

  • Figure 1: Lattice embeddings. The black dots indicate the marginal distributions of the singletons ($p_i$). The line, triangular and square shapes represent the joint distribution of two, three and four variables, respectively. (a) The two-element chain is isomorphic for all $d$. (b) The deatomised $B(3)$ is isomorphic to $L(3)$, and elements in $L(3)$ can be mapped to the non-shaded elements in $B(3)$. $L(3)$ is a sublattice of $P(3)$ (in this case equal). (c) For $d=4$, we see that the deatomised $B(4)$ is isomorphic to $L(4)$, which is a (strictly smaller) sublattice of $P(4)$.
  • Figure 2: Robustness of Streitberg Information ($d=4$). All calculations in this figure are for data from a multivariate Gaussian (MVG) with covariance matrix $\Sigma^1 (\rho)$, where $\rho$ indicates the interaction strength. (a) The behaviour of $\mathrm{SI}(4)$ as a function of $\rho$ is consistent across $\alpha$. (b) Increasing the number of neighbours $k$ improves the estimation accuracy. (c) Increasing the sample size $n$ improves the estimation accuracy for a fixed number of neighbours ($k=5$).
  • Figure 3: Validation of Streitberg Information. (a) $\mathrm{TC}(4)$ fails to vanish for $p_{1}p_{234}$, while (b) both $\mathrm{TC}(4)$ and $\mathrm{LI}(4)$ fail to characterise $p_{12}p_{34}$. $\mathrm{SI}(4)$ correctly vanishes in both cases. (c) The magnitude of $\mathrm{SI}(4)$ is influenced by the extent to which the joint distribution can be factorised. (d) Streitberg Information behaves monotonically and consistently across varying types of interaction, and its magnitude again indicates the difficulty of factorising the joint distribution.
  • Figure 4: $d$-order Streitberg Information in stock returns varies within and across sectors. (a) $\mathrm{SI}(d)$ was computed between daily returns of stocks from 2010-2024 within sectors (coloured bars) and across random sectors as a baseline (grey bars) for $d=2,3,4$. Each bar represents the average magnitude of $\mathrm{SI}(d)$ across $500$ samples of $d$ stocks. Bars with deeper colours (resp. shaded) have significantly larger (resp. smaller) $\mathrm{SI}(d)$ than the inter-sector grey bars. Bars with pale colours display non-significant differences with respect to the inter-sector baseline. (b) Log ratio between the $\mathrm{SI}(d)$ within each sector and the random baseline across sectors for stock returns pre- and post-January 2020 (COVID onset). We report the significance of a two-sample t-test of $\mathrm{SI}(d)$ within sector vs. between sectors (*, $p<0.05$; **, $p<0.01$; ***, $p<0.001$; ****, $p<0.0001$; n.s., not significant). There is an increase in significantly different values of $\mathrm{SI}(4)$ post-COVID across sectors.
  • Figure 5: Streitberg Information significantly improves the decoding of preparatory stage vs. motor action in neural activity. Classifiers that include higher-order features from $\mathrm{SI}$, $\mathrm{LI}$ and $\mathrm{TC}$ are trained to identify trial stage (preparation vs. action). We report the significance of paired t-tests between model accuracies across sessions (*, $p<0.05$; **, $p<0.005$; ns, not significant). Adding SI features ($d=3,4$) improves classifier accuracy, whereas adding higher-order LI and TC features yields no significant improvement in accuracy compared to the pairwise model ('2').
  • ...and 8 more figures

Theorems & Definitions (15)

  • Definition 1: Chain
  • Definition 2: Boolean lattice
  • Definition 3: Partition lattice
  • Lemma 1
  • proof
  • Definition 4
  • Proposition 1
  • proof
  • Lemma 2
  • proof
  • ...and 5 more