Higher-order Common Information
Jan Østergaard
TL;DR
This work defines higher-order common information $R_\ell$ to quantify what multiple arbitrarily distributed variables share, addressing the inadequacy of pairwise metrics like $R_2$ for $n>2$. It introduces sufficient common information (SCI) and a hierarchy of minimal SCI sets $\mathcal{T}_\ell$, proving lower bounds $R_\ell \ge \min_{\phi\in\mathcal{T}_{\ell-1}^u,\phi'\in\mathcal{T}_1} I(\phi;\phi')$ and deriving computable bounds for Gaussian and discrete sources. A practical estimation pipeline for general sources is proposed, combining mutual information estimators with SCI-based constructions (e.g., adding controlled Gaussian noise to form $T_{i,j}$) and extending to higher orders via iterative SCI construction. The method is validated on EEG data from a two-speech listening task, where third-order common information $R_3$ correlates with neural tracking of the attended envelope, offering insight beyond $R_2$ and suggesting utility for uncovering latent dependencies in time-series data.
Abstract
We present a new notion $R_\ell$ of higher-order common information, which quantifies the information that $\ell\geq 2$ arbitrarily distributed random variables have in common. We provide analytical lower bounds on $R_3$ and $R_4$ for jointly Gaussian distributed sources and provide computable lower bounds for $R_\ell$ for any $\ell$ and any sources. We also provide a practical method to estimate the lower bounds from, e.g., real-world time-series data. As an example, we consider EEG data acquired in a setup with competing acoustic stimuli. We demonstrate that $R_3$ has descriptive properties that are not captured by $R_2$. Moreover, we observe a linear relationship between the amount of common information $R_3$ communicated from the acoustic stimuli to the brain and the corresponding cortical activity, measured as neural tracking of the envelopes of the stimuli.
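The lower bounds summarized above are stated in terms of mutual information, so any estimate of them needs a mutual information estimator as a building block. The following is a minimal sketch of one such building block, assuming the data are jointly Gaussian: it uses the standard closed form $I(X;Y) = \tfrac{1}{2}\log\frac{\det\Sigma_X \det\Sigma_Y}{\det\Sigma_{XY}}$ with sample covariances plugged in. The function name `gaussian_mutual_information` and the synthetic data are hypothetical and chosen for illustration only; this is not the estimation pipeline of the paper, and it does not construct the SCI sets $\mathcal{T}_\ell$.

```python
import numpy as np


def gaussian_mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats, ASSUMING (X, Y) are jointly Gaussian.

    x: array of shape (n_samples,) or (n_samples, dx)
    y: array of shape (n_samples,) or (n_samples, dy)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x.reshape(len(x), -1)  # force 2-D: (n_samples, dx)
    y = y.reshape(len(y), -1)  # force 2-D: (n_samples, dy)
    dx = x.shape[1]

    # Sample covariance of the stacked vector (X, Y).
    cov = np.atleast_2d(np.cov(np.hstack([x, y]), rowvar=False))

    # Log-determinants of the marginal and joint covariance matrices.
    _, logdet_x = np.linalg.slogdet(cov[:dx, :dx])
    _, logdet_y = np.linalg.slogdet(cov[dx:, dx:])
    _, logdet_xy = np.linalg.slogdet(cov)

    return 0.5 * (logdet_x + logdet_y - logdet_xy)


# Illustrative synthetic data (an assumption, not from the paper):
# two unit-variance Gaussians with correlation coefficient rho.
rho = 0.8
rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)
y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(200_000)

estimate = gaussian_mutual_information(x, y)
closed_form = -0.5 * np.log(1.0 - rho**2)  # exact value for scalar Gaussians
print(f"estimate = {estimate:.4f} nats, closed form = {closed_form:.4f} nats")
```

For non-Gaussian data such as EEG, the Gaussian plug-in formula can be badly biased, and a nonparametric mutual information estimator would be the more natural choice in its place.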
