S$\Omega$I: Score-based O-INFORMATION Estimation

Mustapha Bounoua, Giulio Franzese, Pietro Michiardi

TL;DR

The paper addresses the challenge of estimating high-order information measures in multivariate systems without restrictive distributional assumptions. It introduces Score-based O-information Estimation (S$\Omega$I), which estimates O-information and its gradient from score-based divergences derived from a noising diffusion process, using a single amortized denoising network within a VP-SDE framework. On synthetic benchmarks and a real neural dataset, it demonstrates accurate, scalable estimation of redundancy and synergy in complex, high-dimensional systems, outperforming baseline pairwise MI decompositions. The approach also supports analysis of variable-level contributions and applies to real-world data such as neural recordings during visually guided tasks, which makes it relevant to neuroscience and beyond.

Abstract

The analysis of scientific data and complex multivariate systems requires information quantities that capture relationships among multiple random variables. Recently, new information-theoretic measures have been developed to overcome the shortcomings of classical ones, such as mutual information, which are restricted to pairwise interactions. Among them, the concepts of information synergy and redundancy are crucial for understanding the high-order dependencies between variables. One of the most prominent and versatile measures based on these concepts is O-information, which provides a clear and scalable way to quantify the synergy-redundancy balance in multivariate systems. However, its practical application is limited to simplified cases. In this work, we introduce S$\Omega$I, which makes it possible, for the first time, to compute O-information without restrictive assumptions about the system. Our experiments validate our approach on synthetic data and demonstrate the effectiveness of S$\Omega$I in a real-world use case.
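
To make the synergy-redundancy balance concrete, the sketch below evaluates O-information in one of the simplified cases the abstract alludes to: jointly Gaussian scalar variables, where entropies have a closed form. It assumes the standard definition $\Omega(X) = (N-2)\,H(X) + \sum_{i=1}^{N} \left[ H(X_i) - H(X_{\setminus i}) \right]$, with positive values indicating redundancy-dominated systems and negative values synergy-dominated ones. This is not the S$\Omega$I estimator; the function names and example covariances are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy (in nats) of a Gaussian with covariance `cov`."""
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2.0 * np.pi * np.e) + logdet)


def gaussian_o_information(cov):
    """O-information of N scalar, jointly Gaussian variables.

    Uses Omega = (N - 2) H(X) + sum_i [H(X_i) - H(X_without_i)].
    """
    cov = np.asarray(cov, dtype=float)
    n = cov.shape[0]
    omega = (n - 2) * gaussian_entropy(cov)
    for i in range(n):
        # Covariance of all variables except the i-th one.
        rest = np.delete(np.delete(cov, i, axis=0), i, axis=1)
        omega += gaussian_entropy(cov[i:i + 1, i:i + 1]) - gaussian_entropy(rest)
    return omega


# Illustrative (assumed) systems of three variables.
# Redundant: all variables share a common source (equicorrelation 0.8).
redundant_cov = np.full((3, 3), 0.8) + 0.2 * np.eye(3)

# Synergistic: X3 = X1 + X2 + noise, with X1 and X2 independent.
synergistic_cov = np.array([
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 2.1],
])

# Independent variables: no high-order interaction at all.
independent_cov = np.eye(3)

for name, cov in [("redundant", redundant_cov),
                  ("synergistic", synergistic_cov),
                  ("independent", independent_cov)]:
    print(name, gaussian_o_information(cov))
```

The redundant system yields a positive value, the synergistic one a negative value, and the independent one zero. Outside the Gaussian case no such closed form is available, which is the gap the abstract says S$\Omega$I is meant to fill.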

Paper Structure (47 sections, 4 theorems, 33 equations, 22 figures, 1 table, 2 algorithms)

Key Result

Proposition 1

[franzese2023minde, kong2023interpretable] The divergence between two generic distributions $p(x)$ and $q(x)$ can be computed from the time-varying score functions $\nabla \log(p_t)$ and $\nabla \log(q_t)$.
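
As a point of reference, a widely used identity of this kind for diffusion processes is sketched below. It assumes that the divergence in question is the Kullback-Leibler divergence, which the text here does not state. The symbols $g_t$ (diffusion coefficient of the noising SDE), $T$ (final diffusion time), and $p_t$, $q_t$ (the marginals of $p$ and $q$ after noising up to time $t$) are assumptions introduced for illustration, so this may differ in notation or detail from the proposition as stated in the paper.

$$
\mathrm{KL}[p \,\|\, q] = \int p(x) \log\frac{p(x)}{q(x)}\,dx = \frac{1}{2}\int_0^T g_t^2\, \mathbb{E}_{x \sim p_t}\left[\left\| \nabla \log p_t(x) - \nabla \log q_t(x) \right\|^2\right] dt + \mathrm{KL}[p_T \,\|\, q_T]
$$

When both noised marginals approach the same reference distribution at time $T$, the terminal term becomes negligible and the divergence reduces to a time-integrated squared difference of scores, which is the quantity a denoising network can approximate.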

Figures (22)

  • Figure 1: Redundant system with $N=10$ variables, organized into subsets of sizes $\{3,3,4\}$ and increasing interaction strength.
  • Figure 2: Synergistic system with $N=10$ variables, organized into subsets of sizes $\{3,3,4\}$ and increasing interaction strength.
  • Figure 3: Mixed-interaction system with $N=10$ variables, organized into two redundancy-dominant subsets of sizes $\{3,4\}$ and one synergy-dominant subset of $3$ variables. O-information is modulated by fixing the synergy inter-dependency and increasing the redundancy.
  • Figure 4: Gradient of O-information for the mixed benchmark, for a system of $N=6$ variables and a system of $N=10$ variables, with different variable dimensions.
  • Figure 5: O-information estimates for visual cortex activity after two types of stimulus flash, across 72 trial sessions. Top: analysis using three brain areas. Bottom: extended analysis using six brain areas. The step size is set to $2\,\mathrm{ms}$, which results in 25-dimensional data for each bin per area. Different step sizes led to the same behavior (see the additional results).
  • ...and 17 more figures

Theorems & Definitions (4)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4