THOI: An efficient and accessible library for computing higher-order interactions enhanced by batch-processing

Laouen Belloli; Pedro Mediano; Rodrigo Cofré; Diego Fernandez Slezak; Rubén Herzog

THOI: An efficient and accessible library for computing higher-order interactions enhanced by batch-processing

Laouen Belloli, Pedro Mediano, Rodrigo Cofré, Diego Fernandez Slezak, Rubén Herzog

TL;DR

THOI introduces a fast, accessible library for computing higher-order interactions in complex systems by coupling Gaussian copula entropies with batch and parallel processing in PyTorch. It enables exhaustive and heuristic HOI analyses across moderate to large variable counts, delivering $TC$, $DTC$, $\Omega$, and $S$-information in scalable workflows and on standard hardware. The work validates THOI on synthetic probabilistic graphical models, fMRI data under wakefulness and anesthesia, and a large 920-dataset benchmark, demonstrating substantial speedups, memory efficiency, and broad applicability. This approach lowers barriers to multi-variable interdependencies analysis and supports population-level studies, with potential impact in neuroscience, economics, ecology, and beyond.

Abstract

Complex systems are characterized by nonlinear dynamics, multi-level interactions, and emergent collective behaviors. Traditional analyses that focus solely on pairwise interactions often oversimplify these systems, neglecting the higher-order interactions critical for understanding their full collective dynamics. Recent advances in multivariate information theory provide a principled framework for quantifying these higher-order interactions, capturing key properties such as redundancy, synergy, shared randomness, and collective constraints. However, two major challenges persist: accurately estimating joint entropies and addressing the combinatorial explosion of interacting terms. To overcome these challenges, we introduce THOI (Torch-based High-Order Interactions), a novel, accessible, and efficient Python library for computing high-order interactions in continuous-valued systems. THOI leverages the well-established Gaussian copula method for joint entropy estimation, combined with state-of-the-art batch and parallel processing techniques to optimize performance across CPU, GPU, and TPU environments. Our results demonstrate that THOI significantly outperforms existing tools in terms of speed and scalability. For larger systems, where exhaustive analysis is computationally impractical, THOI integrates optimization strategies that make higher-order interaction analysis feasible. We validate THOI accuracy using synthetic datasets with parametrically controlled interactions and further illustrate its utility by analyzing fMRI data from human subjects in wakeful resting states and under deep anesthesia. Finally, we analyzed over 900 real-world and synthetic datasets, establishing a comprehensive framework for applying higher-order interaction (HOI) analysis in complex systems.

THOI: An efficient and accessible library for computing higher-order interactions enhanced by batch-processing

TL;DR

, and

-information in scalable workflows and on standard hardware. The work validates THOI on synthetic probabilistic graphical models, fMRI data under wakefulness and anesthesia, and a large 920-dataset benchmark, demonstrating substantial speedups, memory efficiency, and broad applicability. This approach lowers barriers to multi-variable interdependencies analysis and supports population-level studies, with potential impact in neuroscience, economics, ecology, and beyond.

Abstract

Paper Structure (46 sections, 22 equations, 6 figures)

This paper contains 46 sections, 22 equations, 6 figures.

Introduction
Results
Leveraging Gaussian entropy estimators with batch processing
Enhanced Performance
HOI_toolbox (Higher-Order Interactions Toolbox)
HOI (High-Performance Estimation of Higher-Order Interactions)
JIDT (Java Information Dynamics Toolkit)
Optimization for larger systems: greedy and simulated annealing algorithms
Analysis of human brain activity under anesthesia
Analysis of large database of synthetic and real-world systems
Methods
Multivariate information theory
Generalizations of the mutual information
Efficient computation of HOI via Gaussian copulas
Parallel and batched computation of HOI
...and 31 more sections

Figures (6)

Figure 1: Efficient computation of HOI using batch processing of covariance matrices.A) A set of multivariate time series $X$ is transformed using the Gaussian copula approach, generating covariance matrices $\Sigma$ for each dataset. The covariance matrices are then sub-sampled using a batch of $k$-plet indices, defined by a binary mask applied to $\Sigma$, yielding the sub-covariance matrices $\Sigma^k$ for each k-plet. These sub-covariance matrices are then batched together, and their determinants are computed, which are subsequently used to calculate the entropies and associated HOI defined by the $DTC$, $TC$, $\Omega$, and $S$-information metrics, where the entropy is computed from the determinants of single variables $H(\Sigma^k_j)$, the whole system $H(\Sigma^k)$ and the whole system without a single variable $H(\Sigma^k_{-j})$ (see \ref{['sup:generalization_of_MI']} for detailed descriptions) . Finally, batches are pre-processed using a custom function (e.g., extracting the minimum $\Omega$), and the results are aggregated to produce the final output. Note that multiple datasets with identical system and sample sizes can be processed simultaneously and the batch management system allows flexible analysis on the fly. B) Computational time versus order of interactions for a 30-variable system with 1000 samples. The THOI method successfully computes all possible HOI in less than 6 hours, whereas other libraries are unable to process interactions beyond order 11 within the same time frame. C) Log-log plot of computational time as a function of sample size for a 20-variable system. All libraries exhibit logarithmic scaling, but THOI outperforms the others in terms of computational speed.
Figure 2: Within-order optimization with greedy and simulated annealing algorithmsA, B) Maximum (red) and minimum (blue) $\Omega$ obtained by greedy and SA algorithms for a $100$-variable system composed of strong/weak R and S systems and an independent system, each with 20 variables. Dashed horizontal lines indicate ground truth for the weak systems, and dotted lines represent the sum of weak and strong systems (red for R, blue for S). Both algorithms successfully identify the systems, but greedy required $100$ times more repeats than SA. C, D) Subsets of variables that maximize $\Omega$ at each order of interaction for greedy and SA. Systems are color-coded (weak R: light red; strong R: dark red; weak S: light blue; strong S: dark blue; independent: gray). Both algorithms prioritize strong R, then weak R, followed by a mix of independent and S systems. E, F) Subsets of variables that minimize $\Omega$. Both algorithms prioritize strong S, then weak S, followed by a combination of independent and R systems, with a preference for the former.
Figure 3: A) Estimated maximum and minimum $\Omega$ via the GA for awake (green) and deep anesthesia state (purple). Inset shows the reduction of minimum $\Omega$ at lower orders of interaction. B) Average maximum (red) and minimum (blue) effect size obtained from a GA tailored to amplify the difference between the two conditions. Shaded areas denote the range from the minimum to the maximum value for each optimization procedure. C) Distribution of $\Omega$ for the whole-brain in awake (green) and deep anesthesia state (purple). Each dot is a subject and lines connect their respective value in both conditions. Despite the trend to reduce redundancy, no significant difference was found (Wilcoxon $p>0.001$) D) Distribution of the $n$-plets that maximizes (left) and minimizes (right) the effect size obtained by the GA. E) Same as D, but for the $n$-plets obtained with the SA algorithm. Order is the number of elements in $n$-plets, $p$ is the Wilcoxon p-value and $d$ is the Cohen's $d$.
Figure 4: A) Spearman correlation matrix of features across datasets. Colors code different types of features. 'prop. sy$n$-plets' is the proportion of synergy-dominated $n$-plets out of the total number of $n$-plets. 'order max $\Omega$' and 'order min $\Omega$' is the order where $\Omega$ was maximized and minimized, respectively, normalized by the system size. 'mean MI' and 'std MI' are the mean and standard deviation of the pairwise mutual information for each dataset. The prefix 'whole' indicates that the whole system was considered (i.e. all the system variables). B) Cumulative explained variance associated with each PC after PCA. The first four components capture approximately 95% of the variance. C) Values of the first four PCs on each feature. Colors are the same as in A. PC1 captures the overall interdependencies, by grouping together all the 'max', 'mean', 'whole' and 'MI' related features. PC2 captures the overall independence by grouping together all the 'min' related features. PC3 captures the proportion of synergy-dominated interaction by grouping together 'prop. syn-plets' and the order at which $\Omega$ is maximized and minimized. PC4 captures the behavior of $\Omega$, by grouping together its maximum, minimum, mean and whole-system values.
Figure 5: Sub-covariance matrices sampled with padding to allow different covariance matrix sizes in a single batch. 1) First, a mask is applied to the full covariance matrices using a masked encoding of the $n$-plets (each with a different number of masked variables) to obtain each sub-covariance matrix. At this point, the obtained covariance matrices are invalid as the masked rows and columns have zeros on the diagonal, yielding a constant distribution. 2) Then, an identity matrix is masked with the inverted $n$-plet encodings. 3) Both masked matrices are added to obtain the final covariance matrix where the rows and columns of the $n$-plet have the values from the full covariance matrix, and the remaining rows and columns have ones on the diagonal and zeros elsewhere, representing an independent standard normal component.
...and 1 more figures

THOI: An efficient and accessible library for computing higher-order interactions enhanced by batch-processing

TL;DR

Abstract

THOI: An efficient and accessible library for computing higher-order interactions enhanced by batch-processing

Authors

TL;DR

Abstract

Table of Contents

Figures (6)