Table of Contents
Fetching ...

THOI: An efficient and accessible library for computing higher-order interactions enhanced by batch-processing

Laouen Belloli, Pedro Mediano, Rodrigo Cofré, Diego Fernandez Slezak, Rubén Herzog

TL;DR

THOI introduces a fast, accessible library for computing higher-order interactions in complex systems by coupling Gaussian copula entropies with batch and parallel processing in PyTorch. It enables exhaustive and heuristic HOI analyses across moderate to large variable counts, delivering $TC$, $DTC$, $\Omega$, and $S$-information in scalable workflows and on standard hardware. The work validates THOI on synthetic probabilistic graphical models, fMRI data under wakefulness and anesthesia, and a large 920-dataset benchmark, demonstrating substantial speedups, memory efficiency, and broad applicability. This approach lowers barriers to multi-variable interdependencies analysis and supports population-level studies, with potential impact in neuroscience, economics, ecology, and beyond.

Abstract

Complex systems are characterized by nonlinear dynamics, multi-level interactions, and emergent collective behaviors. Traditional analyses that focus solely on pairwise interactions often oversimplify these systems, neglecting the higher-order interactions critical for understanding their full collective dynamics. Recent advances in multivariate information theory provide a principled framework for quantifying these higher-order interactions, capturing key properties such as redundancy, synergy, shared randomness, and collective constraints. However, two major challenges persist: accurately estimating joint entropies and addressing the combinatorial explosion of interacting terms. To overcome these challenges, we introduce THOI (Torch-based High-Order Interactions), a novel, accessible, and efficient Python library for computing high-order interactions in continuous-valued systems. THOI leverages the well-established Gaussian copula method for joint entropy estimation, combined with state-of-the-art batch and parallel processing techniques to optimize performance across CPU, GPU, and TPU environments. Our results demonstrate that THOI significantly outperforms existing tools in terms of speed and scalability. For larger systems, where exhaustive analysis is computationally impractical, THOI integrates optimization strategies that make higher-order interaction analysis feasible. We validate THOI accuracy using synthetic datasets with parametrically controlled interactions and further illustrate its utility by analyzing fMRI data from human subjects in wakeful resting states and under deep anesthesia. Finally, we analyzed over 900 real-world and synthetic datasets, establishing a comprehensive framework for applying higher-order interaction (HOI) analysis in complex systems.

THOI: An efficient and accessible library for computing higher-order interactions enhanced by batch-processing

TL;DR

THOI introduces a fast, accessible library for computing higher-order interactions in complex systems by coupling Gaussian copula entropies with batch and parallel processing in PyTorch. It enables exhaustive and heuristic HOI analyses across moderate to large variable counts, delivering , , , and -information in scalable workflows and on standard hardware. The work validates THOI on synthetic probabilistic graphical models, fMRI data under wakefulness and anesthesia, and a large 920-dataset benchmark, demonstrating substantial speedups, memory efficiency, and broad applicability. This approach lowers barriers to multi-variable interdependencies analysis and supports population-level studies, with potential impact in neuroscience, economics, ecology, and beyond.

Abstract

Complex systems are characterized by nonlinear dynamics, multi-level interactions, and emergent collective behaviors. Traditional analyses that focus solely on pairwise interactions often oversimplify these systems, neglecting the higher-order interactions critical for understanding their full collective dynamics. Recent advances in multivariate information theory provide a principled framework for quantifying these higher-order interactions, capturing key properties such as redundancy, synergy, shared randomness, and collective constraints. However, two major challenges persist: accurately estimating joint entropies and addressing the combinatorial explosion of interacting terms. To overcome these challenges, we introduce THOI (Torch-based High-Order Interactions), a novel, accessible, and efficient Python library for computing high-order interactions in continuous-valued systems. THOI leverages the well-established Gaussian copula method for joint entropy estimation, combined with state-of-the-art batch and parallel processing techniques to optimize performance across CPU, GPU, and TPU environments. Our results demonstrate that THOI significantly outperforms existing tools in terms of speed and scalability. For larger systems, where exhaustive analysis is computationally impractical, THOI integrates optimization strategies that make higher-order interaction analysis feasible. We validate THOI accuracy using synthetic datasets with parametrically controlled interactions and further illustrate its utility by analyzing fMRI data from human subjects in wakeful resting states and under deep anesthesia. Finally, we analyzed over 900 real-world and synthetic datasets, establishing a comprehensive framework for applying higher-order interaction (HOI) analysis in complex systems.
Paper Structure (46 sections, 22 equations, 6 figures)

This paper contains 46 sections, 22 equations, 6 figures.

Figures (6)

  • Figure 1: Efficient computation of HOI using batch processing of covariance matrices.A) A set of multivariate time series $X$ is transformed using the Gaussian copula approach, generating covariance matrices $\Sigma$ for each dataset. The covariance matrices are then sub-sampled using a batch of $k$-plet indices, defined by a binary mask applied to $\Sigma$, yielding the sub-covariance matrices $\Sigma^k$ for each k-plet. These sub-covariance matrices are then batched together, and their determinants are computed, which are subsequently used to calculate the entropies and associated HOI defined by the $DTC$, $TC$, $\Omega$, and $S$-information metrics, where the entropy is computed from the determinants of single variables $H(\Sigma^k_j)$, the whole system $H(\Sigma^k)$ and the whole system without a single variable $H(\Sigma^k_{-j})$ (see \ref{['sup:generalization_of_MI']} for detailed descriptions) . Finally, batches are pre-processed using a custom function (e.g., extracting the minimum $\Omega$), and the results are aggregated to produce the final output. Note that multiple datasets with identical system and sample sizes can be processed simultaneously and the batch management system allows flexible analysis on the fly. B) Computational time versus order of interactions for a 30-variable system with 1000 samples. The THOI method successfully computes all possible HOI in less than 6 hours, whereas other libraries are unable to process interactions beyond order 11 within the same time frame. C) Log-log plot of computational time as a function of sample size for a 20-variable system. All libraries exhibit logarithmic scaling, but THOI outperforms the others in terms of computational speed.
  • Figure 2: Within-order optimization with greedy and simulated annealing algorithmsA, B) Maximum (red) and minimum (blue) $\Omega$ obtained by greedy and SA algorithms for a $100$-variable system composed of strong/weak R and S systems and an independent system, each with 20 variables. Dashed horizontal lines indicate ground truth for the weak systems, and dotted lines represent the sum of weak and strong systems (red for R, blue for S). Both algorithms successfully identify the systems, but greedy required $100$ times more repeats than SA. C, D) Subsets of variables that maximize $\Omega$ at each order of interaction for greedy and SA. Systems are color-coded (weak R: light red; strong R: dark red; weak S: light blue; strong S: dark blue; independent: gray). Both algorithms prioritize strong R, then weak R, followed by a mix of independent and S systems. E, F) Subsets of variables that minimize $\Omega$. Both algorithms prioritize strong S, then weak S, followed by a combination of independent and R systems, with a preference for the former.
  • Figure 3: A) Estimated maximum and minimum $\Omega$ via the GA for awake (green) and deep anesthesia state (purple). Inset shows the reduction of minimum $\Omega$ at lower orders of interaction. B) Average maximum (red) and minimum (blue) effect size obtained from a GA tailored to amplify the difference between the two conditions. Shaded areas denote the range from the minimum to the maximum value for each optimization procedure. C) Distribution of $\Omega$ for the whole-brain in awake (green) and deep anesthesia state (purple). Each dot is a subject and lines connect their respective value in both conditions. Despite the trend to reduce redundancy, no significant difference was found (Wilcoxon $p>0.001$) D) Distribution of the $n$-plets that maximizes (left) and minimizes (right) the effect size obtained by the GA. E) Same as D, but for the $n$-plets obtained with the SA algorithm. Order is the number of elements in $n$-plets, $p$ is the Wilcoxon p-value and $d$ is the Cohen's $d$.
  • Figure 4: A) Spearman correlation matrix of features across datasets. Colors code different types of features. 'prop. sy$n$-plets' is the proportion of synergy-dominated $n$-plets out of the total number of $n$-plets. 'order max $\Omega$' and 'order min $\Omega$' is the order where $\Omega$ was maximized and minimized, respectively, normalized by the system size. 'mean MI' and 'std MI' are the mean and standard deviation of the pairwise mutual information for each dataset. The prefix 'whole' indicates that the whole system was considered (i.e. all the system variables). B) Cumulative explained variance associated with each PC after PCA. The first four components capture approximately 95% of the variance. C) Values of the first four PCs on each feature. Colors are the same as in A. PC1 captures the overall interdependencies, by grouping together all the 'max', 'mean', 'whole' and 'MI' related features. PC2 captures the overall independence by grouping together all the 'min' related features. PC3 captures the proportion of synergy-dominated interaction by grouping together 'prop. syn-plets' and the order at which $\Omega$ is maximized and minimized. PC4 captures the behavior of $\Omega$, by grouping together its maximum, minimum, mean and whole-system values.
  • Figure 5: Sub-covariance matrices sampled with padding to allow different covariance matrix sizes in a single batch. 1) First, a mask is applied to the full covariance matrices using a masked encoding of the $n$-plets (each with a different number of masked variables) to obtain each sub-covariance matrix. At this point, the obtained covariance matrices are invalid as the masked rows and columns have zeros on the diagonal, yielding a constant distribution. 2) Then, an identity matrix is masked with the inverted $n$-plet encodings. 3) Both masked matrices are added to obtain the final covariance matrix where the rows and columns of the $n$-plet have the values from the full covariance matrix, and the remaining rows and columns have ones on the diagonal and zeros elsewhere, representing an independent standard normal component.
  • ...and 1 more figures