Broad Spectrum Structure Discovery in Large-Scale Higher-Order Networks

John Hood, Caterina De Bacco, Aaron Schein

TL;DR

This work introduces Omni-Hype-SMT, a probabilistic framework for discovering broad mesoscale structure in large-scale hypergraphs by clustering nodes into latent classes and these classes into communities. Using a low-rank, symmetric class-affinity tensor Λ^{(d)} factorized across orders with a shared node-class membership Θ and a class-community matrix W, the model flexibly represents assortative and disassortative patterns and is provably identifiable under practical constraints. The approach yields interpretable structures, improves higher-order link prediction across diverse datasets, and enables fast synthetic hypergraph generation with tunable mesoscale properties. Empirically, it uncovers meaningful drug-class interactions, core-periphery behavior in politics, and cross-domain patterns that extend beyond traditional assortative models, highlighting the importance of modeling omniassortativity in higher-order networks.

Abstract

Complex systems are often driven by higher-order interactions among multiple units, naturally represented as hypergraphs. Understanding dependency structures within these hypergraphs is crucial for understanding and predicting the behavior of complex systems but is made challenging by their combinatorial complexity and computational demands. In this paper, we introduce a class of probabilistic models that efficiently represents and discovers a broad spectrum of mesoscale structure in large-scale hypergraphs. The key insight enabling this approach is to treat classes of similar units as themselves nodes in a latent hypergraph. By modeling observed node interactions through latent interactions among classes using low-rank representations, our approach tractably captures rich structural patterns while ensuring model identifiability. This allows for direct interpretation of distinct node- and class-level structures. Empirically, our model improves link prediction over state-of-the-art methods and discovers interpretable structures in diverse real-world systems, including pharmacological and social networks, advancing the ability to incorporate large-scale higher-order data into the scientific process.
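The low-rank construction can be made concrete with a minimal sketch. It assumes, based on the factorization described in the TL;DR and the Figure 1 caption rather than on the paper's exact equations, that the order-$d$ rate of a hyperedge is $\sum_k \gamma^{(d)}_k \prod_{i} (\Theta W)_{ik}$, with the product taken over the nodes in the hyperedge. The function name `hyperedge_rate` and the toy dimensions are hypothetical.

```python
import numpy as np

def hyperedge_rate(nodes, Theta, W, gamma_d):
    """Rate of one hyperedge under the ASSUMED low-rank form
    mu = sum_k gamma_d[k] * prod_{i in nodes} (Theta @ W)[i, k]."""
    Phi = Theta @ W                          # (N, K) node-community loadings
    return float(gamma_d @ np.prod(Phi[list(nodes)], axis=0))

# Toy example with hypothetical sizes: 5 nodes, 3 classes, 2 communities.
rng = np.random.default_rng(0)
Theta = rng.random((5, 3))                   # node-class memberships (nonnegative)
W = rng.random((3, 2))
W /= W.sum(axis=0, keepdims=True)            # each column of W sums to 1
gamma_3 = rng.random(2)                      # community-order rates for d = 3

rate = hyperedge_rate((0, 2, 4), Theta, W, gamma_3)
```

Under this assumed form, the rate of a size-$d$ hyperedge costs on the order of $dK$ operations once $\Theta W$ is available, and the full class-affinity tensor with one axis per hyperedge member never needs to be materialized.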

Paper Structure

This paper contains 14 sections, 8 theorems, 107 equations, 5 figures, 2 tables, and 2 algorithms.

Key Result

Lemma A.1

Each affinity tensor $\mathrm{\Lambda}^{\mathsmaller{(d)}}$ defined element-wise in eq:core-param1 is symmetric.
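The symmetry claim can be checked numerically with a small sketch. The referenced equation is not reproduced here, so the code assumes the form suggested by the Figure 1 caption, $\Lambda^{(d)}_{c_1 \dots c_d} = \sum_k \gamma^{(d)}_k \prod_{j} \mathrm{w}_{c_j k}$; the helper names `class_affinity` and `is_symmetric` are hypothetical.

```python
import itertools
import numpy as np

def class_affinity(W, gamma_d, d):
    """Build the order-d class affinity tensor under the ASSUMED form
    Lambda[c1..cd] = sum_k gamma_d[k] * prod_j W[c_j, k] (requires d <= 10)."""
    letters = "abcdefghij"[:d]
    # e.g. d = 3 gives the einsum spec "z,az,bz,cz->abc"
    spec = "z," + ",".join(f"{l}z" for l in letters) + "->" + letters
    return np.einsum(spec, gamma_d, *([W] * d))

def is_symmetric(T, tol=1e-12):
    """True if T is unchanged (up to tol) under every permutation of its axes."""
    return all(
        np.allclose(T, np.transpose(T, perm), atol=tol)
        for perm in itertools.permutations(range(T.ndim))
    )

rng = np.random.default_rng(1)
W = rng.random((4, 3))        # 4 classes, 3 communities (toy sizes)
gamma_3 = rng.random(3)       # community-order rates for d = 3
Lambda_3 = class_affinity(W, gamma_3, d=3)
symmetric = is_symmetric(Lambda_3)
```

Because each term is an outer product of the same column of $\mathrm{W}$ with itself, permuting the class indices only reorders a product of scalars, which is why the check passes for any nonnegative $\mathrm{W}$ and $\gamma^{(d)}$ under this assumed form.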

Figures (5)

  • Figure 1: Omni-Hype-SMT models a range of mesoscale structures by assigning nodes to classes and classes to communities. A schematic illustration of how Omni-Hype-SMT models a diverse range of mesoscale structures. a) Hypergraph data encodes higher-order interactions among entities, such as bill co-sponsorship (left). Each hyperedge captures a multi-way relationship among nodes. These interactions are represented as a multi-order adjacency tensor $\mathcal{A}^{\mathsmaller{(:)}}$ (right), where $A^{\mathsmaller{(d)}}$ denotes the order-$d$ adjacency tensor capturing $d$-way interactions. Such a formulation enables the modeling of complex, non-pairwise relational structures. b) Two parameters $\mathrm{\Theta}$ and $\mathrm{W}$ govern interactions between nodes (e.g., Senators). The node-class membership matrix $\mathrm{\Theta}$ soft-clusters similar nodes into classes (e.g., political parties), while the class-community membership matrix $\mathrm{W}$ models interactions between classes through communities (e.g., policy issues). Together, the matrix $\mathrm{\Theta} \mathrm{W}$ captures assortative and disassortative activity between nodes through interactions within and between classes. c) The multi-tensor $\mathlarger{\upmu}^{\mathsmaller{(:)}}$ (left) models the observed higher-order adjacency tensors $\mathcal{A}^{\mathsmaller{(:)}}$ via symmetric low-rank tensor decomposition using a class affinity tensor $\mathrm{\Lambda}^{\mathsmaller{(d)}}$ and a node-class membership matrix $\mathrm{\Theta}$, shared over orders $d$. The class affinity multi-tensor $\Uplambda^{\mathsmaller{(:)}}$ (right) is further decomposed into community-order rates $\gamma^{\mathsmaller{(d)}}_k$ and the symmetric outer products of the columns of the class-community membership matrix $\mathrm{W}$, enabling interpretable modeling of multi-way class interactions across different orders.
  • Figure 2: Omni-Hype-SMT recovers core-periphery structure among US Supreme Court justices and disassortative structure in hospital proximity data. In each setting, the projected adjacency matrix of the hypergraph data is visualized next to the inferred node-class membership and class affinity matrices. The omniassortative model's class affinity matrix has non-zero intensity on its off-diagonal, representing possible interactions between classes, while the strictly assortative matrices are zero on the off-diagonal. a) In the hospital setting, the node-class membership matrix cleanly separates patients from staff for the omniassortative model but not for the strictly assortative one; we normalize each node's membership vector to emphasize this difference. For the omniassortative model, the class affinity matrix is strongly off-diagonal, suggesting that interactions among patients and staff are highly disassortative. b) In the Supreme Court Justice setting, we color-code the justices according to the party of the nominating President and see that both models cleanly separate three blocks of justices, known loosely as the liberals, conservatives, and transitional justices. The omniassortative model's node-class membership matrix is sharper, with less mixed membership, and it explains the crossover in voting patterns across blocks by inferring core-periphery structure, where the class affinity matrix is strongly diagonal but has a non-negligible off-diagonal.
  • Figure 3: Omni-Hype-SMT learns a latent hypergraph between identified drug classes in drug-drug interaction data. a) We highlight six classes of drugs $c \in \left\{1,5,6,9,10,13\right\}$ (maroon) and four communities, each defined by a convex combination of classes, $k \in \left\{19,25,34,44\right\}$ (bronze). The columns of the class-community matrix $\mathrm{W}$ (entries correspond to bronze links) define the class weights corresponding to each community. Here, communities may be interpreted as mixtures of classes. The width of the edge between the $c^{\textrm{th}}$ class and $k^{\textrm{th}}$ community is proportional to $\mathrm{w}_{ck}$, representing the weight of class $c$ in community $k$. We show the top drugs occurring in each class, as measured by the node-class membership matrix $\mathrm{\Theta}$, and the top drugs in each community, where the node-community loadings are given by the matrix product $\mathrm{\Theta} \mathrm{W}$ (drugs with small nonzero values are grayed). b) The class-community matrix $\mathrm{W}$. Each element $\mathrm{w}_{ck}$ is the weight of class $c$ in community $k$, and each column sums to 1. c) For a fixed $d$ and conditional on the classes and communities shown in (a), the normalized community-order rates of each class and community, given by $\gamma^{\mathsmaller{(d)}}_k$.
  • Figure 4: Relaxing strict assortativity improves interpretability and link prediction. a) For each dataset, we show the relative gain in held-out log-likelihood over the strictly assortative baseline. Positive values indicate better link prediction for Omni-Hype-SMT, with error bars denoting variability over five train-test splits. b) Median entropy of $\boldsymbol{\theta}_i$ across nodes $i$, with error bars denoting its interquartile range across five random initializations. Lower values denote more interpretable class structure with less uniform mixed membership. c) Inferred disassortativity levels vary by hyperedge order and dataset.
  • Figure 5: Generating synthetic data with Omni-Hype-SMT. We use Omni-Hype-SMT to generate data using parameters learned from fitting to the DAWN dataset. We use several metrics to compare the synthetic multi-tensor $\hat{\mathcal{A}}^{\mathsmaller{(:)}}$ to the true observed data $\mathcal{A}^{\mathsmaller{(:)}}$. a) Projected adjacency matrices of the true (left) and synthetic datasets (right). Hyperedge counts are shown in log scale. b) Number of inclusion occurrences for each hyperedge order $d$ in randomly sampled subsets; this is the number of nonzero hyperedges of size $d$ which appear as a subset of a hyperedge of size $d+1$. c) Empirical node degree distribution. d) Empirical hyperedge order distribution.
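The inclusion statistic described in Figure 5b can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: it counts, for each order $d$, the distinct size-$d$ hyperedges that are contained in at least one observed size-$(d+1)$ hyperedge, and it omits the random subsampling mentioned in the caption. The function name `inclusion_counts` and the toy hyperedges are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def inclusion_counts(hyperedges):
    """For each observed order d, count distinct size-d hyperedges that are
    a subset of some observed hyperedge of size d + 1."""
    by_order = defaultdict(set)
    for edge in hyperedges:
        e = frozenset(edge)
        by_order[len(e)].add(e)

    counts = {}
    for d, edges_d in by_order.items():
        # All size-d subsets of the observed size-(d+1) hyperedges.
        covered = {
            frozenset(sub)
            for big in by_order.get(d + 1, ())
            for sub in combinations(big, d)
        }
        counts[d] = len(edges_d & covered)
    return counts

# Toy hypergraph: {1,2} and {2,3} sit inside {1,2,3}; {1,2,3} sits inside {1,2,3,4}.
toy = [{1, 2}, {2, 3}, {4, 5}, {1, 2, 3}, {1, 2, 3, 4}]
counts = inclusion_counts(toy)
```

Enumerating the size-$d$ subsets of the larger hyperedges, rather than testing every pair of hyperedges, keeps the cost proportional to the number of size-$(d+1)$ hyperedges times $d+1$.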

Theorems & Definitions (20)

  • proof
  • proof
  • Lemma A.1
  • Lemma A.2
  • Lemma A.3: Uniqueness of CP
  • Theorem A.4: Uniqueness of the strictly assortative model
  • Theorem A.5: Uniqueness
  • Definition A.6: Generic identifiability
  • Corollary A.7: Identifiability
  • Lemma A.8
  • ...and 10 more