Information theory for hypergraph similarity

Helcio Felippe, Alec Kirkley, Federico Battiston

TL;DR

The paper addresses the limitation of dyadic graph similarity for systems with higher-order interactions by introducing a principled framework, based on minimum description length (MDL), to quantify hypergraph similarity. It defines normalized mutual information measures under three encodings (bulk, align, and cross) to capture intra-order, cross-order, and mesoscale similarities across arbitrary node coarse-grainings. The authors demonstrate a coherent hierarchy: (i) intra-order similarity is robust to layer-density heterogeneity (align), (ii) cross-order similarity detects nested and cross-layer correspondences (cross), and (iii) mesoscale extensions reveal community-level similarity beyond node-level overlaps. They validate the approach on synthetic hypergraphs and apply it to empirical multiplex hypergraphs from physics, film, and software, yielding meaningful structure in layer similarities and practical insights for higher-order network analysis. The framework offers scalable, interpretable tools for principled comparison of higher-order networks and opens avenues for broader applications including temporal and metadata-enabled hypergraphs.

Abstract

Comparing networks is essential for a number of downstream tasks, from clustering to anomaly detection. Despite higher-order interactions being critical for understanding the dynamics of complex systems, traditional approaches for network comparison are limited to pairwise interactions only. Here we construct a general information theoretic framework for hypergraph similarity, capturing meaningful correspondence among higher-order interactions while correcting for spurious correlations. Our method operationalizes any notion of structural overlap among hypergraphs as a principled normalized mutual information measure, allowing us to derive a hierarchy of increasingly granular formulations of similarity among hypergraphs within and across orders of interactions, and at multiple scales. We validate these measures through extensive experiments on synthetic hypergraphs and apply the framework to reveal meaningful patterns in a variety of empirical higher-order networks. Our work provides foundational tools for the principled comparison of higher-order networks, shedding light on the structural organization of networked systems with non-dyadic interactions.

Paper Structure

This paper contains 19 sections, 29 equations, 12 figures, 4 tables.

Figures (12)

  • Figure 1: Hierarchy of information-theoretic measures for hypergraph similarity. Hypergraphs $G_1$, $G_2$, and $G_3$ are defined on the same set of $N=8$ labeled nodes, with hypergraph layers $G_i^{(\ell)}$ indexed by $\ell\in\{2,3,4\}$ and illustrated as thick blue lines (dyads), green triangles (triplets), and orange squares (quadruplets). Heatmaps show the order-order mutual information between pairwise projections of hypergraph layers $G_i^{(\ell)}$ and $G_j^{(k)}$, for all $\ell,k\in \{2,3,4\}$. The three proposed hypergraph mutual information measures ($\text{NMI}_{\rm bulk}$, $\text{NMI}_{\rm align}$, $\text{NMI}_{\rm cross}$), which are derived using the general framework discussed in Sec. \ref{sec:hierarchy}, are shown for each pair of hypergraphs. These measures assess the structural similarity between a pair of hypergraphs with increasingly detailed encodings to highlight structural overlaps at and across different hyperedge orders.
  • Figure 2: Information theory measures for intra-order hypergraph similarity. (a) Random hypergraphs with homogeneous layer densities. Order-order graph NMI values for the layers' pairwise projections (left) show maximum shared structure at $\epsilon=0$, which decreases uniformly as the layers are randomized. The intra-order hypergraph similarity measures $\text{NMI}_{\rm bulk}$ and $\text{NMI}_{\rm align}$ smoothly decrease with the noise $\epsilon$, reaching zero in the regime of complete noise (right). Due to the homogeneous hyperedge densities across layers, both NMI measures give similar values. (b) Random hypergraphs with heterogeneous layer densities. Order-order similarities (left) indicate higher intra-order similarity for larger orders as noise is applied, due to the heterogeneous densities of the layers. In this case, we see that $\text{NMI}_{\rm bulk}$ inflates the mutual information contributions for high $\epsilon$, resulting in a non-negligible NMI value at $\epsilon=1$. The $\text{NMI}_{\rm align}$ measure does not have this issue, vanishing in the high noise regime.
  • Figure 3: Information theory measures for cross-order hypergraph similarity. (a) Two initial random hypergraphs share the same layers $\ell=3$, 5, and 7, which are nested inside of $\ell=2$, 4, and 6, respectively. (b) Layers 6 and 7 are perturbed, causing their respective blocks to lose intra-order similarity. The intra-order measure $\text{NMI}_{\text{align}}$ is thus reduced, while the cross-order measure $\text{NMI}_{\text{cross}}$ changes negligibly. (c) Layers 4 and 5 are further perturbed, removing another nested block. Intra-order similarity is significantly reduced yet again. (d) Finally, layers 2 and 3 are perturbed, dismantling all blocks and eliminating any similarity between layers of equal size. The intra-order score $\text{NMI}_{\rm align}$ approaches the minimum value of zero, while $\text{NMI}_{\rm cross}$ is still able to capture the structural similarity across different orders of interaction.
  • Figure 4: Mesoscale similarity for hypergraphs. (a) $\text{NMI}_{\rm cross}$ and its mesoscale variant $\text{NMI}^{(\bm{b})}_{\rm cross}$ for two small example networks on $N=8$ nodes, with the partition $\bm{b}$ dividing the nodes into $B=2$ groups indicated in yellow and pink. While the mesoscale measure is able to detect perfect similarity among the coarse-grained hypergraphs $\tilde{G}_1^{({\bm b})}$ and $\tilde{G}_2^{({\bm b})}$, the standard NMI variant detects a low level of similarity at the node-level. (b) Mesoscale NMI for pairs of random clustered hypergraphs generated with an average fraction $p$ of nodes belonging to the same group. As we increase the level of noise $\rho_{\bm{b}}$ between the two hypergraphs' underlying node partitions, the mesoscale NMI smoothly decreases, with stronger levels of community structure $p$ resulting in a more severe decline in the NMI (left). When both hypergraphs are generated from the same underlying node partition ($\rho_{\bm b}=0$) with different community strengths $p_1,p_2$, we see that greater levels of community structure result in greater levels of shared information among the hypergraphs, with $p_1=p_2=1$ giving maximum similarity.
  • Figure 5: Hypergraph similarity for real-world systems. NMI matrices among all pairs of hypergraphs within real-world systems arising across various disciplines (top row), each accompanied by its corresponding minimum spanning tree using $1-\text{NMI}$ as an edge weight. (a) Multiplex hypergraph of co-authorship among physics authors in different physics fields. (b) Multiplex hypergraph of co-appearances among actors in different film genres. (c) Multiplex hypergraph of repository co-editing among software development teams. For each system, the similarity among hypergraphs corresponding to qualitatively similar subjects (e.g., nuclear and elementary particle physics) tends to be higher.
  • ...and 7 more figures
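
The minimum spanning tree construction mentioned in the Figure 5 caption, which uses $1-\text{NMI}$ as an edge weight, can be illustrated with a minimal sketch. It assumes a symmetric NMI matrix is already available; the function name `minimum_spanning_tree`, the layer labels, and the NMI values below are hypothetical placeholders chosen for illustration, not values or code from the paper. Kruskal's algorithm is used here as one standard choice, since the summary does not say which algorithm the authors used.

```python
from itertools import combinations


def minimum_spanning_tree(labels, nmi):
    """Kruskal's algorithm on the complete graph with weights 1 - NMI.

    labels: list of names, one per hypergraph (hypothetical here).
    nmi:    symmetric matrix (list of lists) of pairwise NMI values.
    Returns a list of (label_i, label_j, weight) tree edges.
    """
    n = len(labels)
    parent = list(range(n))  # union-find forest used to detect cycles

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Highly similar pairs (large NMI) get small weights and are tried first.
    edges = sorted((1.0 - nmi[i][j], i, j) for i, j in combinations(range(n), 2))

    tree = []
    for weight, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two separate components
            parent[root_i] = root_j
            tree.append((labels[i], labels[j], weight))
    return tree


# Hypothetical NMI values for four hypergraphs A-D (not data from the paper).
labels = ["A", "B", "C", "D"]
nmi = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.6],
    [0.2, 0.1, 0.6, 1.0],
]

tree = minimum_spanning_tree(labels, nmi)
for a, b, w in tree:
    print(f"{a} - {b}: weight {w:.2f}")
```

Because the edge weight is a dissimilarity, the tree keeps the strongest similarity links needed to connect all hypergraphs. In this toy example the most similar pairs (A with B, C with D) are linked directly.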