Network mutual information measures for graph similarity

Helcio Felippe, Federico Battiston, Alec Kirkley

TL;DR

This work introduces a principled information-theoretic framework for graph similarity, built as a family of graph mutual information measures that operate at different structural scales. It presents three encodings of node-aligned graphs: edge-level overlap (NMI), degree-corrected neighborhood overlap (DC-NMI), and mesoscale structure with respect to a fixed node partition (MesoNMI). Each measure quantifies the information shared between two graphs under its encoding. Through synthetic perturbations and real multilayer networks (e.g., FAO trade data), the authors show that the microscale measures (NMI and DC-NMI) capture fine-grained similarity, while the MesoNMI emphasizes coarser, community-like structure, enabling scale-aware comparisons. The approach is fast, interpretable, and adaptable to weighted, directed, and higher-order network representations, with open-source code and data available for reproducibility and for applications such as network clustering and anomaly detection.

Abstract

A wide range of tasks in network analysis, such as clustering network populations or identifying anomalies in temporal graph streams, require a measure of the similarity between two graphs. To provide a meaningful data summary for downstream scientific analyses, the graph similarity measures used for these tasks must be principled, interpretable, and capable of distinguishing meaningful overlapping network structure from statistical noise at different scales of interest. Here we derive a family of graph mutual information measures that satisfy these criteria and are constructed using only fundamental information-theoretic principles. Our measures capture the information shared among networks according to different encodings of their structural information, with our mesoscale mutual information measure allowing for network comparison under any specified network coarse-graining. We test our measures in a range of applications on real and synthetic network data, finding that they effectively highlight intuitive aspects of network similarity across scales in a variety of systems.

Paper Structure (15 sections, 31 equations, 11 figures, 1 table)

Figures (11)

  • Figure 1: Family of proposed network mutual information measures for graph similarity. (a) Standard normalized mutual information (NMI, Eq. \ref{eq:NMIgraph}) between networks $G_1$ and $G_2$, with shared node labels indicated by node positions. Due to little overlap in the edge positions, this NMI measure returns a score $\text{NMI}(G_1;G_2)\approx 0$. (b) Degree-corrected normalized mutual information (DC-NMI, Eq. \ref{eq:DCNMI}) between graphs $G_1$ and $G_2$. Due to little overlap in node neighborhoods, we also see a low value for $\text{DC-NMI}(G_1;G_2)$. (c) Mesoscale normalized mutual information (MesoNMI, Eq. \ref{eq:mesoNMI2}) between networks $G_1$ and $G_2$ with respect to the indicated node partition $\bm{b}$ into $B=2$ groups (colored, circled sets of nodes). Since the mesoscale structure of these networks is quite similar, as indicated by the edge overlap $E^{(\bm{b})}_{12}=12$, we have a high similarity value $\text{MesoNMI}^{(\bm{b})}(G_1;G_2)\approx 1$. (d) MesoNMI between the two networks but with respect to a different partition $\bm{b}$ with $B=4$ groups (colored, circled sets of nodes). Here we still see a relatively high MesoNMI value, indicating substantial shared structure at this smaller scale. For reference, the Jaccard index among the edge sets in this case is $\vert G_1\cap G_2\vert/\vert G_1\cup G_2\vert=0.315$. This is a much higher value than the NMI in panel (a), which indicates that the edge overlap is not much different than expected based on the network densities.
  • Figure 2: Graph similarity measures for networks under node and edge attacks. (a) Graph similarity as a function of the fraction of nodes attacked $\epsilon$ for random networks, where nodes are attacked in decreasing order of degree. Graph similarity is measured with the NMI, DC-NMI, and MesoNMI for $B\in \{1,10,100\}$ to capture multiple scales of similarity. The Jaccard index $\vert G_1\cap G_2\vert/\vert G_1\cup G_2\vert$ is included for comparison. The MesoNMI partitions $\bm{b}$ are computed by fitting a standard stochastic block model (SBM) with a fixed number of groups $B\in \{1,10,100\}$ to the initial (un-attacked) graph. Simulations are averaged over $10$ realizations of the initial graph from the Erdős-Rényi model (ER, top left panel) and Barabási-Albert model (BA, top right panel) with $N=1000$ nodes and average degree $\langle k \rangle=10$ (error bars indicate three standard errors and are vanishingly small). (b) Graph similarity as a function of the fraction of edges randomly rewired, for the same synthetic networks. Subtle differences in the decay rates of the similarity measures reflect intuitive properties of these measures, as discussed in Sec. \ref{sec:results}.
  • Figure 3: Mesoscale mutual information between stochastic block model (SBM) networks. Edge attack simulations were performed on networks generated from an SBM with two groups of $500$ nodes, average degree of $10$, and mixing level $\mu \in [0,1]$ fixing the fraction of edges running between nodes of the same group identity. (a) MesoNMI values for different edge attack fractions $\epsilon$ (indicated by curves of shades of gray) as a function of the number of groups $B$ of the node partition $\bm{b}$ used for the MesoNMI calculation, for SBM networks with no mixing preference ($\mu=0.5$, equivalent to Erdős-Rényi random graphs). The MesoNMI is more sensitive to edge-level attacks as the scale of interest for comparison gets smaller (i.e. the number of groups $B$ gets larger). (b) MesoNMI as a function of the mixing level $\mu$ of the initial graph being attacked, with the partition $\bm{b}$ used for the MesoNMI calculations being fixed as the initial graph's planted partition into $B=2$ groups. As the mixing level moves away from $\mu=0.5$, we see a stronger dependence of the MesoNMI on the attack level $\epsilon$ due to the rewiring of inter-community ($\mu=0$) or intra-community ($\mu=1$) edges to produce an equitable mixture of these two edge types in expectation.
  • Figure 4: Comparison of graph similarity values among layers of the FAO trade network. (a) Pairwise similarity matrices among layers of the FAO trade network \cite{de2015structural}, each layer representing the global trade patterns among countries for a particular good. The MesoNMI was computed with respect to a partition of the country nodes in each layer according to a Global North-South dichotomy \cite{globalsouth}. The network Jensen-Shannon divergence (JSD) measure of \cite{de2015structural} is transformed into a similarity measure using $1-\text{JSD}$ and included for comparison. All matrices show a similar block structure in the layer similarities, and as the network scale of interest increases (NMI to DC-NMI to MesoNMI to JSD) we find systematically higher similarity values, with the MesoNMI having the greatest discriminative power. (b) Rank-biased overlap (RBO) \cite{webber2010similarity} between the pairwise distances calculated using each pair of similarity measures. For example, the (NMI, DC-NMI) entry of this matrix is the RBO between the entries of the top two panels in (a). As the scale of interest decreases, we find greater RBO between the corresponding pairwise distance matrices. (c) Number of clusters versus the corresponding Ward linkage distance for a hierarchical clustering of the layers \cite{de2015structural}. There are discrepancies in the hierarchical cluster structure inferred using the different measures, with measures operating at similar scales having similar linkage patterns.
  • Figure S1: Average similarity among graphs generated from random graph ensembles. (a) Average pairwise graph similarity for $100$ samples from the Erdős-Rényi random graph model with $N=10,100$ nodes and variable connection probability $p$. Error bars indicate one standard error in the mean. (b) Repeating the experiment but sampling networks from a nonlinear preferential attachment network growth model with exponent $\alpha$, we find almost no similarity between sampled graphs for $\alpha<2$, followed by a sudden transition at $\alpha=2$ to highly fluctuating similarity values. This is due to the predominance of loopy graphs in the $\alpha < 2$ regime, which results in low values of edge overlap between graphs, while for $\alpha>2$ we have star-like networks rooted at each of the two seed nodes with equal probability, resulting in a similarity value that is either $\approx 0$ or $\approx 1$ for any given pair of networks.
  • ...and 6 more figures
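
The Figure 1 caption contrasts the Jaccard index $\vert G_1\cap G_2\vert/\vert G_1\cup G_2\vert$ with the NMI: a Jaccard value can be sizeable even when the edge overlap is no larger than the network densities alone would predict. The sketch below illustrates that contrast on two independently sampled random graphs. The Jaccard function follows the formula in the captions. The `edge_nmi` function is an assumption: the paper's Eq. \ref{eq:NMIgraph} is not reproduced in this section, so the code uses one plausible combinatorial encoding (counting graphs with a given number of edges and a given overlap) and should not be read as the authors' exact definition. All function names and parameter values are hypothetical.

```python
import itertools
import math
import random


def random_graph(n, p, rng):
    """Sample an undirected Erdős-Rényi graph as a set of (i, j) pairs, i < j."""
    return {pair for pair in itertools.combinations(range(n), 2) if rng.random() < p}


def jaccard(g1, g2):
    """Jaccard index |G1 ∩ G2| / |G1 ∪ G2| of two edge sets."""
    union = g1 | g2
    return len(g1 & g2) / len(union) if union else 1.0


def log_binom(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def edge_nmi(g1, g2, n):
    """ASSUMED edge-level NMI: 2 * [H(G2) - H(G2|G1)] / [H(G1) + H(G2)].

    H(G) is taken as the log-count of graphs on n nodes with the same number
    of edges; H(G2|G1) is the log-count of graphs G2 that share e12 edges with
    a known G1. This encoding is an illustration, not the paper's equation.
    """
    m = n * (n - 1) // 2  # number of possible undirected edges
    e1, e2, e12 = len(g1), len(g2), len(g1 & g2)
    h1, h2 = log_binom(m, e1), log_binom(m, e2)
    h2_given_1 = log_binom(e1, e12) + log_binom(m - e1, e2 - e12)
    total = h1 + h2
    return 2 * (h2 - h2_given_1) / total if total > 0 else 1.0


rng = random.Random(0)
n, p = 60, 0.5  # hypothetical size and density, chosen only for illustration
g1 = random_graph(n, p, rng)
g2 = random_graph(n, p, rng)  # sampled independently of g1

print("Jaccard (independent graphs):", round(jaccard(g1, g2), 3))
print("NMI     (independent graphs):", round(edge_nmi(g1, g2, n), 3))
print("NMI     (identical graphs):  ", round(edge_nmi(g1, g1, n), 3))
```

Because the two graphs are dense, a substantial fraction of their edges coincide by chance, which inflates the Jaccard index, whereas the information-based score compares the overlap against what the edge counts alone make likely. The same functions could be applied to a graph and a rewired copy of it to trace similarity against the attack fraction $\epsilon$, in the spirit of Figure 2.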