Multi-Scale Node Embeddings for Graph Modeling and Generation

Riccardo Milocco, Fabian Jansen, Diego Garlaschelli

TL;DR

The paper tackles the challenge of learning node embeddings that remain faithful across multiple graph resolutions. It introduces the Multi-Scale Model (MSM), a vector-based, scale-invariant framework in which block embeddings equal the sum of their constituent node embeddings, enabling consistent graph modeling and generation under arbitrary coarse-graining. Through applications to the ING Input-Output Network and the World Trade Web, MSM demonstrates superior scale-consistency, accurate replication of clustering and triangle structure, and robust reconstruction performance across scales compared to single-scale LPCA. This multiscale approach provides a principled mechanism to study complex networks at varying resolutions and supports faithful generation of coarse-grained networks from fine-grained embeddings.

Abstract

Lying at the interface between Network Science and Machine Learning, node embedding algorithms take a graph as input and encode its structure onto output vectors that represent nodes in an abstract geometric space, enabling various vector-based downstream tasks such as network modelling, data compression, link prediction, and community detection. Two apparently unrelated limitations affect these algorithms. On one hand, it is not clear what the basic operation defining vector spaces, i.e. the vector sum, corresponds to in terms of the original nodes in the network. On the other hand, while the same input network can be represented at multiple levels of resolution by coarse-graining the constituent nodes into arbitrary block-nodes, the relationship between node embeddings obtained at different hierarchical levels is not understood. Here, building on recent results in network renormalization theory, we address these two limitations at once and define a multiscale node embedding method that, upon arbitrary coarse-grainings, ensures statistical consistency of the embedding vector of a block-node with the sum of the embedding vectors of its constituent nodes. We illustrate the power of this approach on two economic networks that can be naturally represented at multiple resolution levels: namely, the international trade between (sets of) countries and the input-output flows among (sets of) industries in the Netherlands. We confirm the statistical consistency between networks retrieved from coarse-grained node vectors and networks retrieved from sums of fine-grained node vectors, a result that cannot be achieved by alternative methods. Several key network properties, including a large number of triangles, are successfully replicated already from embeddings of very low dimensionality, allowing for the generation of faithful replicas of the original networks at arbitrary resolution levels.
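The additivity property described above can be illustrated with a small numerical sketch. The abstract does not state the model's connection probability, so the form $p_{ij} = 1 - e^{-\mathbf{x}_i \cdot \mathbf{x}_j}$ with non-negative embeddings is an assumption made here purely for illustration, as are all function names, the random embeddings, and the block partition. Under that assumption, and taking two distinct blocks to be linked whenever at least one pair of their constituent nodes is linked, the block-level probability computed from the micro-level links coincides with the probability computed from the summed embeddings.

```python
import math
import random


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def link_prob(x_i, x_j):
    # ASSUMPTION: illustrative functional form, not taken from the text above.
    # Non-negative embeddings keep the result inside [0, 1).
    return 1.0 - math.exp(-dot(x_i, x_j))


def block_prob_from_micro(X, block_a, block_b):
    """Probability that two distinct blocks are linked, assuming independent
    micro links and that one linked micro pair suffices to link the blocks."""
    no_link = 1.0
    for i in block_a:
        for j in block_b:
            no_link *= 1.0 - link_prob(X[i], X[j])
    return 1.0 - no_link


def summed_embedding(X, block):
    """Block embedding defined as the sum of its constituent node embeddings."""
    dim = len(X[0])
    return [sum(X[i][k] for i in block) for k in range(dim)]


# Hypothetical data: 6 nodes, 4-dimensional non-negative embeddings, 3 blocks.
rng = random.Random(0)
X = [[rng.uniform(0.0, 0.5) for _ in range(4)] for _ in range(6)]
blocks = [[0, 1, 2], [3, 4], [5]]

# Route 1: coarse-grain the micro-level link probabilities.
p_coarse = block_prob_from_micro(X, blocks[0], blocks[1])

# Route 2: sum the micro embeddings first, then apply the same formula.
p_summed = link_prob(
    summed_embedding(X, blocks[0]), summed_embedding(X, blocks[1])
)

print(p_coarse, p_summed)
```

The two routes agree because the product of the no-link factors $e^{-\mathbf{x}_i \cdot \mathbf{x}_j}$ over all pairs equals the exponential of the inner product of the two summed vectors. The sketch covers only pairs of distinct blocks; links internal to a block would need separate treatment.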

Paper Structure

This paper contains 47 sections, 84 equations, 14 figures, 3 tables.

Figures (14)

  • Figure 1: The ground truth appears at multiple scales, depending on the resolution of the dataset. Nevertheless, the macro-scale representation (shown in the plots on the right) can be uniquely obtained by coarse-graining the network at the microscopic scale. But how are node embeddings connected across different scales? The left side shows the learning procedure for the microscopic node embeddings, and the right side shows that for their macroscopic counterparts. In single-scale models, the micro-vectors cannot be directly used to calculate the macro-vectors (indicated by the red question marks). In contrast, the multi-scale model overcomes this limitation, as the macro-vectors are the sums of the micro-embeddings (see the parallelogram law on the right).
  • Figure 2: This figure illustrates the numerical evaluation of the two sides of Eq. \ref{eq:MSM_micro_psumVSpcg} at level $\ell = 2$ for the LPCA and MSM models. More precisely, the left-hand side (LHS) is represented on the y-axis and the right-hand side (RHS) on the x-axis.
  • Figure 3: Cross-comparison of LPCA-(8,8) and MSM-16 in predicting the clustering coefficient (CC; see \ref{SI:sec:NetworkMeasurements}). The upper panels report the expected clustering coefficient at level 0, while the lower panels depict the corresponding values at level 2. The first column refers to LPCA-(8,8), the second to MSM-16.
  • Figure 4: The plots display the predicted network measurements for the ION dataset at level 2 according to LPCA-(8,8) (upper plots) and MSM-16 (lower plots). In each of the upper panels, the x-axis represents the observed measurements, whereas the y-axis shows the expected ones. From left to right, the plots illustrate the degree (DEG), the average-nearest-neighbor degree (ANND), and the clustering coefficient (CC). The scatter plot comparing $\mathbf{P}_{sum}^{(2)}$ with $\hat{\mathbf{P}}^{(2)}$ is reported in the inset. The lower panels show the behavior of the network measurements as the degrees increase. The observed values are colored in blue, those calculated using the fitted $\hat{\mathbf{P}}^{(2)}$ model in red, and those calculated using the summed $\mathbf{P}_{sum}^{(2)}$ model in orange. Additionally, the z-score of the predicted number of links is indicated in the legend of the upper-left plot.
  • Figure 5: The graphs show the confusion matrices, ROC curves, and PR curves for the ION dataset at level 2 obtained with LPCA-(8,8) (upper plots) and MSM-16 (lower plots). The left-most panels display the two confusion matrices, for the fitted $\hat{\mathbf{P}}^{(2)}$ (upper) and the summed $\mathbf{P}_{sum}^{(2)}$. The middle plot reports the Receiver Operating Characteristic (ROC) curve, while the right-most plot depicts the Precision-Recall (PR) curve. As in \ref{fig:ING_NetMeas_DispInt}, the two curves are associated with the fitted model (red) and the summed one (orange).
  • ...and 9 more figures