Landmark-Based Node Representations for Shortest Path Distance Approximations in Random Graphs

My Le, Luana Ruiz, Souvik Dhara

TL;DR

This work studies landmark-based node embeddings aimed at preserving shortest-path distances, inspired by Bourgain's metric embeddings. It shows that on Erdos-Renyi random graphs, the embedding dimension required for low-distortion distance approximations can be significantly smaller than worst-case bounds, with concrete rates depending on tunable parameters. A GNN-augmented variant is proposed to learn landmark distances, reducing explicit path computations and enabling transferability to larger graphs and real networks, with empirical evidence that GNN-based bounds can outperform exact landmark methods. Together, the results deliver both average-case theoretical insights and practical scalable methods for distance-aware graph representations on large-scale networks.

Abstract

Learning node representations is a fundamental problem in graph machine learning. While existing embedding methods effectively preserve local similarity measures, they often fail to capture global functions like graph distances. Inspired by Bourgain's seminal work on Hilbert space embeddings of metric spaces (1985), we study the performance of local distance-preserving node embeddings. Known as landmark-based algorithms, these embeddings approximate pairwise distances by computing shortest paths from a small subset of reference nodes called landmarks. Our main theoretical contribution shows that random graphs, such as Erdos-Renyi random graphs, require lower dimensions in landmark-based embeddings compared to worst-case graphs. Empirically, we demonstrate that the GNN-based approximations for the distances to landmarks generalize well to larger real-world networks, offering a scalable and transferable alternative for graph representation learning.
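The landmark idea in the abstract can be illustrated with a minimal sketch. It assumes an unweighted, connected graph stored as an adjacency dictionary, and it picks the landmarks by hand; the function names (`bfs_distances`, `landmark_embeddings`, `distance_bounds`) are hypothetical and are not taken from the paper. Each node is embedded as its vector of BFS distances to the landmarks, and the triangle inequality then gives a lower and an upper bound on any pairwise distance.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def landmark_embeddings(adj, landmarks):
    """Embed each node as its list of distances to the landmarks.

    Assumes the graph is connected, so every distance is finite.
    """
    per_landmark = [bfs_distances(adj, l) for l in landmarks]
    return {u: [d[u] for d in per_landmark] for u in adj}


def distance_bounds(x_u, x_v):
    """Triangle-inequality bounds on d(u, v) from two landmark embeddings.

    For every landmark l: |d(u,l) - d(v,l)| <= d(u,v) <= d(u,l) + d(v,l).
    """
    lower = max(abs(a - b) for a, b in zip(x_u, x_v))
    upper = min(a + b for a, b in zip(x_u, x_v))
    return lower, upper


# Toy example: the path graph 0 - 1 - 2 - 3 - 4 - 5 with hand-picked landmarks.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 5] for i in range(6)}
emb = landmark_embeddings(adj, landmarks=[0, 3])

print(distance_bounds(emb[1], emb[5]))  # bounds on d(1, 5), whose true value is 4
print(distance_bounds(emb[1], emb[2]))  # bounds on d(1, 2), whose true value is 1
```

The embedding dimension equals the number of landmarks, so the question of how many landmarks are needed for low distortion is the question of how large the dimension must be. In this toy example the bounds are tight for the pair (1, 5) but loose for the pair (1, 2), since no landmark lies on the shortest path between nodes 1 and 2.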

Paper Structure

This paper contains 22 sections, 7 theorems, 46 equations, 5 figures, 1 table, 1 algorithm.

Key Result

Theorem 3.1

Let $G$ be a graph with $n \geq 3$ nodes and $u_1,u_2$ be two nodes in $G$. Let $c > 1$. There exist node embeddings ${\mathbf x}^*_{u_1},{\mathbf x}^*_{u_2} \in {\mathbb R}^{D}$ with $D = \Omega (n^{1/c}\log{n})$ for which $\underline{d}(u_1,u_2)$ as in Algorithm 1 satisfies

Figures (5)

  • Figure 1: Error rates of BFS-based and GNN-based lower bounds on (a) test ER graphs generated from the same $\mathrm{ER}_n(\lambda/n)$ as the training graphs, (b) test ER graphs generated by $\mathrm{ER}_{n'}(\lambda/n')$ with larger graph size $n'$, (c) real-world networks with 3,892 to 28,281 nodes, (d) Brightkite social network with 56,739 nodes, and (e) ER-AVGDEG10-100K-L2 labeled network with 99,997 nodes. (f) Duration of generating all landmark distances by NetworkX's highly optimized BFS compared with our widest and deepest GNNs. GCN, GraphSage, GAT, and GIN models were examined and are represented by solid lines of the same color for the same number of local steps $R$. See Appendices \ref{app:exp_details} and \ref{app:more_exps} for further details and discussions on the experiments and benchmark networks.
  • Figure 2: End-to-end shortest path distance predictions from $\lfloor \sqrt{n}\rfloor\text{-64-32-16-}\lfloor\sqrt{n}\rfloor$ GNNs trained on graphs generated by $\mathrm{ER}_n(\lambda/n)$. The evaluation data consists of graphs from the same model.
  • Figure 3: Error rates of BFS-based and GNN-based lower bounds on graphs generated by $\mathrm{ER}_n(\lambda/n)$, with the GNNs trained on graphs from the same model.
  • Figure 4: Error rates of BFS-based and GNN-based lower bounds on (a,d) test ER graphs generated by $\text{ER}_{n'}(\lambda/n')$, (b,e) Arxiv COND-MAT collaboration network with 21,364 nodes, and (c,f) GEMSEC company network with 14,113 nodes, with the GNNs trained on graphs from $\mathrm{ER}_n(\lambda/n)$.
  • Figure 5: Additional transferability results on real networks, with the GNNs trained on graphs from $\mathrm{ER}_n(\lambda/n)$. Legend is the same as in Figure \ref{fig:exp3app}.

Theorems & Definitions (10)

  • Theorem 3.1: Lower Bound Distortion Adapted From [Bourgain85] and [Mat96]
  • Theorem 3.2: Upper Bound Distortion Adapted From [Sarma2010ASD]
  • Theorem 4.1: Lower Bound Distortion on Random Graphs
  • Theorem 4.2: Upper Bound Distortion on Random Graphs
  • Lemma 4.3
  • proof
  • Lemma 4.4
  • proof
  • Proposition 4.5
  • proof