Hierarchical Position Embedding of Graphs with Landmarks and Clustering for Link Prediction

Minsang Kim; Seungjun Baek

TL;DR

This work tackles link prediction with graph neural networks by introducing Hierarchical Position embedding with Landmarks and Clustering (HPLC), which encodes nodes’ positions relative to a small, well-distributed set of landmarks organized via graph clustering. The authors provide theoretical justification for landmark-based distance approximations in random graphs, showing that a modest number of landmarks can yield near-optimal or asymptotically optimal detour distances in ER and BA models, guiding the design toward $K(N)=O(\log N)$. The method combines landmark distances with Laplacian-based membership encoding and cluster-level encoders to produce expressive, scalable positional embeddings that improve performance across diverse datasets and GNN backbones. Empirical results demonstrate state-of-the-art link prediction performance with favorable scalability, and ablations validate the contribution of each hierarchical component.

Abstract

Learning positional information of nodes in a graph is important for link prediction tasks. We propose a representation of positional information using representative nodes called landmarks. A small number of nodes with high degree centrality are selected as landmarks, which serve as reference points for the nodes' positions. We justify this selection strategy for well-known random graph models and derive closed-form bounds on the average path lengths involving landmarks. In a model for power-law graphs, we prove that landmarks provide asymptotically exact information on inter-node distances. We apply these theoretical insights to practical networks and propose Hierarchical Position embedding with Landmarks and Clustering (HPLC). HPLC combines landmark selection and graph clustering: the graph is partitioned into densely connected clusters, and the highest-degree node in each cluster is selected as its landmark. HPLC leverages the positional information of nodes based on landmarks at various levels of hierarchy, such as nodes' distances to landmarks, inter-landmark distances, and hierarchical grouping of clusters. Experiments show that HPLC achieves state-of-the-art link prediction performance on various datasets in terms of HIT@K, MRR, and AUC. The code is available at \url{https://github.com/kmswin1/HPLC}.
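The landmark construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the cluster assignment is assumed to be given (the paper obtains it with FluidC), the graph is a hypothetical toy example, and the helper names `bfs_distances`, `select_landmarks`, and `landmark_distance_vectors` are made up for illustration. It picks the highest-degree node of each cluster as a landmark and records each node's hop distances to all landmarks.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def select_landmarks(adj, clusters):
    """Pick the highest-degree node of each cluster (ties go to the smaller id)."""
    return [max(sorted(members), key=lambda n: len(adj[n])) for members in clusters]


def landmark_distance_vectors(adj, landmarks, unreachable=-1):
    """Map each node to its vector of hop distances to the landmarks."""
    per_landmark = [bfs_distances(adj, lm) for lm in landmarks]
    return {
        node: [d.get(node, unreachable) for d in per_landmark]
        for node in adj
    }


# Hypothetical toy graph: two triangles joined by the edge 2-3.
adj = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2, 4, 5],
    4: [3, 5],
    5: [3, 4],
}
# Assumption: clusters are supplied directly instead of being computed.
clusters = [{0, 1, 2}, {3, 4, 5}]

landmarks = select_landmarks(adj, clusters)
vectors = landmark_distance_vectors(adj, landmarks)
print(landmarks)   # landmark node per cluster
print(vectors[0])  # distances from node 0 to each landmark
```

In this toy graph the two bridge endpoints have the highest degree in their clusters, so they become the landmarks. One BFS per landmark is enough to fill every node's distance vector, which is why a small number of landmarks keeps this step cheap.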

Paper Structure

This paper contains 40 sections, 4 theorems, 42 equations, 3 figures, 10 tables.

Introduction
Random Graphs with Landmarks
Notation
Representation of Positions using Landmarks
Path lengths via Landmarks in random graphs
Erdős-Rényi Model
Barabási-Albert Model
Design Insights from Theory
Proposed Method
Graph Clustering and Landmark Selection
Membership Encoding with Graph Laplacian
Cluster-group Encoding
Property of HPLC as Node embedding
Complexity Analysis
Time Complexity
...and 25 more sections

Key Result

Theorem 1

Let $L_{ij}$ denote the random variable representing the minimum path length from node $i$ to node $j$ among the detours via $K(N)$ landmarks. The landmarks are chosen i.i.d. according to a distribution $Q$. Asymptotically in $N$, $P(L_{ij} > s)$ is given by the paper's path-length equation (referenced there as eq:pathlen).
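To make the quantity $L_{ij}$ concrete, the following is a minimal sketch of a landmark detour distance on an undirected, unweighted graph: the smallest value of $d(i, \ell) + d(\ell, j)$ over the landmarks $\ell$. The function names and the toy 6-cycle are hypothetical, and the landmarks are passed in explicitly rather than sampled from $Q$, whose form is not given here.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def detour_distance(adj, landmarks, i, j):
    """Shortest i -> landmark -> j path length over all landmarks.

    Assumes an undirected graph, so d(i, l) equals d(l, i).
    Returns None if no landmark is connected to both i and j.
    """
    best = None
    for lm in landmarks:
        dist = bfs_distances(adj, lm)
        if i in dist and j in dist:
            total = dist[i] + dist[j]
            if best is None or total < best:
                best = total
    return best


# Hypothetical toy graph: a cycle on 6 nodes, 0-1-2-3-4-5-0.
adj = {n: [(n - 1) % 6, (n + 1) % 6] for n in range(6)}

true_distance = bfs_distances(adj, 2)[4]           # direct shortest path 2-3-4
one_landmark = detour_distance(adj, [0], 2, 4)     # forced detour through node 0
two_landmarks = detour_distance(adj, [0, 3], 2, 4) # node 3 lies on the shortest path
print(true_distance, one_landmark, two_landmarks)
```

By the triangle inequality the detour distance can never be shorter than the true distance, and it matches the true distance whenever some landmark lies on a shortest path between the two nodes. Adding landmarks can only tighten the approximation, which is the trade-off the theorem quantifies as a function of $K(N)$.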

Figures (3)

Figure 1: Overview of HPLC. (1) Partition the graph into $K$ clusters using FluidC, select landmarks based on degrees, and compute distance vectors of nodes. (2) Construct a landmark graph to compute membership vectors based on eigenvectors of the graph Laplacian. (3) Compute positional embeddings by combining membership and distance vectors and passing them through an encoder. (4) Concatenate positional embeddings and node features, and project them onto cluster-group embedding spaces. (5) Perform neighborhood aggregation using GNNs. $\oplus$ denotes concatenation.
Figure 2: Comparison of inter-node distances, landmark detour distances, and upper bound for $K(N)= \log N, N^{0.5}, N^{0.9}$ in ER networks.
Figure 3: Comparison of inter-node distances, landmark detour distances, and theoretical upper bounds in BA networks.

Theorems & Definitions (5)

Theorem 1
Theorem 2
Theorem 3
Definition 1
Lemma 1
