Hierarchical Position Embedding of Graphs with Landmarks and Clustering for Link Prediction
Minsang Kim, Seungjun Baek
TL;DR
This work tackles link prediction with graph neural networks (GNNs) by introducing Hierarchical Position embedding with Landmarks and Clustering (HPLC), which encodes nodes’ positions relative to a small, well-distributed set of landmarks organized via graph clustering. The authors provide theoretical justification for landmark-based distance approximations in random graphs, showing that a modest number of landmarks can yield near-optimal or asymptotically optimal detour distances in the Erdős-Rényi (ER) and Barabási-Albert (BA) models, which guides the design toward $K(N)=O(\log N)$ landmarks for a graph of $N$ nodes. The method combines landmark distances with Laplacian-based membership encoding and cluster-level encoders to produce expressive, scalable positional embeddings that improve performance across diverse datasets and GNN backbones. Empirical results demonstrate state-of-the-art link prediction performance with favorable scalability, and ablations validate the contribution of each hierarchical component.
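The detour idea in the summary above can be illustrated with a minimal sketch. It assumes that $K(N)$ denotes the number of landmarks, that landmarks are simply the highest-degree nodes, and that a detour distance is the length of the shortest path from one node to another forced through some landmark. The function names, the constant `c`, and the toy graph are hypothetical and are not taken from the authors' code.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Return hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def num_landmarks(n, c=1.0):
    """Hypothetical budget rule: K(N) = ceil(c * ln N), at least one landmark."""
    return max(1, math.ceil(c * math.log(n)))


def top_degree_landmarks(adj, k):
    """Pick the k highest-degree nodes (ties broken by smaller node id)."""
    return sorted(adj, key=lambda u: (-len(adj[u]), u))[:k]


def detour_distance(u, v, landmark_dists):
    """Length of the shortest u -> landmark -> v route over all landmarks."""
    return min(
        d.get(u, math.inf) + d.get(v, math.inf)
        for d in landmark_dists.values()
    )


# Toy undirected graph (assumption): a hub 0 with a short tail 4-5-6.
adj = {
    0: [1, 2, 3, 4],
    1: [0],
    2: [0],
    3: [0],
    4: [0, 5],
    5: [4, 6],
    6: [5],
}

landmarks = top_degree_landmarks(adj, num_landmarks(len(adj)))
landmark_dists = {l: bfs_distances(adj, l) for l in landmarks}

print(landmarks)                              # [0, 4]
print(detour_distance(1, 6, landmark_dists))  # 4, equal to the true distance
print(detour_distance(5, 6, landmark_dists))  # 3, an upper bound on the true distance 1
```

The detour distance is always an upper bound on the true shortest-path distance; the two coincide whenever some shortest path passes through a landmark, as for nodes 1 and 6 in the toy graph.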
Abstract
Learning positional information of nodes in a graph is important for link prediction tasks. We propose a representation of positional information using representative nodes called landmarks. A small number of nodes with high degree centrality are selected as landmarks, which serve as reference points for the nodes' positions. We justify this selection strategy for well-known random graph models and derive closed-form bounds on the average path lengths involving landmarks. In a model for power-law graphs, we prove that landmarks provide asymptotically exact information on inter-node distances. We apply these theoretical insights to practical networks and propose Hierarchical Position embedding with Landmarks and Clustering (HPLC). HPLC combines landmark selection and graph clustering: the graph is partitioned into densely connected clusters, and within the clusters the nodes with the highest degree are selected as landmarks. HPLC leverages the positional information of nodes based on landmarks at various levels of hierarchy, such as nodes' distances to landmarks, inter-landmark distances, and hierarchical grouping of clusters. Experiments show that HPLC achieves state-of-the-art link prediction performance on various datasets in terms of HIT@K, MRR, and AUC. The code is available at https://github.com/kmswin1/HPLC.
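The cluster-then-select step described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the released HPLC implementation: the cluster partition is supplied by hand rather than computed by a clustering algorithm, exactly one landmark is chosen per cluster, degree is counted over the whole graph, and all function names are hypothetical.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Return hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def select_cluster_landmarks(adj, clusters):
    """Pick the highest-degree node of each cluster (assumption: one per cluster)."""
    return [max(c, key=lambda u: (len(adj[u]), -u)) for c in clusters]


def landmark_distance_vectors(adj, landmarks):
    """Map each node to its vector of hop distances to the landmarks."""
    dists = {l: bfs_distances(adj, l) for l in landmarks}
    return {u: [dists[l].get(u, math.inf) for l in landmarks] for u in adj}


# Toy undirected graph (assumption): two star-shaped clusters joined by edge 3-4.
adj = {
    0: [1, 2, 3],
    1: [0],
    2: [0],
    3: [0, 4],
    4: [3, 5, 6],
    5: [4],
    6: [4],
}
clusters = [[0, 1, 2, 3], [4, 5, 6]]  # hand-made partition, not a clustering result

landmarks = select_cluster_landmarks(adj, clusters)
vectors = landmark_distance_vectors(adj, landmarks)
# Inter-landmark distances: row i is the distance vector of landmark i.
inter = [vectors[l] for l in landmarks]

print(landmarks)   # [0, 4]
print(vectors[1])  # [1, 3]
print(inter)       # [[0, 2], [2, 0]]
```

In this sketch each node's distance vector plays the role of the node-to-landmark positional signal, and `inter` plays the role of the inter-landmark distances; how HPLC encodes these quantities and the cluster hierarchy is described in the paper itself.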
