Towards Deeper Understanding of PPR-based Embedding Approaches: A Topological Perspective
Xingyi Zhang, Zixuan Weng, Sibo Wang
TL;DR
The paper investigates why node embeddings built by factorizing a Personalized PageRank (PPR) matrix outperform random-walk methods. It introduces a unified spectral framework for PPR-based embeddings and a topological inversion method (PPREI). It shows that several state-of-the-art approaches are special cases of this single framework, and it presents two inversion techniques, one analytical and one optimization-based, for recovering graphs from embeddings. Extensive experiments on six real-world graphs demonstrate that PPR-based embeddings retain more topological information (edges, path lengths, and community structure) than random-walk-based embeddings, which offers a topological explanation for their superior downstream performance. Overall, the work advances interpretability in graph representation learning by linking PPR diffusion, spectral structure, and topology recovery.
Abstract
Node embedding learns low-dimensional vectors for the nodes of a graph. Recent state-of-the-art embedding approaches take Personalized PageRank (PPR) as the proximity measure and factorize the PPR matrix, or an adaptation of it, to generate embeddings. However, little previous work analyzes what information these approaches encode and how that information relates to their strong performance in downstream tasks. In this work, we first show that state-of-the-art embedding approaches that factorize a PPR-related matrix can be unified into a closed-form framework. We then study whether the embeddings generated by this strategy can be inverted to recover graph topology better than random-walk-based embeddings can. To this end, we propose two methods for recovering graph topology from PPR-based embeddings: an analytical method and an optimization method. Extensive experimental results demonstrate that embeddings generated by factorizing a PPR-related matrix retain more topological information, such as common edges and community structures, than those generated by random walks. This paves a new way to systematically understand why PPR-based node embedding approaches outperform random-walk-based alternatives in various downstream tasks. To the best of our knowledge, this is the first work that focuses on the interpretability of PPR-based node embedding approaches.
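To make the idea of factorizing a PPR matrix and then inverting the embeddings concrete, the following is a minimal sketch on a toy graph. It is not the paper's framework or either of its inversion methods. It assumes the standard dense PPR definition Π = α(I − (1 − α)P)⁻¹ with P = D⁻¹A, a plain SVD factorization, and full-rank embeddings so that the inversion is exact. The function names, the teleport probability `alpha = 0.15`, and the toy adjacency matrix are all illustrative assumptions.

```python
import numpy as np

def ppr_matrix(A, alpha=0.15):
    """Dense PPR matrix Pi = alpha * (I - (1 - alpha) * P)^{-1}, P = D^{-1} A.

    Assumes every node has at least one edge (no zero-degree rows).
    """
    deg = A.sum(axis=1, keepdims=True)
    P = A / deg  # row-stochastic transition matrix
    n = A.shape[0]
    return alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * P)

def svd_embeddings(M, dim):
    """Factorize M ~= X @ Y.T using the top-`dim` singular triplets."""
    U, s, Vt = np.linalg.svd(M)
    root = np.sqrt(s[:dim])
    return U[:, :dim] * root, Vt[:dim, :].T * root

def invert_to_transition(X, Y, alpha=0.15):
    """Solve Pi = alpha * (I - (1 - alpha) * P)^{-1} for P, given Pi ~= X @ Y.T.

    Illustrative closed form only; exact when the embeddings are full rank.
    """
    Pi_hat = X @ Y.T
    n = Pi_hat.shape[0]
    return (np.eye(n) - alpha * np.linalg.pinv(Pi_hat)) / (1 - alpha)

# Toy undirected graph (hypothetical): a 4-cycle 0-1-2-3-0 plus the chord 1-3.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 1, 0]], dtype=float)

Pi = ppr_matrix(A)
X, Y = svd_embeddings(Pi, dim=4)      # full rank, so X @ Y.T reproduces Pi
P_hat = invert_to_transition(X, Y)
A_hat = (P_hat > 1e-6).astype(float)  # nonzero transition probability => edge
```

With full-rank embeddings the recovered edge set matches the input graph. When `dim` is smaller than the number of nodes, the product `X @ Y.T` only approximates Π and the recovery becomes lossy, which is the regime where measuring how much topology an embedding retains becomes informative.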
