A Survey on Recent Random Walk-based Methods for Embedding Knowledge Graphs
Elika Bozorgi, Sakher Khalil Alqaiidi, Afsaneh Shams, Hamid Reza Arabnia, Krzysztof Kochut
TL;DR
The work addresses the knowledge-graph embedding problem by surveying seven recent random-walk-based deep learning methods. It details how DeepWalk, LINE, Node2vec, PTE, Metapath2vec/++, Regpattern2vec, and Subgraph2vec generate random walks and learn embeddings, including strategies for handling homogeneous vs. heterogeneous graphs and incorporating subgraph or regex biases. Each method is described in terms of its walk generation, objective, and how it captures local and global structure or node/edge types, offering guidance on method selection for large-scale knowledge graphs. The review highlights the practical value and scalability of random-walk approaches while signaling avenues for future work, including non-random-walk techniques and broader evaluation frameworks.
Abstract
Machine learning, deep learning, and NLP methods applied to knowledge graphs appear in many fields and play important roles in applications ranging from self-driving cars to friend recommendations on social media platforms. To apply these methods, however, the graph data usually must first be brought into a suitable size and format. Knowledge graphs are typically high-dimensional, so they need to be transformed into a low-dimensional vector space. An embedding is such a low-dimensional space: high-dimensional vectors are mapped into it in a way that preserves the intrinsic features of the input data. In this review, we first explain knowledge graphs and their embeddings, and then review several recently developed random walk-based embedding methods.
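
To make the idea of a random walk concrete, the following is a minimal sketch of uniform, unbiased walk generation of the kind DeepWalk-style methods start from. The function name `uniform_random_walks`, its parameters, and the toy graph are assumptions made for illustration; none of them are taken from the surveyed papers. In practice the resulting walks are treated like sentences and passed to a skip-gram model to learn the node vectors, a step not shown here.

```python
import random


def uniform_random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Generate unbiased random walks over a graph given as an adjacency dict.

    Hypothetical helper for illustration: each walk starts at a node and
    repeatedly moves to a neighbor chosen uniformly at random.
    """
    rng = random.Random(seed)  # local generator so runs are repeatable
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adjacency)
        rng.shuffle(nodes)  # visit start nodes in a fresh order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Toy undirected graph (illustrative data only).
toy_graph = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "carol"],
    "carol": ["alice", "bob", "dave"],
    "dave": ["carol"],
}

walks = uniform_random_walks(toy_graph, walk_length=5, walks_per_node=2)
for walk in walks[:3]:
    print(" -> ".join(walk))
```

Biased variants such as Node2vec, or type-aware variants such as Metapath2vec, would replace the uniform `rng.choice` step with a weighted or constrained choice of the next node.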
