
A Survey on Recent Random Walk-based Methods for Embedding Knowledge Graphs

Elika Bozorgi, Sakher Khalil Alqaiidi, Afsaneh Shams, Hamid Reza Arabnia, Krzysztof Kochut

TL;DR

The work addresses the knowledge-graph embedding problem by surveying seven recent random-walk–based deep learning methods. It details how DeepWalk, LINE, Node2vec, PTE, Metapath2vec/++, Regpattern2vec, and Subgraph2vec generate random walks and learn embeddings, including strategies for handling homogeneous vs. heterogeneous graphs and incorporating subgraph or regex biases. Each method is described in terms of its walk generation, objective, and how it captures local and global structure or node/edge types, offering guidance on method selection for large-scale knowledge graphs. The review highlights the practical value and scalability of random-walk approaches while signaling avenues for future work, including non‑random-walk techniques and broader evaluation frameworks.
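
The walk-generation step that these methods share can be illustrated with a short Python sketch. It is not the authors' implementation. It assumes an unweighted graph stored as an adjacency dict, and the function names `uniform_walks` and `metapath_walk` are hypothetical. The first function produces DeepWalk-style truncated uniform random walks on a homogeneous graph. The second handles heterogeneous graphs in the spirit of Metapath2vec: it restricts each step to the node type that a symmetric meta-path prescribes.

```python
import random

def uniform_walks(adj, walk_length, walks_per_node, seed=0):
    """DeepWalk-style truncated random walks on a homogeneous graph.

    adj maps each node to a list of its neighbours (assumed format).
    """
    rng = random.Random(seed)
    nodes = list(adj)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adj[walk[-1]]
                if not nbrs:
                    break  # dead end: truncate this walk
                walk.append(rng.choice(nbrs))  # uniform next step
            walks.append(walk)
    return walks


def metapath_walk(adj, node_type, start, metapath, walk_length, seed=0):
    """Metapath2vec-style walk that follows a symmetric meta-path.

    metapath is a type sequence whose first and last types match,
    e.g. ["A", "P", "A"], so the pattern can repeat along the walk.
    """
    rng = random.Random(seed)
    if node_type[start] != metapath[0]:
        raise ValueError("start node must have the meta-path's first type")
    pattern = metapath[:-1]  # drop the repeated end type: ["A", "P"]
    walk = [start]
    while len(walk) < walk_length:
        wanted = pattern[len(walk) % len(pattern)]
        # only neighbours of the type required at this position are eligible
        nbrs = [v for v in adj[walk[-1]] if node_type[v] == wanted]
        if not nbrs:
            break
        walk.append(rng.choice(nbrs))
    return walk


# Toy author (A) / paper (P) graph, made up for illustration.
adj = {"a1": ["p1", "p2"], "a2": ["p1"], "p1": ["a1", "a2"], "p2": ["a1"]}
types = {"a1": "A", "a2": "A", "p1": "P", "p2": "P"}
print(uniform_walks(adj, walk_length=5, walks_per_node=1))
print(metapath_walk(adj, types, "a1", ["A", "P", "A"], walk_length=5))
```

On this toy graph, the meta-path A-P-A produces walks whose node types alternate A, P, A, P, A. A uniform walk, in contrast, ignores types entirely.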

Abstract

Machine learning, deep learning, and NLP methods on knowledge graphs are used across many fields and play important roles in domains ranging from self-driving cars to friend recommendations on social media platforms. However, applying these methods to knowledge graphs usually requires the data to be of a manageable size and in a suitable format. Knowledge graphs are typically high-dimensional, so they need to be transformed into a low-dimensional vector space. An embedding is a low-dimensional space into which high-dimensional vectors can be translated such that the intrinsic features of the input data are preserved. In this review, we first explain knowledge graphs and their embeddings and then review several recently developed random walk-based embedding methods.
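
DeepWalk and Node2vec treat their walks as sentences and feed them to the Skip-gram model (Figure 1). That step is what turns a graph into a low-dimensional embedding matrix. The sketch below is a minimal illustration and makes several assumptions:

- Node IDs are integers.
- The function names and hyperparameters are hypothetical.
- It trains with negative sampling, as Node2vec does. DeepWalk's original formulation used hierarchical softmax instead.
- For brevity, negatives are drawn uniformly rather than from word2vec's smoothed unigram distribution.

```python
import numpy as np

def skipgram_pairs(walks, window):
    """Collect (center, context) pairs from each walk within +/- window."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


def train_sgns(pairs, num_nodes, dim=8, negatives=2, lr=0.05, epochs=50, seed=0):
    """Skip-gram with negative sampling; returns a num_nodes x dim embedding."""
    rng = np.random.default_rng(seed)
    emb = rng.normal(scale=0.1, size=(num_nodes, dim))  # node ("input") vectors
    ctx = np.zeros((num_nodes, dim))                     # context ("output") vectors
    for _ in range(epochs):
        for u, v in pairs:
            # one observed context (label 1) plus uniformly drawn negatives (label 0)
            targets = [v, *rng.integers(0, num_nodes, size=negatives)]
            labels = [1.0] + [0.0] * negatives
            grad_u = np.zeros(dim)
            for t, label in zip(targets, labels):
                score = 1.0 / (1.0 + np.exp(-(emb[u] @ ctx[t])))  # sigmoid
                g = score - label        # gradient of the log-loss w.r.t. the dot product
                grad_u += g * ctx[t]     # uses ctx[t] before it is updated
                ctx[t] -= lr * g * emb[u]
            emb[u] -= lr * grad_u
    return emb


walks = [[0, 1, 2, 3], [3, 2, 1, 0], [4, 5, 4, 5]]
pairs = skipgram_pairs(walks, window=1)
embedding = train_sgns(pairs, num_nodes=6, dim=4)
print(embedding.shape)  # (6, 4)
```

Each row of the returned matrix is one node's vector. Nodes that co-occur within the window across many walks are pushed toward similar context predictions. This is how structural proximity in the graph becomes proximity in the vector space.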

Paper Structure

This paper contains 11 sections, 23 equations, 4 figures, and 1 table.

Figures (4)

  • Figure 1: Skip-gram architecture.
  • Figure 2: BFS and DFS search strategies on a neighborhood where node u is the source node (grover2016node2vec). Starting from node 1, BFS visits nodes 1, 2, 3, 4, and DFS visits nodes 1, 2, 5, 9.
  • Figure 3: Converting partially labeled text corpora to a heterogeneous text network. The word-word co-occurrence and word-document networks encode the unsupervised information, capturing the local context-level and document-level word co-occurrences respectively. The word-label network encodes the supervised information, capturing the class-level word co-occurrences (tang2015pte).
  • Figure 4: 2D PCA projections of the 128D embeddings of 16 top CS conferences and corresponding high-profile authors (dong2017metapath2vec).
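
The BFS/DFS contrast in Figure 2 motivates Node2vec's biased second-order walk. That walk interpolates between the two strategies through a return parameter p and an in-out parameter q. The sketch below is a minimal illustration under these assumptions:

- The graph is unweighted and undirected, stored as an adjacency dict of lists.
- `node2vec_weights` and `node2vec_walk` are hypothetical names.
- Weights are recomputed at every step. The alias tables a scalable implementation would precompute are not used.

```python
import random

def node2vec_weights(adj, prev, cur, p, q):
    """Unnormalised node2vec weights for stepping from cur, having come from prev."""
    weights = []
    for x in adj[cur]:
        if x == prev:
            weights.append(1.0 / p)  # distance 0 from prev: return step
        elif x in adj[prev]:
            weights.append(1.0)      # distance 1 from prev: stay close (BFS-like)
        else:
            weights.append(1.0 / q)  # distance 2 from prev: move outward (DFS-like)
    return weights


def node2vec_walk(adj, start, walk_length, p, q, seed=0):
    """Second-order biased walk on an unweighted, undirected graph."""
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < walk_length:
        cur = walk[-1]
        nbrs = adj[cur]
        if not nbrs:
            break
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # no previous node yet: uniform step
        else:
            w = node2vec_weights(adj, walk[-2], cur, p, q)
            walk.append(rng.choices(nbrs, weights=w, k=1)[0])
    return walk


adj = {"t": ["v", "x1"], "v": ["t", "x1", "x2"], "x1": ["t", "v"], "x2": ["v"]}
print(node2vec_weights(adj, "t", "v", p=2, q=0.5))  # [0.5, 1.0, 2.0]
print(node2vec_walk(adj, "t", walk_length=6, p=1, q=0.5))
```

The value of q controls the exploration style:

- With q > 1, the walk prefers nodes adjacent to the previous node and stays local, which resembles BFS.
- With q < 1, it favors nodes two hops away, which resembles DFS.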