Hub-aware Random Walk Graph Embedding Methods for Classification

Aleksandar Tomčić; Miloš Savić; Miloš Radovanović

TL;DR

This work tackles node classification on large graphs by introducing two hub-aware graph embedding methods, SCWalk and Hub-Walk-Distribution, which bias random walks using label information to emphasize good hubs and de-emphasize bad hubs. Framed within a random-walk embedding paradigm and evaluated against the node2vec baseline, these methods learn embeddings via a single Word2Vec model and improve downstream classifier performance across multiple real-world networks. The results show consistent gains for SVM, RF, and NB classifiers, with performance improving as the biased sampling parameter $p$ increases, especially for SCWalk. The study demonstrates a practical, label-aware, and hub-centric approach to graph embeddings that offers both accuracy gains and computational efficiency for node classification tasks.

Abstract

Over the last two decades, we have witnessed a huge increase in valuable big data structured as graphs or networks. To apply traditional machine learning and data analytics techniques to such data, it is necessary to transform graphs into vector-based representations that preserve their most essential structural properties. For this purpose, a large number of graph embedding methods have been proposed in the literature. Most of them produce general-purpose embeddings suitable for a variety of applications such as node clustering, node classification, graph visualisation and link prediction. In this paper, we propose two novel graph embedding algorithms based on random walks that are specifically designed for the node classification problem. The random walk sampling strategies of the proposed algorithms pay special attention to hubs, the high-degree nodes that play the most critical role in the overall connectedness of large-scale graphs. The proposed methods are evaluated experimentally by analyzing the classification performance of three classification algorithms trained on embeddings of real-world networks. The results indicate that our methods considerably improve the predictive power of the examined classifiers compared to node2vec, currently the most popular random walk method for generating general-purpose graph embeddings.
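As a rough illustration of label-biased walk sampling controlled by a parameter $p$, the Python sketch below generates a walk that, with probability `p`, steps to a neighbour sharing the current node's class label and otherwise steps to a uniformly chosen neighbour. This is only a minimal sketch under stated assumptions: the function name `label_biased_walk`, the toy graph, and the exact transition rule are hypothetical and are not taken from the paper, whose precise SCWalk and Hub-Walk-Distribution definitions are not reproduced on this page.

```python
import random


def label_biased_walk(adj, labels, start, length, p, rng=None):
    """Hypothetical label-biased random walk (illustrative assumption only).

    With probability p, step to a neighbour that has the same label as the
    current node, if such a neighbour exists; otherwise step to a neighbour
    chosen uniformly at random.
    """
    if rng is None:
        rng = random.Random()
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbours = adj[current]
        if not neighbours:
            break  # dead end: stop the walk early
        same_label = [v for v in neighbours if labels[v] == labels[current]]
        if same_label and rng.random() < p:
            walk.append(rng.choice(same_label))
        else:
            walk.append(rng.choice(neighbours))
    return walk


# Toy undirected graph (assumed for illustration): nodes 0-2 are class "a",
# nodes 3-4 are class "b"; node 0 has the highest degree.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2, 4], 4: [3]}
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b"}

walk = label_biased_walk(adj, labels, start=0, length=10, p=1.0,
                         rng=random.Random(0))
```

With `p=1.0` on this toy graph, the walk never leaves class "a", because every class-"a" node has at least one class-"a" neighbour; with `p=0.0` the sampler reduces to a plain uniform random walk. In a random-walk embedding pipeline, walks like these would then serve as the "sentences" for a Word2Vec-style model.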

Paper Structure (13 sections, 5 equations, 3 figures, 12 tables, 3 algorithms)


Introduction
Related Work
Collective Classification
Classification based on Graph Embeddings
Classification based on Graph Neural Networks
Motivation and Contributions
Hub-based Random Walk Graph Embedding Methods
Experiments and Results
SVM Evaluation
Random Forest Evaluation
Naïve Bayes Evaluation
Experimental Analysis of Hyper-parameters
Conclusions and Future Work

Figures (3)

Figure 1: Accuracy of SVM classification for all datasets and all three graph embedding algorithms.
Figure 2: Classification accuracy of RF with 100 estimators for each dataset and three graph embedding algorithms.
Figure 3: Classification accuracy of NB for each dataset and three graph embedding algorithms.
