Hub-aware Random Walk Graph Embedding Methods for Classification
Aleksandar Tomčić, Miloš Savić, Miloš Radovanović
TL;DR
This work tackles node classification on large graphs by introducing two hub-aware graph embedding methods, SCWalk and Hub-Walk-Distribution, which bias random walks using label information to emphasize good hubs and de-emphasize bad hubs. Framed within a random-walk embedding paradigm and evaluated against the node2vec baseline, these methods learn embeddings via a single Word2Vec model and improve downstream classifier performance across multiple real-world networks. The results show consistent gains for SVM, RF, and NB classifiers, with performance improving as the biased sampling parameter $p$ increases, especially for SCWalk. The study demonstrates a practical, label-aware, and hub-centric approach to graph embeddings that offers both accuracy gains and computational efficiency for node classification tasks.
Abstract
Over the last two decades, we have witnessed a huge increase in valuable big data structured in the form of graphs or networks. To apply traditional machine learning and data analytic techniques to such data, it is necessary to transform graphs into vector-based representations that preserve the most essential structural properties of graphs. For this purpose, a large number of graph embedding methods have been proposed in the literature. Most of them produce general-purpose embeddings suitable for a variety of applications such as node clustering, node classification, graph visualisation and link prediction. In this paper, we propose two novel graph embedding algorithms based on random walks that are specifically designed for the node classification problem. The random walk sampling strategies of the proposed algorithms are designed to pay special attention to hubs, the high-degree nodes that play the most critical role in the overall connectedness of large-scale graphs. The proposed methods are experimentally evaluated by analyzing the classification performance of three classification algorithms trained on embeddings of real-world networks. The obtained results indicate that our methods considerably improve the predictive power of the examined classifiers compared to node2vec, currently the most popular random walk method for generating general-purpose graph embeddings.
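To make the idea of a label-biased random walk concrete, the following is a minimal sketch of one possible sampling rule. It is not the definition of SCWalk or Hub-Walk-Distribution, which this summary does not spell out. The function names `good_hub_score` and `label_biased_walk`, the toy graph, and the specific rule (with probability `p`, step to a neighbor that shares the current node's label) are all assumptions made for illustration. The score only captures the intuition that a "good" hub is one whose neighbors mostly agree with its label.

```python
import random

def good_hub_score(adj, labels, node):
    """Fraction of a node's neighbors that share its label (illustrative measure)."""
    neighbors = adj[node]
    if not neighbors or labels.get(node) is None:
        return 0.0
    same = sum(1 for v in neighbors if labels.get(v) == labels[node])
    return same / len(neighbors)

def label_biased_walk(adj, labels, start, length, p, rng):
    """Sample one walk of at most `length` nodes.

    Assumed rule: with probability p, move to a uniformly chosen neighbor
    that has the same label as the current node; otherwise (or if no such
    neighbor exists) move to a uniformly chosen neighbor.
    """
    walk = [start]
    current = start
    for _ in range(length - 1):
        neighbors = adj[current]
        if not neighbors:
            break  # dead end: stop the walk early
        lab = labels.get(current)
        same = [v for v in neighbors if lab is not None and labels.get(v) == lab]
        if same and rng.random() < p:
            current = rng.choice(same)
        else:
            current = rng.choice(neighbors)
        walk.append(current)
    return walk

# Hypothetical toy graph: node 0 is a hub linking an "a" cluster to a "b" cluster.
toy_adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [3]}
toy_labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b"}

rng = random.Random(0)
walks = [label_biased_walk(toy_adj, toy_labels, n, 10, 0.8, rng) for n in toy_adj]
```

Walks sampled this way are lists of node identifiers, so they could be passed as "sentences" to a Word2Vec implementation to obtain node embeddings, as in other random-walk embedding methods. In this sketch, raising `p` keeps walks inside same-label neighborhoods more often.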
