The Effects of Randomness on the Stability of Node Embeddings
Tobias Schumacher, Hinrikus Wolf, Martin Ritzert, Florian Lemmerich, Jan Bachmann, Florian Frantzen, Max Klabunde, Martin Grohe, Markus Strohmaier
TL;DR
This paper investigates how the randomness inherent in state-of-the-art node embedding algorithms affects the stability of both the embedding geometry and downstream classification. It evaluates five algorithms (HOPE, LINE, node2vec, SDNE, GraphSAGE) on synthetic and real graphs. The evaluation uses three geometric measures (aligned cosine similarity, k-NN Jaccard similarity, and second-order cosine similarity) together with node-classification performance. Most methods show substantial geometric instability, with HOPE as the notable exception, while downstream classification accuracy remains largely robust. Even so, the study emphasizes that individual node predictions can differ across embeddings obtained from repeated runs, which raises reproducibility concerns for embedding-based workflows. These findings motivate stability-aware embedding design and repeated evaluations to ensure reliable deployment, especially in high-stakes or privacy-sensitive applications.
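The three geometric measures named above compare two embeddings of the same graph node by node. The following is a minimal NumPy sketch. It assumes two embedding matrices `A` and `B` (one row per node, same node order) produced by two runs that differ only in their random seed, and it follows the common definitions of each measure:

- Aligned cosine similarity first rotates `A` onto `B` with an orthogonal Procrustes alignment, then compares each node's two vectors.
- k-NN Jaccard similarity compares each node's k nearest neighbors in the two spaces.
- Second-order cosine similarity compares a node's cosine similarities to the union of its neighbors in both spaces, so it needs no alignment.

The function names, and the choice of cosine similarity as the neighborhood metric, are assumptions for illustration. They are not the authors' implementation.

```python
import numpy as np


def _normalize_rows(X):
    # Scale each embedding vector to unit length, guarding against zero vectors.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1e-12)


def aligned_cosine_similarity(A, B):
    """Per-node cosine similarity after rotating A onto B (orthogonal Procrustes)."""
    # The orthogonal R minimizing ||A R - B||_F is U @ Vt, where U, S, Vt = svd(A.T @ B).
    U, _, Vt = np.linalg.svd(A.T @ B)
    A_aligned = A @ (U @ Vt)
    return np.sum(_normalize_rows(A_aligned) * _normalize_rows(B), axis=1)


def knn_indices(X, k):
    """Indices of the k nearest neighbors of every node, ranked by cosine similarity."""
    Xn = _normalize_rows(X)
    sims = Xn @ Xn.T
    np.fill_diagonal(sims, -np.inf)  # a node is not its own neighbor
    return np.argsort(-sims, axis=1)[:, :k]


def knn_jaccard(A, B, k):
    """Per-node Jaccard similarity of the k-NN sets in the two embeddings."""
    nA, nB = knn_indices(A, k), knn_indices(B, k)
    return np.array(
        [len(set(a) & set(b)) / len(set(a) | set(b)) for a, b in zip(nA, nB)]
    )


def second_order_cosine(A, B, k):
    """Compare each node's similarity profile over the union of its k-NN in both spaces."""
    An, Bn = _normalize_rows(A), _normalize_rows(B)
    nA, nB = knn_indices(A, k), knn_indices(B, k)
    scores = []
    for v in range(A.shape[0]):
        union = sorted(set(nA[v]) | set(nB[v]))
        sA = An[union] @ An[v]  # cosine similarities of v to the union nodes in A
        sB = Bn[union] @ Bn[v]  # the same similarities measured in B
        scores.append(sA @ sB / (np.linalg.norm(sA) * np.linalg.norm(sB) + 1e-12))
    return np.array(scores)
```

Each function returns one score per node. A score of 1.0 means the node looks the same in both runs under that measure. Averaging such scores over many pairs of seeds is one way to estimate per-node stability.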
Abstract
We systematically evaluate the (in-)stability of state-of-the-art node embedding algorithms due to randomness, i.e., the random variation of their outcomes given identical algorithms and graphs. We apply five node embedding algorithms (HOPE, LINE, node2vec, SDNE, and GraphSAGE) to synthetic and empirical graphs. We then assess their stability under randomness with respect to (i) the geometry of the embedding spaces and (ii) their performance in downstream tasks. We find significant instabilities in the geometry of embedding spaces, independent of the centrality of a node. In the evaluation of downstream tasks, node classification accuracy seems to be unaffected by random seeding, while the classification of individual nodes can vary significantly. This suggests that instability effects need to be taken into account when working with node embeddings. Our work is relevant for researchers and engineers interested in the effectiveness, reliability, and reproducibility of node embedding approaches.
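A toy comparison makes concrete how accuracy can stay constant while the classification of individual nodes changes. The sketch below assumes predicted labels from two runs that differ only in the random seed. The labels are hypothetical and are not data from the paper. The helpers `accuracy`, `disagreement_rate`, and `unstable_nodes` are also hypothetical names introduced for illustration.

```python
def accuracy(pred, truth):
    # Fraction of nodes whose predicted label matches the ground truth.
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def disagreement_rate(pred_a, pred_b):
    # Fraction of nodes that receive different labels in the two runs.
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)


def unstable_nodes(runs):
    # Indices of nodes whose predicted label is not identical across all runs.
    return [i for i, labels in enumerate(zip(*runs)) if len(set(labels)) > 1]


# Hypothetical toy labels, for illustration only.
truth = [0, 0, 1, 1, 0, 1, 0, 1]
run_a = [0, 0, 1, 1, 1, 0, 0, 1]  # predictions from an embedding trained with seed A
run_b = [1, 0, 1, 0, 0, 1, 0, 1]  # predictions from an embedding trained with seed B

print(accuracy(run_a, truth), accuracy(run_b, truth))  # 0.75 0.75
print(disagreement_rate(run_a, run_b))  # 0.5
print(unstable_nodes([run_a, run_b]))  # [0, 3, 4, 5]
```

Both runs reach the same accuracy, yet half of the nodes change their label between runs. Reporting a disagreement rate alongside accuracy is one way to expose this kind of instability.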
