Node Similarities under Random Projections: Limits and Pathological Cases
Tvrtko Tadić, Cassiano Becker, Jennifer Neville
TL;DR
This work analyzes how random projections (RP) affect node similarities in graph embeddings derived from the adjacency matrix $A$ or the normalized transition matrix $T$. It shows that dot-product preservation depends strongly on node degree, while cosine similarity remains robust to degree heterogeneity; both claims are backed by rotation-based asymptotic and finite-sample results. The paper also derives the probability that projection flips a similarity-based ranking, proving that cosine-based rankings are far more stable than dot-product-based ones, with empirical validation on a large Wikipedia graph. Practically, it recommends using normalized embeddings with cosine similarity, or alternative projections that mitigate high- or low-degree instabilities, to obtain reliable graph representations under RP. The findings advance the understanding of RP reliability in graph tasks and guide practitioners toward more stable embedding strategies.
Abstract
Random projections have been widely used to generate embeddings for various graph learning tasks due to their computational efficiency. The majority of applications have been justified through the Johnson-Lindenstrauss Lemma. In this paper, we take a step further and investigate how well dot product and cosine similarity are preserved by random projections when they are applied to the rows of the graph matrix. Our analysis provides new asymptotic and finite-sample results, identifies pathological cases, and tests them with numerical experiments. We specialize our fundamental results to a ranking application by computing the probability that random projections flip the node ordering induced by the embeddings. We find that, depending on the degree distribution, the method produces especially unreliable embeddings for the dot product, regardless of whether the adjacency or the normalized transition matrix is used. With respect to the statistical noise introduced by random projections, we show that cosine similarity produces considerably more precise approximations.
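To make the setup concrete, the following is a minimal sketch of projecting the rows of an adjacency matrix and comparing dot product and cosine similarity before and after projection. It is an illustration under stated assumptions, not the paper's experimental code: the dense Gaussian projection matrix, the toy random graph with one artificial hub, and the names `random_projection` and `cosine` are all hypothetical choices made here.

```python
import numpy as np

def random_projection(X, k, rng):
    """Project the rows of X (n x d) onto k dimensions.

    Assumption: a dense Gaussian matrix with i.i.d. N(0, 1/k) entries,
    one common choice of random projection (not necessarily the paper's).
    """
    R = rng.normal(scale=1.0 / np.sqrt(k), size=(X.shape[1], k))
    return X @ R

def cosine(u, v):
    """Cosine similarity between two nonzero vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
n, k = 500, 256

# Hypothetical toy graph: symmetric 0/1 adjacency matrix without self-loops.
upper = np.triu(rng.random((n, n)) < 0.05, 1)
A = (upper | upper.T).astype(float)

# Make node 0 a hub connected to every other node, to mimic degree heterogeneity.
A[0, 1:] = 1.0
A[1:, 0] = 1.0

# Each row of A is a node's representation; X holds the projected embeddings.
X = random_projection(A, k, rng)

# Compare exact and projected similarities for a hub pair and a non-hub pair.
for i, j in [(0, 1), (1, 2)]:
    dot_true, dot_rp = A[i] @ A[j], X[i] @ X[j]
    cos_true, cos_rp = cosine(A[i], A[j]), cosine(X[i], X[j])
    print(f"nodes ({i},{j}): dot {dot_true:.1f} -> {dot_rp:.1f}, "
          f"cosine {cos_true:.3f} -> {cos_rp:.3f}")
```

Rerunning the loop with different seeds or values of `k` gives a rough feel for how the projection noise in each similarity varies with node degree; replacing `A` with a row-normalized transition matrix would exercise the other case the abstract mentions.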
