Node Similarities under Random Projections: Limits and Pathological Cases

Tvrtko Tadić, Cassiano Becker, Jennifer Neville

TL;DR

This work analyzes how random projections (RP) affect node similarities in graph embeddings derived from the adjacency matrix $A$ or the normalized transition matrix $T$. It shows a stark degree-dependence for dot-product preservation, while cosine similarity remains robust to degree heterogeneity, backed by rotation-based asymptotic and finite-sample results. The paper also derives the probability of ranking flips under projection, proving that cosine-based rankings are far more stable than dot-product-based ones, with empirical validation on a large Wikipedia graph. Practically, it recommends using normalized embeddings and cosine similarity, or alternative projections that mitigate high- or low-degree instabilities, to achieve reliable graph representations under RP. The findings advance understanding of RP reliability in graph tasks and guide practitioners toward more stable embedding strategies.

Abstract

Random Projections have been widely used to generate embeddings for various graph learning tasks due to their computational efficiency. The majority of applications have been justified through the Johnson-Lindenstrauss Lemma. In this paper, we take a step further and investigate how well dot product and cosine similarity are preserved by random projections when these are applied over the rows of the graph matrix. Our analysis provides new asymptotic and finite-sample results, identifies pathological cases, and tests them with numerical experiments. We specialize our fundamental results to a ranking application by computing the probability of random projections flipping the node ordering induced by their embeddings. We find that, depending on the degree distribution, the method produces especially unreliable embeddings for the dot product, regardless of whether the adjacency or the normalized transition matrix is used. With respect to the statistical noise introduced by random projections, we show that cosine similarity produces remarkably more precise approximations.
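To make the setup in the abstract concrete, here is a minimal sketch of a random projection applied to rows of a graph matrix, comparing the dot product and cosine similarity before and after projection. It is not the paper's code. The Gaussian projection with i.i.d. $N(0, 1/q)$ entries, the toy hub-plus-ring graph, and all function names are assumptions made for illustration. The helper `dot_variance` uses the standard variance identity for Gaussian projections, $\mathrm{Var}[(Rx, Ry)] = (\|x\|^2\|y\|^2 + (x,y)^2)/q$, rather than any result quoted from the paper.

```python
import math
import random

def gaussian_projection(n, q, rng):
    """q x n matrix with i.i.d. N(0, 1/q) entries (one common RP choice, assumed here)."""
    scale = 1.0 / math.sqrt(q)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(q)]

def project(R, x):
    """Embed the n-dimensional row x into q dimensions: R x."""
    return [sum(r * xj for r, xj in zip(row, x)) for row in R]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def cosine(x, y):
    return dot(x, y) / math.sqrt(dot(x, x) * dot(y, y))

def dot_variance(x, y, q):
    """Variance of (Rx, Ry) when R has i.i.d. N(0, 1/q) entries (standard identity)."""
    return (dot(x, x) * dot(y, y) + dot(x, y) ** 2) / q

# Hypothetical toy graph: hub node 0 joined to every other node, plus a ring on 1..n-1.
n, q = 200, 64
A = [[0] * n for _ in range(n)]
for v in range(1, n):
    w = v + 1 if v + 1 < n else 1  # next node on the ring, wrapping back to 1
    A[0][v] = A[v][0] = 1
    A[v][w] = A[w][v] = 1

R = gaussian_projection(n, q, random.Random(0))
E = {u: project(R, A[u]) for u in (0, 1, 3)}  # embeddings of the nodes we compare

for u, v in [(1, 3), (0, 1)]:  # low-degree pair, then hub vs. low-degree node
    std = math.sqrt(dot_variance(A[u], A[v], q))
    print(f"nodes ({u},{v}): "
          f"dot exact={dot(A[u], A[v])} rp={dot(E[u], E[v]):.3f} (std {std:.3f}); "
          f"cos exact={cosine(A[u], A[v]):.3f} rp={cosine(E[u], E[v]):.3f}")
```

In this toy graph both pairs have the same exact dot product (two common neighbors), but the variance identity gives $601/64$ for the hub pair against $13/64$ for the low-degree pair, so the projected dot product is far noisier when a high-degree node is involved.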

Paper Structure (33 sections, 60 theorems, 182 equations, 5 figures, 3 tables)


Key Result

Lemma 2.1

For all $u,v\in V$ we have

Figures (5)

  • Figure 1: Distribution of $\eta_i = \mathrm{NDCG}(r_i^K, \hat{r}_i^K)$, $K=10$, versus node degree $d_i$. On the left, we consider RP Dot Product when $P=T$, denoted by $\eta_i^T$. We compare it with RP Cosine Similarity, denoted by $\eta_i^C$. Both scores are computed using the same random projection matrix, with dimension $q=256$. The dotted line marks the lowest value observed for $\eta_i^C$, of approximately 0.75. It can be seen that $\eta_i^T$ often takes low values for higher degrees (i.e., region below dotted line), especially when $\log_2(d_i) \geq 4$. On the right, we display the equivalent plot for RP Dot Product when $P=A$, denoted by $\eta_i^A$. The NDCG scores often take low values for lower degrees, especially when $\log_2(d_i) \leq 6$.
  • Figure 2: Graphs of functions $\rho \mapsto \mathbb{P}\left(\mathcal{T}_q>\frac{|\rho|\sqrt{q}}{\sqrt{1-\rho^2}}\right)$ and $\rho \mapsto \log\mathbb{P}\left(\mathcal{T}_q>\frac{|\rho|\sqrt{q}}{\sqrt{1-\rho^2}}\right)$ for $q=100$ on the interval $[-0.5,0.5]$.
  • Figure 3: Simulated dot product $(R \tilde{x}, R \tilde{y})$ for $x=(1,0,1,0, \ldots)$ and $y=(0,1,0,1, \ldots)$ for $n=2000$ and $q=100$.
  • Figure 4: Simulated random projection for $n=q=100$, where $\frac{(x,y)}{\|x\|\,\|y\|}=\rho=0.154$.
  • Figure 5: Simulated random projection for $k=10{,}000{,}000$ and $\delta=0.05$. The upper (red) curve is the graph of $\varepsilon \mapsto \frac{2\ln \left[ \frac{2k(k-1)\left(1+\frac{\varepsilon^2}{4}\right)}{\delta}\right]}{\ln \left[ 1+\frac{\varepsilon^2}{2(1+\varepsilon\sqrt{2})}\right]}$ (minimum value of $q$ for cosine similarity), and the lower (blue) curve is the graph of $\varepsilon \mapsto \frac{4}{\varepsilon^2}\ln \left[\frac{ k^2}{\delta}\right]$ (minimum value of $q$ for the Johnson-Lindenstrauss Lemma). The two curves are very close.
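
The two curves in the Figure 5 caption can be evaluated directly. The sketch below is a plain transcription of the two caption formulas, not the paper's code. The function names and the sample values of $\varepsilon$ are assumptions chosen for illustration; $k$ and $\delta$ are the values stated in the caption.

```python
import math

def min_q_cosine(eps, k, delta):
    """Upper (red) curve of Figure 5: minimum q for cosine similarity, per the caption."""
    num = 2.0 * math.log(2.0 * k * (k - 1) * (1.0 + eps ** 2 / 4.0) / delta)
    den = math.log(1.0 + eps ** 2 / (2.0 * (1.0 + eps * math.sqrt(2.0))))
    return num / den

def min_q_jl(eps, k, delta):
    """Lower (blue) curve of Figure 5: minimum q for the Johnson-Lindenstrauss Lemma."""
    return 4.0 / eps ** 2 * math.log(k ** 2 / delta)

k, delta = 10_000_000, 0.05  # values from the caption
for eps in (0.01, 0.05, 0.1):  # sample points chosen for illustration
    qc, qj = min_q_cosine(eps, k, delta), min_q_jl(eps, k, delta)
    print(f"eps={eps:.2f}  q_cos={math.ceil(qc)}  q_JL={math.ceil(qj)}  ratio={qc / qj:.3f}")
```

The cosine bound sits above the Johnson-Lindenstrauss bound for every $\varepsilon > 0$ when $k > 2$, because its numerator is larger and its denominator is smaller than $\varepsilon^2/2$. This matches the caption's description of the red curve as the upper one.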

Theorems & Definitions (117)

  • Lemma 2.1
  • proof
  • Theorem 2.2
  • proof
  • Corollary 2.3
  • proof
  • Lemma 2.4
  • proof
  • Theorem 2.5
  • proof
  • ...and 107 more