Fast Approximate CoSimRanks via Random Projections

Renchi Yang; Xiaokui Xiao

TL;DR

This work tackles the expensive problem of all-pairs CoSimRank computation by introducing RPCS, a randomized algorithm that uses Johnson-Lindenstrauss-type random projections to reduce the $n\times n$ random-walk matrix to an $n\times d$ sketch. By iteratively accumulating low-rank factors via the updates $\mathbf{H}^{(k)}=\sqrt{c}\,\mathbf{P}\mathbf{H}^{(k-1)}$ and $\widehat{\mathbf{S}}=\mathbf{I}+\sum_k \mathbf{H}^{(k)}(\mathbf{H}^{(k)})^{\top}$, RPCS guarantees, with high probability, an absolute error of at most $ε$ in every entry, while reducing the per-iteration cost to $O(n^2d)$ and the overall time to $O\left(\min\left\{\frac{n^2\ln n}{ε^2}\ln\frac{1}{ε},\,n^3\ln\frac{1}{ε}\right\}\right)$. The method is backed by a theoretical error bound derived from inner-product preservation, together with a procedure for choosing the projection dimension $d$ and the parameter $δ$ to minimize running time. Empirical results on six real graphs show substantial speedups over state-of-the-art methods, enabling $ε$-approximate all-pairs CoSimRank queries on large datasets, such as a million-edge Twitter graph, on a single commodity server.
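The update rules in the TL;DR can be illustrated with a small NumPy sketch. It is not the authors' implementation and rests on several assumptions: $\mathbf{P}$ is a dense row-stochastic transition matrix (here a row-normalized adjacency matrix), the projection matrix has i.i.d. Gaussian $\mathcal{N}(0,1/d)$ entries, and the series is truncated after $K$ steps. The function names `row_stochastic`, `exact_cosimrank`, and `rpcs_sketch` are hypothetical, and the paper's procedure for choosing $d$ and $δ$ is not reproduced; $d$ is simply passed in.

```python
import numpy as np


def row_stochastic(adj: np.ndarray) -> np.ndarray:
    """Row-normalize an adjacency matrix (assumed convention); all-zero rows stay zero."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1, keepdims=True)
    return np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)


def exact_cosimrank(P: np.ndarray, c: float, K: int) -> np.ndarray:
    """Reference: truncated series S = sum_{k=0}^{K} c^k P^k (P^k)^T, O(n^3) per step."""
    n = P.shape[0]
    S = np.eye(n)
    Pk = np.eye(n)
    for k in range(1, K + 1):
        Pk = P @ Pk                   # P^k
        S += c**k * (Pk @ Pk.T)       # add c^k P^k (P^k)^T
    return S


def rpcs_sketch(P: np.ndarray, c: float, K: int, d: int, seed: int = 0) -> np.ndarray:
    """Hypothetical random-projection estimate of the truncated CoSimRank matrix."""
    n = P.shape[0]
    rng = np.random.default_rng(seed)
    # H^(0) = R with i.i.d. N(0, 1/d) entries, so E[R R^T] = I_n (assumed Gaussian JL map).
    H = rng.standard_normal((n, d)) / np.sqrt(d)
    S_hat = np.eye(n)                 # the identity stands in for the k = 0 term
    for _ in range(K):
        H = np.sqrt(c) * (P @ H)      # H^(k) = sqrt(c) P H^(k-1): O(n^2 d) for dense P
        S_hat += H @ H.T              # unbiased estimate of c^k P^k (P^k)^T, also O(n^2 d)
    return S_hat


if __name__ == "__main__":
    # Hypothetical toy graph: 20-node ring with extra chords i -- i+3 (undirected).
    n = 20
    adj = np.zeros((n, n))
    for i in range(n):
        for step in (1, 3):
            j = (i + step) % n
            adj[i, j] = adj[j, i] = 1.0
    P = row_stochastic(adj)
    S = exact_cosimrank(P, c=0.6, K=10)
    S_hat = rpcs_sketch(P, c=0.6, K=10, d=20000, seed=1)
    print("max abs error:", np.max(np.abs(S_hat - S)))
```

The exact routine multiplies $n\times n$ matrices and therefore costs $O(n^3)$ per step, whereas each sketch step multiplies $\mathbf{P}$ by an $n\times d$ matrix and forms $\mathbf{H}\mathbf{H}^{\top}$, both $O(n^2d)$ for dense $\mathbf{P}$; the sketch only pays off when $d$ is much smaller than $n$. The toy run uses a large $d$ purely to make the estimate visibly close to the exact values.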

Abstract

Given a graph $G$ with $n$ nodes and two nodes $u,v\in G$, the CoSimRank value $s(u,v)$ quantifies the similarity between $u$ and $v$ based on graph topology. Compared to SimRank, CoSimRank has been shown to be more accurate and effective in many real-world applications, including synonym expansion, lexicon extraction, and entity relatedness in knowledge graphs. Computing all pairwise CoSimRanks in $G$ is highly expensive and challenging, and existing solutions all focus on devising approximate algorithms for it. To attain a desired absolute accuracy guarantee $ε$, the state-of-the-art approximate algorithm for computing all pairwise CoSimRanks requires $O(n^3\log_2(\ln\frac{1}{ε}))$ time, which is prohibitively expensive even when $ε$ is large. In this paper, we propose RPCS, a fast randomized algorithm for computing all pairwise CoSimRank values. The basic idea of RPCS is to approximate the $n\times n$ matrix multiplications in CoSimRank computation via random projection. Theoretically, RPCS runs in $O(\frac{n^2\ln n}{ε^2}\ln\frac{1}{ε})$ time while ensuring, with high probability, an absolute error of at most $ε$ in every CoSimRank value in $G$. Extensive experiments on six real graphs demonstrate that RPCS is orders of magnitude faster than the state of the art. In particular, on a million-edge Twitter graph, RPCS answers the $ε$-approximate ($ε=0.1$) all pairwise CoSimRank query within 4 hours on a single commodity server, while existing solutions fail to terminate within a day.
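The following derivation sketches why random projection can stand in for the exact $n\times n$ products. It assumes the standard series form of CoSimRank with a row-stochastic $\mathbf{P}$ (so every row of $\mathbf{P}^k$ is a probability vector), a Gaussian projection matrix $\mathbf{R}\in\mathbb{R}^{n\times d}$ with i.i.d. $\mathcal{N}(0,1/d)$ entries, and a simple geometric-tail truncation bound; these choices are illustrative and may differ from the paper's exact analysis.

```latex
% Assumptions: P row-stochastic; R has i.i.d. N(0, 1/d) entries, so E[R R^T] = I.
\begin{aligned}
\mathbf{S} &= \sum_{k=0}^{\infty} c^k\,\mathbf{P}^k(\mathbf{P}^k)^{\top}
            = c\,\mathbf{P}\mathbf{S}\mathbf{P}^{\top} + \mathbf{I},\\
\mathbf{H}^{(0)} &= \mathbf{R},\qquad
\mathbf{H}^{(k)} = \sqrt{c}\,\mathbf{P}\mathbf{H}^{(k-1)} = c^{k/2}\,\mathbf{P}^k\mathbf{R},\\
\mathbb{E}\big[\mathbf{H}^{(k)}(\mathbf{H}^{(k)})^{\top}\big]
  &= c^k\,\mathbf{P}^k\,\mathbb{E}\big[\mathbf{R}\mathbf{R}^{\top}\big](\mathbf{P}^k)^{\top}
   = c^k\,\mathbf{P}^k(\mathbf{P}^k)^{\top},\\
\Big|\Big(\mathbf{S} - \sum_{k=0}^{K} c^k\,\mathbf{P}^k(\mathbf{P}^k)^{\top}\Big)_{uv}\Big|
  &\le \sum_{k=K+1}^{\infty} c^k = \frac{c^{K+1}}{1-c}.
\end{aligned}
```

The last line holds because the inner product of two probability vectors is at most 1. For example, with $c=0.6$ and a truncation budget of $0.01$, $K=9$ is not enough ($0.6^{10}/0.4\approx 0.0151$), while $K=10$ is ($0.6^{11}/0.4\approx 0.0091$); the number of terms grows like $\ln\frac{1}{ε}$, which matches the $\ln\frac{1}{ε}$ factor in the stated running time.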
