Two approaches to low-parametric SimRank computation

Egor P. Berezin; Robert T. Zaks; German Z. Alekhin; Stanislav V. Morozov; Sergey A. Matveev

TL;DR

This work discusses low-parametric approaches for approximating SimRank matrices, which estimate the similarity between pairs of nodes in a graph, and proposes two major formats for the economical embedding of target data.

Abstract

In this work, we discuss low-parametric approaches for approximating SimRank matrices, which estimate the similarity between pairs of nodes in a graph. Although SimRank matrices and their computation require a significant amount of memory, common approaches mostly address the problem of algorithmic complexity. We propose two major formats for the economical embedding of target data. The first approach adopts a non-symmetric form that can be computed using a specialized alternating optimization algorithm. The second is based on a symmetric representation and Newton-type iterations. We propose numerical implementations for both methodologies that avoid working with dense matrices and maintain low memory consumption. Furthermore, we study both types of embeddings numerically using real data from publicly available datasets. The results show that our algorithms yield a good approximation of the SimRank matrices, both in terms of the error norm (particularly the Chebyshev norm) and in preserving the average number of the most similar elements for each given node.
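The abstract's memory argument concerns the dense SimRank matrix itself. A minimal sketch of the standard SimRank fixed-point iteration in matrix form makes that cost visible. It assumes that $A$ is the column-normalized adjacency matrix, with `adj[i, j] = 1` for an edge $i \to j$, and that the update is $S \leftarrow cA^\top S A - \operatorname{diag}(cA^\top S A) + I$. The helper names `column_stochastic` and `simrank_fixed_point` and the default decay `c = 0.8` are illustrative choices. This is the dense baseline, not either of the authors' low-parametric algorithms.

```python
import numpy as np


def column_stochastic(adj: np.ndarray) -> np.ndarray:
    """Column-normalize an adjacency matrix (adj[i, j] = 1 for an edge i -> j).

    Column j then holds 1/|I(j)| for each in-neighbour of j. Columns of nodes
    without in-neighbours stay zero, so A is only column-substochastic there.
    """
    col_sums = adj.sum(axis=0)
    return adj / np.where(col_sums > 0, col_sums, 1.0)


def simrank_fixed_point(adj, c: float = 0.8, tol: float = 1e-10,
                        max_iter: int = 1000) -> np.ndarray:
    """Dense SimRank via S <- c A^T S A - diag(c A^T S A) + I, starting from S = I.

    The update is a contraction with factor c in the max-abs (Chebyshev) norm,
    so it converges geometrically. Memory is O(n^2) because S is stored densely.
    """
    A = column_stochastic(np.asarray(adj, dtype=float))
    n = A.shape[0]
    I = np.eye(n)
    S = I.copy()
    for _ in range(max_iter):
        T = c * (A.T @ S @ A)
        S_new = T - np.diag(np.diag(T)) + I  # off-diagonal: c A^T S A, diagonal: 1
        if np.max(np.abs(S_new - S)) < tol:
            return S_new
        S = S_new
    return S


if __name__ == "__main__":
    # Toy undirected graph: nodes 0 and 1 are both attached only to node 2.
    adj = np.array([[0, 0, 1],
                    [0, 0, 1],
                    [1, 1, 0]], dtype=float)
    print(np.round(simrank_fixed_point(adj, c=0.8), 6))
```

In the toy graph, nodes 0 and 1 share their only in-neighbour, so $s(0,1) = c = 0.8$ and $s(0,2) = s(1,2) = 0$. The array `S` is dense, so for the ego-Facebook graph with 4039 nodes a single float64 copy already takes $4039^2 \cdot 8 \approx 130$ MB.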

Paper Structure (8 sections, 5 theorems, 48 equations, 3 figures, 1 table, 3 algorithms)

Introduction
Fixed-point iteration method
Inspiration for the low-rank approach
Two approaches to low-parametric solution
Alternating minimization
Quadratic minimization
Numerical results
Conclusion and future work

Key Result

Proposition 1

For any real $S \in \mathbb{R}^{n \times n}$ and column stochastic $A \in \mathbb{R}^{n \times n}$

Figures (3)

Figure 1: Singular values of the matrix $S$ and of the shift $S-I$ for the email-Eu-core dataset (eumailsource, 1005 nodes) and the ego-Facebook dataset (fbsource, 4039 nodes).
Figure 2: Approximation error in the Chebyshev norm for the shift $S-I$ on the email-Eu-core dataset (eumailsource, 1005 nodes) and the ego-Facebook dataset (fbsource, 4039 nodes), using the truncated SVD and the method from Zamarashkin2022.
Figure 3: Chebyshev norm error and $\Psi(10)$ for the ego-Facebook dataset (fbsource, two plots on the left) and for the wiki-Vote dataset (wikivotesource, two plots on the right).
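The figures report four kinds of quantity: singular values of $S$ and $S-I$, the Chebyshev-norm error of a low-rank approximation of $S-I$, and $\Psi(10)$. A small sketch can compute analogues of all of them on a random graph. It makes several assumptions. The graph is random rather than one of the paper's datasets. Only the truncated-SVD baseline is implemented, not the method from Zamarashkin2022. `top_k_overlap` is a hypothetical stand-in for $\Psi(k)$, based only on the abstract's phrase "preserving the average number of the most similar elements for each given node", and the paper's exact definition may differ. All function names are illustrative.

```python
import numpy as np


def simrank_dense(adj, c=0.8, n_iter=60):
    """Dense SimRank fixed-point iteration (reference values for small graphs only).

    adj[i, j] = 1 for an edge i -> j; A is the column-normalized adjacency matrix.
    Starting from S = I, the max-abs error after k steps is at most c**(k + 1).
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    A = adj / np.where(col_sums > 0, col_sums, 1.0)
    I = np.eye(adj.shape[0])
    S = I.copy()
    for _ in range(n_iter):
        T = c * (A.T @ S @ A)
        S = T - np.diag(np.diag(T)) + I
    return S


def truncated_svd_shift(S, rank):
    """Rank-`rank` truncated SVD of the shift S - I, returned as factors U, V
    with S ~= I + U @ V.T (optimal in the spectral/Frobenius norm, not in the
    Chebyshev norm)."""
    U, sigma, Vt = np.linalg.svd(S - np.eye(S.shape[0]))
    return U[:, :rank] * sigma[:rank], Vt[:rank].T


def chebyshev_error(S, U, V):
    """Chebyshev (max-abs entry) norm of S - (I + U V^T)."""
    return np.max(np.abs(S - np.eye(S.shape[0]) - U @ V.T))


def top_k_overlap(S, S_hat, k):
    """Average over nodes of the overlap between the k most similar other nodes
    under S and under S_hat. Hypothetical stand-in for a Psi(k)-style score."""
    total = 0
    for i in range(S.shape[0]):
        row, row_hat = S[i].copy(), S_hat[i].copy()
        row[i] = row_hat[i] = -np.inf  # a node is not its own neighbour
        total += len(set(np.argsort(row)[-k:]) & set(np.argsort(row_hat)[-k:]))
    return total / S.shape[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    upper = np.triu((rng.random((n, n)) < 0.05).astype(float), 1)
    adj = np.maximum(upper, upper.T)  # random undirected graph
    S = simrank_dense(adj)
    sv = np.linalg.svd(S, compute_uv=False)
    sv_shift = np.linalg.svd(S - np.eye(n), compute_uv=False)
    print("leading singular values of S:    ", np.round(sv[:5], 4))
    print("leading singular values of S - I:", np.round(sv_shift[:5], 4))
    for r in (5, 20, 50):
        U, V = truncated_svd_shift(S, r)
        S_hat = np.eye(n) + U @ V.T
        print(r, chebyshev_error(S, U, V), top_k_overlap(S, S_hat, 10))
```

The numbers printed for a random graph say nothing about the datasets in the figures. Loading email-Eu-core or ego-Facebook into `adj` would be the closer analogue. Truncated SVD minimizes the spectral and Frobenius norms, not the Chebyshev norm, so its Chebyshev error for a given rank need not be the smallest achievable.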

Theorems & Definitions (10)

Proposition 1
proof
Proposition 2
proof
Proposition 3: see also Lizorkin2008
proof
Definition 1
Theorem 1: Theorem 1.4 in Alon2013
Proposition 4
proof
