Two approaches to low-parametric SimRank computation

Egor P. Berezin, Robert T. Zaks, German Z. Alekhin, Stanislav V. Morozov, Sergey A. Matveev

TL;DR

This work discusses low-parametric approaches for approximating SimRank matrices, which estimate the similarity between pairs of nodes in a graph, and proposes two major formats for the economical embedding of target data.

Abstract

In this work, we discuss low-parametric approaches for approximating SimRank matrices, which estimate the similarity between pairs of nodes in a graph. Although SimRank matrices and their computation require a significant amount of memory, common approaches mostly address the problem of algorithmic complexity. We propose two major formats for the economical embedding of target data. The first approach adopts a non-symmetric form that can be computed using a specialized alternating optimization algorithm. The second is based on a symmetric representation and Newton-type iterations. We propose numerical implementations for both methodologies that avoid working with dense matrices and maintain low memory consumption. Furthermore, we study both types of embeddings numerically using real data from publicly available datasets. The results show that our algorithms yield a good approximation of the SimRank matrices, both in terms of the error norm (particularly the Chebyshev norm) and in preserving the average number of the most similar elements for each given node.
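
For reference on what is being approximated, the following is a minimal sketch of the classical SimRank fixed-point iteration (the standard pairwise definition: a node is fully similar to itself, and two distinct nodes are similar in proportion to the average similarity of their in-neighbors). It is a dense toy illustration, which is exactly the memory-heavy computation the paper's low-parametric formats aim to avoid, and it is not the authors' algorithm. The function names `simrank` and `chebyshev_norm`, the decay factor `c = 0.8`, the iteration count, and the four-node example graph are all assumptions made for illustration.

```python
def simrank(in_neighbors, c=0.8, iters=20):
    """Dense SimRank by fixed-point iteration (illustrative only).

    in_neighbors[v] lists the nodes with an edge pointing to v.
    c is the decay factor (0.8 is an assumed, commonly used value).
    """
    n = len(in_neighbors)
    # Start from the identity: every node is fully similar to itself.
    S = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iters):
        new = [[0.0] * n for _ in range(n)]
        for a in range(n):
            new[a][a] = 1.0  # the diagonal is fixed to 1 by definition
            for b in range(a + 1, n):
                Ia, Ib = in_neighbors[a], in_neighbors[b]
                if not Ia or not Ib:
                    continue  # a node without in-neighbors has similarity 0
                total = sum(S[i][j] for i in Ia for j in Ib)
                new[a][b] = new[b][a] = c * total / (len(Ia) * len(Ib))
        S = new
    return S


def chebyshev_norm(X, Y):
    """Largest absolute entrywise difference between two matrices."""
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


# Hypothetical toy graph with edges 0->1, 0->2, 1->3, 2->3.
toy_in_neighbors = [[], [0], [0], [1, 2]]
S = simrank(toy_in_neighbors)
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
shift_size = chebyshev_norm(S, identity)  # magnitude of the shift S - I
```

In this toy graph, nodes 1 and 2 share their only in-neighbor, so their similarity equals the decay factor, while node 0 has no in-neighbors and is similar only to itself. Storing `S` densely costs memory quadratic in the number of nodes, which motivates the compact embeddings discussed in the abstract.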

Paper Structure

This paper contains 8 sections, 5 theorems, 48 equations, 3 figures, 1 table, 3 algorithms.

Key Result

Proposition 1

For any real $S \in \mathbb{R}^{n \times n}$ and column stochastic $A \in \mathbb{R}^{n \times n}$

Figures (3)

  • Figure 1: Singular values of the matrix $S$ and of the shift $S-I$ for the email-Eu-core dataset (1005 nodes) [eumailsource] and the ego-Facebook dataset (4039 nodes) [fbsource].
  • Figure 2: Approximation error in the Chebyshev norm for the shift $S-I$ on the email-Eu-core dataset (1005 nodes) [eumailsource] and the ego-Facebook dataset (4039 nodes) [fbsource], using truncated SVD and the method from [Zamarashkin2022].
  • Figure 3: Chebyshev norm error and $\Psi(10)$ for the ego-Facebook dataset [fbsource] (two plots on the left) and for the wiki-Vote dataset [wikivotesource] (two plots on the right).

Theorems & Definitions (10)

  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Proposition 3: see also Lizorkin2008
  • proof
  • Definition 1
  • Theorem 1: Theorem 1.4 in Alon2013
  • Proposition 4
  • proof