Role Similarity Metric Based on Spanning Rooted Forest

Qi Bao, Zhongzhi Zhang, Haibin Kan

TL;DR

ForestSim is a scalable, admissible role similarity metric based on spanning rooted forests that enables efficient top-k similarity search on large networks. Because the metric is expressed solely in terms of the diagonal entries of the forest matrix, precomputation reduces to obtaining those entries, either exactly or with a fast approximation algorithm that runs in near-linear time. Once precomputation is finished, a top-k query is answered in $O(k)$ time. In the reported experiments, ForestSim achieves effectiveness comparable to state-of-the-art methods while scaling better, owing to the near-linear time and space requirements of the approximate variant, ForestSim-AP. These properties make ForestSim well-suited to million-scale networks.

Abstract

As a fundamental issue in network analysis, structural node similarity has received much attention in academia and is adopted in a wide range of applications. Among the proposed structural node similarity measures, role similarity stands out because it satisfies several axiomatic properties, including automorphism conformation. Existing role similarity metrics cannot handle top-k queries on large real-world networks due to their high time and space cost. In this paper, we propose a new role similarity metric, namely ForestSim. We prove that ForestSim is an admissible role similarity metric and devise the corresponding top-k similarity search algorithm, namely ForestSimSearch, which is able to process a top-k query in $O(k)$ time once the precomputation is finished. Moreover, we speed up the precomputation by using a fast approximate algorithm to compute the diagonal entries of the forest matrix, which reduces the time and space complexity of the precomputation to $O(\epsilon^{-2} m \log^5 n \log\frac{1}{\epsilon})$ and $O(m \log^3 n)$, respectively. Finally, we conduct extensive experiments on 26 real-world networks. The results show that ForestSim works efficiently on million-scale networks and achieves performance comparable to state-of-the-art methods.
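To make the $O(k)$ query claim concrete, here is a minimal sketch of one way such a query could work. It is not the paper's ForestSimSearch algorithm, whose details and similarity formula are not given in this summary. The sketch assumes only that each node carries one precomputed scalar score (standing in for a diagonal forest-matrix entry) and that similarity decreases as two scores move apart. The names `build_index` and `top_k` and the example scores are hypothetical.

```python
def build_index(scores):
    """Precomputation: sort nodes by score once and record each node's rank."""
    order = sorted(scores, key=scores.get)
    position = {node: i for i, node in enumerate(order)}
    return order, position


def top_k(query, k, scores, order, position):
    """Return up to k nodes whose scores are closest to the query's score.

    Assumption: closeness of scores is a stand-in for similarity.
    Two pointers walk outward from the query's rank, so each returned
    node costs O(1) and the whole query costs O(k).
    """
    q = scores[query]
    lo = position[query] - 1
    hi = position[query] + 1
    result = []
    while len(result) < k and (lo >= 0 or hi < len(order)):
        gap_lo = q - scores[order[lo]] if lo >= 0 else float("inf")
        gap_hi = scores[order[hi]] - q if hi < len(order) else float("inf")
        if gap_lo <= gap_hi:
            result.append(order[lo])
            lo -= 1
        else:
            result.append(order[hi])
            hi += 1
    return result


# Hypothetical scores for five nodes.
scores = {"a": 0.40, "b": 0.475, "c": 0.48, "d": 0.60, "e": 0.41}
order, position = build_index(scores)
nearest = top_k("b", 2, scores, order, position)
```

The sorting cost is paid once during precomputation. Each query then touches only the nodes it returns, which is why the per-query cost does not depend on the size of the network.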

Paper Structure

This paper contains 19 sections, 7 theorems, 23 equations, 5 figures, 3 tables, 1 algorithm.

Key Result

lemma 1

In a graph $G=(V,E)$, let $\boldsymbol{\mathit{W}}$ be the forest matrix. Then, $w_{uu} \in \left[\frac{1}{d_u +1}, \frac{2}{d_u +2}\right]$ holds for any $u \in V$ [ZhXi11].
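The bound can be checked numerically on a small example. The sketch below assumes the standard definition of the forest matrix, $\boldsymbol{\mathit{W}} = (\boldsymbol{\mathit{I}} + \boldsymbol{\mathit{L}})^{-1}$ with $\boldsymbol{\mathit{L}}$ the graph Laplacian, which this summary does not state explicitly. The four-node graph (a triangle with one pendant node) and the helper name `forest_matrix` are hypothetical choices for illustration. Exact rational arithmetic is used so that the interval endpoints can be compared without rounding.

```python
from fractions import Fraction


def forest_matrix(n, edges):
    """Exact inverse of I + L for an undirected simple graph on nodes 0..n-1."""
    a = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for u, v in edges:
        a[u][u] += 1
        a[v][v] += 1
        a[u][v] -= 1
        a[v][u] -= 1
    # Gauss-Jordan elimination on the augmented matrix [I + L | I].
    aug = [row + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Hypothetical example: triangle 0-1-2 plus a pendant node 3 attached to node 0.
EDGES = [(0, 1), (0, 2), (1, 2), (0, 3)]
N = 4
W = forest_matrix(N, EDGES)

degree = [sum(u in e for e in EDGES) for u in range(N)]
for u in range(N):
    lower = Fraction(1, degree[u] + 1)
    upper = Fraction(2, degree[u] + 2)
    assert lower <= W[u][u] <= upper  # the interval stated in the lemma
```

On this graph the diagonal entries are $2/5$, $19/40$, $19/40$, and $3/5$. The degree-3 node attains the upper endpoint $2/(d_u+2) = 2/5$ exactly, so the interval is closed at that end for this example.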

Figures (5)

  • Figure 1: The toy graph $G_0$ and all $40$ of its spanning rooted forests. The $16$ forests in $\mathcal{F}_{11}$ are marked in yellow.
  • Figure 2: Each tree rooted at $u$ in the spanning rooted forest $F \in \mathcal{F}_{uu}$ for $u= 1,2,3,4$ in the toy graph $G_0$. The size of each tree rooted at $u$, denoted by $|F|_u$, is labeled in the top right-hand corner of each picture.
  • Figure 3: Construction of the mapping $f: \mathcal{S}_u \rightarrow \mathcal{F}$
  • Figure 4: Precomputation of the studied role similarity metrics.
  • Figure 5: Average Precision@$K$ of the studied role similarity metrics on six real-world networks.
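The counts quoted in the Figure 1 caption can be reproduced by brute force, under an assumption about the toy graph. $G_0$ itself is not drawn in this summary, so the sketch below assumes it is the four-node graph formed by a triangle with one pendant node, with node 1 as the degree-3 node. This choice is consistent with the caption's counts but is not confirmed by the source. The sketch also takes $\mathcal{F}_{uu}$ to be the set of spanning rooted forests in which $u$ is a root. A spanning rooted forest is an acyclic edge subset together with one root per tree, so each acyclic subset contributes the product of its tree sizes. The helper names are hypothetical.

```python
from itertools import combinations
from math import prod


def components(nodes, edges):
    """Return the connected components, or None if the edges contain a cycle."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            return None  # this edge would close a cycle
        parent[ra] = rb
    groups = {}
    for v in nodes:
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())


def count_rooted_forests(nodes, edges, root=None):
    """Count spanning rooted forests; if `root` is given, only those rooted at it.

    Every tree may be rooted at any of its nodes, except that the tree
    containing `root` (when given) must be rooted at `root` itself.
    """
    total = 0
    for r in range(len(edges) + 1):
        for subset in combinations(edges, r):
            comps = components(nodes, subset)
            if comps is None:
                continue
            total += prod(len(c) for c in comps
                          if root is None or root not in c)
    return total


# Assumed toy graph: triangle 1-2-3 with pendant node 4 attached to node 1.
NODES = [1, 2, 3, 4]
EDGES = [(1, 2), (1, 3), (2, 3), (1, 4)]

all_forests = count_rooted_forests(NODES, EDGES)              # 40
rooted_at_1 = count_rooted_forests(NODES, EDGES, root=1)      # 16
```

Under these assumptions the ratio $16/40 = 2/5$ is the fraction of spanning rooted forests in which node 1 is a root. The enumeration is exponential in the number of edges, so it is only useful as a sanity check on toy graphs.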

Theorems & Definitions (15)

  • definition 1
  • lemma 1
  • lemma 2
  • definition 2
  • definition 3
  • lemma 3
  • lemma 4
  • proof
  • theorem 1
  • proof
  • ...and 5 more