Estimation of Graph Features Based on Random Walks Using Neighbors' Properties

Tsuyoshi Hasegawa, Shiori Hironaka, Kazuyuki Shudo

TL;DR

The paper tackles the estimation of features of large, unknown directed social networks when API query budgets limit how many neighbors can be acquired. It introduces a probabilistic adjacent-node sampling scheme within a random-walk framework, parameterized by $\alpha$, and formalizes the process as a Markov chain on an expanded state space. By reweighting sampled data with $w(e_{ij})=\frac{1}{d_{\mathrm{sum}}(v_j)}$ and applying $g(e_{ij})=f(v_j)$, the method yields unbiased feature estimates that converge to the uniform expectation as sampling progresses. Empirical results on real and synthetic graphs show that the proposed approach outperforms established methods, with accuracy improving as $\alpha\to 1$ and as the query budget grows, demonstrating practical gains in cost-constrained OSN feature estimation. The work provides a principled, scalable way to exploit adjacent-node information to reduce API costs while maintaining estimation quality in directed networks.
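The reweighting step mentioned above can be illustrated with a minimal sketch. It assumes the weights are combined as a ratio estimator, $\sum w\,g / \sum w$, which is the usual form for re-weighted random walks. The summary does not state this form explicitly, and the $\alpha$-parameterized neighbor sampling is not reproduced here. The function name `reweighted_estimate`, the toy graph, and the feature values are hypothetical.

```python
def reweighted_estimate(sampled_edges, d_sum, f):
    """Ratio estimator sum(w * g) / sum(w) over sampled edges e_ij.

    Assumed form (not confirmed by the summary): w(e_ij) = 1 / d_sum(v_j)
    and g(e_ij) = f(v_j), combined as a weighted average.
    """
    num = 0.0
    den = 0.0
    for _i, j in sampled_edges:
        w = 1.0 / d_sum[j]   # w(e_ij) = 1 / d_sum(v_j)
        num += w * f[j]      # g(e_ij) = f(v_j)
        den += w
    return num / den


# Hypothetical toy data: path graph 0 - 1 - 2, so d_sum is 1, 2, 1.
d_sum = {0: 1, 1: 2, 2: 1}
f = {0: 10.0, 1: 20.0, 2: 60.0}

# Each undirected edge traversed once in each direction, so node j appears
# as a destination d_sum(v_j) times (the degree-proportional visit pattern).
sampled_edges = [(1, 0), (0, 1), (2, 1), (1, 2)]

estimate = reweighted_estimate(sampled_edges, d_sum, f)
print(estimate)  # 30.0, the uniform mean of f over the three nodes
```

In this toy case the degree bias of the walk is cancelled exactly by the $1/d_{\mathrm{sum}}$ weights, so the estimate equals the plain average of $f$ over all nodes.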

Abstract

Using random walks for sampling has proven advantageous in assessing the characteristics of large and unknown social networks. Several algorithms based on random walks have been introduced in recent years. In the practical application of social network sampling, there is a recurrent reliance on an application programming interface (API) for obtaining adjacent nodes. However, owing to constraints related to query frequency and associated API expenses, it is preferable to minimize API calls during the feature estimation process. In this study, considering the acquisition of neighboring nodes as a cost factor, we introduce a feature estimation algorithm that outperforms existing algorithms in terms of accuracy. Through experiments that simulate sampling on known graphs, we demonstrate the superior accuracy of our proposed algorithm when compared to existing alternatives.

Paper Structure (19 sections, 10 theorems, 19 equations, 8 figures, 1 table, 1 algorithm)


Key Result

THEOREM 1

For a distribution $\boldsymbol{\pi}=(\pi_i)_{i\in S}$, if the condition $\pi_j = \sum_{i\in S}\pi_i P_{i,j}$ holds for every $j\in S$, then $\boldsymbol{\pi}$ is the steady-state distribution of the Markov chain governed by the probability transition matrix …
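The stationarity condition $\pi_j = \sum_{i\in S}\pi_i P_{i,j}$ can be checked numerically on a small chain. The following is a sketch only: the helper `is_stationary` and the three-state example (a simple random walk on the path graph 0 - 1 - 2) are illustrative assumptions and are not taken from the paper.

```python
def is_stationary(pi, P, tol=1e-12):
    """Return True if pi_j == sum_i pi_i * P[i][j] for every state j."""
    n = len(pi)
    for j in range(n):
        balance = sum(pi[i] * P[i][j] for i in range(n))
        if abs(balance - pi[j]) > tol:
            return False
    return True


# Hypothetical example: simple random walk on the path graph 0 - 1 - 2.
# From an end node the walk always moves to the middle node; from the
# middle node it moves to either end with probability 1/2.
P = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]

# Degree-proportional distribution: degrees are 1, 2, 1 (total 4).
pi_degree = [0.25, 0.5, 0.25]
pi_uniform = [1 / 3, 1 / 3, 1 / 3]

print(is_stationary(pi_degree, P))   # True
print(is_stationary(pi_uniform, P))  # False
```

The degree-proportional distribution satisfies the condition while the uniform one does not, which is why random-walk samples need reweighting before they can estimate a uniform average.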

Figures (8)

  • Figure 1: Overview of the proposed method. Gray nodes denote nodes whose degree information and properties can be acquired.
  • Figure 2: Average NRMSE for each feature categorized by query rate at each $\alpha$.
  • Figure 3: NRMSE for out-degree estimation.
  • Figure 4: NRMSE for random label estimation.
  • Figure 5: NRMSE for high degree label estimation.
  • ...and 3 more figures

Theorems & Definitions (26)

  • THEOREM 1
  • THEOREM 2
  • THEOREM 3
  • DEFINITION 4
  • THEOREM 5
  • proof
  • DEFINITION 6
  • DEFINITION 7
  • DEFINITION 8
  • DEFINITION 9
  • ...and 16 more