Sublinear-Time Opinion Estimation in the Friedkin--Johnsen Model

Stefan Neumann, Yinhao Dong, Pan Peng

TL;DR

The paper addresses estimating node opinions and standard measures in the Friedkin–Johnsen (FJ) model when access to the entire network is impractical. It develops sublinear-time estimators that rely only on query access to the graph and to node opinions, notably using a Laplacian-based linear solver with short random walks and a connection to personalized PageRank; in $d$-regular graphs, a deterministic, local PageRank-based estimator approximates $z_u^*$ by inspecting a neighborhood whose size is independent of $n$. The main contributions are sublinear algorithms for estimating $z^*$ and key measures (e.g., polarization $\mathcal{P}$, disagreement $\mathcal{D}$) under innate- or expressed-opinion oracles, plus extensive experiments showing accurate, fast performance on large real-world graphs. The practical impact is scalable analysis of opinion dynamics on large online networks under data-access constraints, with publicly available implementations. The results also provide insight into the locality of FJ dynamics in sparse, bounded-degree graphs and demonstrate effective PageRank-inspired techniques for social-network analysis.

Abstract

Online social networks are ubiquitous parts of modern societies, and the discussions that take place in these networks impact people's opinions on diverse topics, such as politics or vaccination. One of the most popular models to formally describe this opinion formation process is the Friedkin--Johnsen (FJ) model, which makes it possible to define measures such as the polarization and the disagreement of a network. Recently, Xu, Bao and Zhang (WebConf'21) showed that all opinions and relevant measures in the FJ model can be approximated in near-linear time. However, their algorithm requires the entire network and the opinions of all nodes as input. Given the sheer size of online social networks and increasing data-access limitations, obtaining the entirety of this data might be unrealistic in practice. In this paper, we show that node opinions and all relevant measures, like polarization and disagreement, can be efficiently approximated in time that is sublinear in the size of the network. In particular, our algorithms only require query access to the network and do not have to preprocess the graph. Furthermore, we use a connection between FJ opinion dynamics and personalized PageRank, and show that in $d$-regular graphs, we can deterministically approximate each node's opinion by only looking at a constant-size neighborhood, independently of the network size. We also experimentally validate that our estimation algorithms perform well in practice.
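
To make the quantities in the abstract concrete, the following is a minimal sketch of the exact, full-access baseline that the sublinear algorithms avoid. It assumes the standard FJ equilibrium $z^* = (I+L)^{-1} s$ on an unweighted, undirected graph, and it uses unnormalized forms of polarization and disagreement; the paper's exact normalization may differ. The function names (`fj_equilibrium`, `polarization`, `disagreement`) and the three-node path graph are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction

def fj_equilibrium(adj, s):
    """Solve (I + L) z = s exactly by Gauss-Jordan elimination (small graphs only)."""
    n = len(s)
    # Augmented matrix [I + L | s], with L = D - A for an unweighted graph.
    M = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for u in range(n):
        M[u][u] = Fraction(1 + len(adj[u]))
        for v in adj[u]:
            M[u][v] -= 1
        M[u][n] = Fraction(s[u])
    for col in range(n):
        # I + L is positive definite, so a nonzero pivot always exists.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        inv = 1 / M[col][col]
        M[col] = [x * inv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[u][n] for u in range(n)]

def polarization(z):
    """Assumed form: sum of squared deviations from the mean expressed opinion."""
    mean = sum(z) / len(z)
    return sum((x - mean) ** 2 for x in z)

def disagreement(adj, z):
    """Assumed form: sum over edges (u, v) of (z_u - z_v)^2, each edge counted once."""
    return sum((z[u] - z[v]) ** 2 for u in range(len(z)) for v in adj[u] if u < v)

# Hypothetical example: path graph 0 - 1 - 2 with innate opinions (0, 0, 1).
adj = [[1], [0, 2], [1]]
s = [0, 0, 1]
z = fj_equilibrium(adj, s)  # z equals [1/8, 1/4, 5/8]
print(z, polarization(z), disagreement(adj, z))
```

This baseline reads every node and edge, which is exactly the requirement the paper's query-access algorithms are designed to remove.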

Paper Structure (33 sections, 20 theorems, 30 equations, 7 figures, 9 tables, 4 algorithms)


Key Result

Proposition 1

Let $u\in V$ and $\epsilon>0$. Let $\bar{\kappa}$ be an upper bound on $\kappa(\tilde{S})$ with $\tilde{S} = (I+D)^{-1/2} (I+L) (I+D)^{-1/2}$. Algorithm alg:random-walks returns a value $\tilde{z}_u^*$ such that $\left\lvert\tilde{z}_u^* - z_u^*\right\rvert \leq \epsilon$ with probability $1-\frac{1
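
Proposition 1 concerns a random-walk estimator for $z_u^*$ that queries only innate opinions and neighborhoods. The listing of Algorithm alg:random-walks is not reproduced in this summary, so the sketch below is not the paper's algorithm. It illustrates the standard absorbing-walk view of the FJ equilibrium instead: rearranging $(I+L) z^* = s$ gives $z_u^* = \frac{s_u + \sum_{v \sim u} z_v^*}{1 + d_u}$ on an unweighted graph, so a walk that stops at node $v$ with probability $1/(1+d_v)$ and otherwise moves to a uniform neighbor returns $s_v$ with expectation $z_u^*$. The function name, the callable oracles `neighbors` and `innate`, and the truncation rule are assumptions; the default walk and step counts are borrowed from the figure captions.

```python
import random

def estimate_opinion(u, neighbors, innate, num_walks=4000, max_steps=600, rng=None):
    """Monte Carlo estimate of z_u^* using only neighbor and innate-opinion queries."""
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(num_walks):
        v = u
        for _ in range(max_steps):
            nbrs = neighbors(v)
            # Stop at v with probability 1 / (1 + deg(v)).
            if rng.random() < 1.0 / (1 + len(nbrs)):
                break
            # Otherwise move to a uniformly random neighbor.
            v = rng.choice(nbrs)
        # A walk cut off at max_steps reports the node it reached (slightly biased).
        total += innate(v)
    return total / num_walks

# Hypothetical example: path graph 0 - 1 - 2 with innate opinions (0, 0, 1).
adj = [[1], [0, 2], [1]]
s = [0.0, 0.0, 1.0]
est = estimate_opinion(2, lambda v: adj[v], lambda v: s[v])
print(est)  # the exact value on this graph is z_2^* = 5/8
```

Each walk touches only the nodes it visits, so the cost depends on the number and length of walks rather than on the size of the graph.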

Figures (7)

  • Figure 1: Absolute error when estimating expressed opinions $z_u^*$ using an oracle for innate opinions $s_u$ via Algorithm \ref{alg:random-walks}. We report means and standard deviations across 10 experiments. Figure \ref{fig:estimating-opinions-oracle-innate}(\ref{fig:pokec-steps}) and Figure \ref{fig:estimating-opinions-oracle-innate}(\ref{fig:livejournal-steps}) use 4000 walks and vary the number of steps; Figure \ref{fig:estimating-opinions-oracle-innate}(\ref{fig:pokec-walks}) and Figure \ref{fig:estimating-opinions-oracle-innate}(\ref{fig:livejournal-walks}) use 600 steps and vary the number of walks. Innate opinions were generated using the uniform distribution.
  • Figure 2: Running time of Algorithm \ref{alg:random-walks} for estimating expressed opinions $z_u^*$ using an oracle for innate opinions $s_u$. When not mentioned otherwise, we sampled 10000 vertices and for each of them we performed 4000 random walks with 600 steps. We report means and standard deviations across 10 experiments. Innate opinions were generated using the uniform distribution.
  • Figure 3: Normalized error of Algorithm \ref{alg:random-walks} for estimating expressed opinions $z_u^*$ using an oracle for innate opinions $s_u$. We sorted the vertices by degrees (from low to high) and partitioned them into 20 equally sized buckets. We used Algorithm \ref{alg:random-walks} with 4000 random walks with 600 steps. We report means and standard deviations across 10 experiments. Innate opinions were generated using the uniform distribution.
  • Figure 4: Normalized running time of Algorithm \ref{alg:random-walks} for estimating expressed opinions $z_u^*$ using an oracle for innate opinions $s_u$. We sorted the vertices by degrees (from low to high) and partitioned them into 20 equally sized buckets. We used Algorithm \ref{alg:random-walks} with 4000 random walks with 600 steps. We report means and standard deviations across 10 experiments. Innate opinions were generated using the uniform distribution.
  • Figure 5: Error of applying the PageRank-style update rule from Proposition \ref{prop:page-rank-equation} for multiple iterations. We compare the error of this iterative algorithm against the solution computed by the algorithm of Xu et al. \cite{xu2021fast}. We report the $\ell_2$-norm of the error, as well as the differences of the iterates $\lVert z^{(t)} - z^{(t-1)}\rVert_2$. Innate opinions were generated using the uniform distribution.
  • ...and 2 more figures
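
The quantity tracked in Figure 5, the difference between successive iterates $\lVert z^{(t)} - z^{(t-1)}\rVert_2$, can be illustrated with a small sketch. The statement of the proposition behind the PageRank-style update is not reproduced in this summary, so the update used here is an assumption: the fixed-point iteration $z^{(t)} = (I+D)^{-1}\bigl(s + A z^{(t-1)}\bigr)$, which follows from rearranging $(I+L) z^* = s$ on an unweighted graph. The function name `fj_iterate` and the example graph are hypothetical.

```python
import math

def fj_iterate(adj, s, num_iters=50):
    """Run the assumed fixed-point update and record the l2 differences of iterates."""
    z = list(s)  # start from the innate opinions
    diffs = []
    for _ in range(num_iters):
        # Each node averages its innate opinion with its neighbors' current opinions.
        new_z = [
            (s[u] + sum(z[v] for v in adj[u])) / (1 + len(adj[u]))
            for u in range(len(s))
        ]
        diffs.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(new_z, z))))
        z = new_z
    return z, diffs

# Hypothetical example: path graph 0 - 1 - 2 with innate opinions (0, 0, 1).
adj = [[1], [0, 2], [1]]
s = [0.0, 0.0, 1.0]
z, diffs = fj_iterate(adj, s)
print(z)
print(diffs[0], diffs[-1])
```

Each row of $(I+D)^{-1}A$ sums to $d_u/(1+d_u) < 1$, so the iterates contract toward $z^*$. After $t$ rounds a node's value depends only on nodes within distance $t$, which is the kind of locality the $d$-regular result relies on.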

Theorems & Definitions (20)

  • Proposition 1
  • Proposition 2: friedkin2014two, proskurnikov2016pagerank
  • Theorem 3
  • Lemma 4
  • Lemma 5
  • Lemma 6
  • Corollary 7
  • Lemma 8
  • Proposition 9
  • Corollary 10
  • ...and 10 more