Sample Complexity of Linear Regression Models for Opinion Formation in Networks

Haolin Liu; Rajmohan Rajaraman; Ravi Sundaram; Anil Vullikanti; Omer Wasim; Haifeng Xu

TL;DR

This work studies the sample complexity needed for opinion convergence on networks by treating each agent's opinion as a data-derived model and analyzing the Nash equilibrium of an opinion formation game. It derives a concrete linear-regression-based framework where the equilibrium is given by $\boldsymbol{\theta}^{eq}=W^{-1}\boldsymbol{\bar{\theta}}$ with $W=\mathcal{L}+I$, and defines the total sample complexity $TSC(G,\pmb{v},k,\epsilon)$ as the minimal total samples ensuring $L(\theta^{eq}_i)\le\epsilon$ for all agents. The paper proves a near-tight bound $TSC(G,\pmb{v},k,\epsilon)=\Theta(\sum_i m_i^* + nk)$ via a convex optimization formulation, and provides explicit degree-based sampling rules $m_i^* \propto \frac{1}{\alpha d_i+1}$ for uniform influence and more general bounds for arbitrary influence factors using matrices $W^{-1}$, $B$ and vectors $\gamma$. It characterizes network gain across graph classes (clique, star, hypercube, random $d$-regular, and expanders) and shows substantial gains in well-connected networks, while highlighting inverse-degree sampling and the effect of edge expansion. Empirical results on synthetic and real networks corroborate the theory and demonstrate the practical utility of the proposed optimization approach (SOCP) for allocating samples to minimize equilibrium generalization error. Overall, the work connects opinion dynamics, graph signal processing, and decentralized learning to provide actionable guidance on allocating data to achieve accurate, network-wide opinion models with limited resources.
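As a rough illustration of the degree-based rule $m_i^* \propto \frac{1}{\alpha d_i+1}$, the sketch below splits a fixed sample budget across the nodes of a small star graph. The function name `inverse_degree_allocation`, the fixed `budget`, and the normalization to that budget are assumptions made for illustration; the paper sets the scale through $k$ and $\epsilon$ via its optimization, which is not reproduced here.

```python
import numpy as np

def inverse_degree_allocation(adj, alpha, budget):
    """Split `budget` samples with weights 1 / (alpha * d_i + 1) (hypothetical helper)."""
    adj = np.asarray(adj, dtype=float)
    degrees = adj.sum(axis=1)                 # d_i for an unweighted graph
    weights = 1.0 / (alpha * degrees + 1.0)   # m_i proportional to 1 / (alpha * d_i + 1)
    return budget * weights / weights.sum()   # assumed normalization to a fixed budget

# Star graph on 5 nodes: node 0 is the hub, nodes 1..4 are leaves.
star = np.zeros((5, 5))
star[0, 1:] = 1.0
star[1:, 0] = 1.0

allocation = inverse_degree_allocation(star, alpha=1.0, budget=110.0)
print(allocation)
```

With $\alpha = 1$ the hub (degree 4) gets weight $1/5$ and each leaf (degree 1) gets weight $1/2$, so a budget of 110 splits into roughly 10 samples for the hub and 25 per leaf.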

Abstract

Consider public health officials aiming to spread awareness about a new vaccine in a community interconnected by a social network. How can they distribute information with minimal resources, so as to avoid polarization and ensure community-wide convergence of opinion? To tackle such challenges, we initiate the study of sample complexity of opinion convergence in networks. Our framework is built on the recognized opinion formation game, where we regard the opinion of each agent as a data-derived model, unlike previous works that treat opinions as data-independent scalars. The opinion model for every agent is initially learned from its local samples and evolves game-theoretically as all agents communicate with neighbors and revise their models towards an equilibrium. Our focus is on the sample complexity needed to ensure that the opinions converge to an equilibrium such that the final model of every agent has low generalization error. Our paper has two main technical results. First, we present a novel polynomial time optimization framework to quantify the total sample complexity for arbitrary networks, when the underlying learning problem is (generalized) linear regression. Second, we leverage this optimization to study the network gain which measures the improvement of sample complexity when learning over a network compared to that in isolation. Towards this end, we derive network gain bounds for various network classes including cliques, star graphs, and random regular graphs. Additionally, our framework provides a method to study sample distribution within the network, suggesting that it is sufficient to allocate samples inversely to the degree. Empirical results on both synthetic and real-world networks strongly support our theoretical findings.
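The pipeline described in the abstract (local models learned from local samples, then revised towards an equilibrium) can be sketched on synthetic data. Everything concrete below is an assumption chosen for illustration: the clique topology, unit influence factors (so that $W=\mathcal{L}+I$ as in the TL;DR), isotropic Gaussian features, the noise level, and the use of squared parameter error as a stand-in for generalization error. It is not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 6, 3, 20                 # agents, feature dimension, samples per agent (hypothetical)
theta_star = rng.normal(size=k)    # assumed ground-truth regression vector

# Step 1: each agent fits ordinary least squares on its own local samples.
local_models = np.empty((n, k))
for i in range(n):
    X = rng.normal(size=(m, k))
    y = X @ theta_star + 0.5 * rng.normal(size=m)
    local_models[i] = np.linalg.lstsq(X, y, rcond=None)[0]

# Step 2: equilibrium on a clique with unit influence factors, W = L + I.
adj = np.ones((n, n)) - np.eye(n)
W = np.diag(adj.sum(axis=1)) - adj + np.eye(n)
equilibrium_models = np.linalg.solve(W, local_models)   # row i is agent i's equilibrium model

# Squared parameter error per agent, before and after opinion formation.
local_err = np.sum((local_models - theta_star) ** 2, axis=1)
eq_err = np.sum((equilibrium_models - theta_star) ** 2, axis=1)
print(local_err.mean(), eq_err.mean())
```

Because $W^{-1}$ is symmetric, entrywise nonnegative, and has rows summing to one, each equilibrium model is a convex combination of the local models, so the average squared parameter error cannot increase. The paper's criterion is stricter, since it requires every agent's error to be at most $\epsilon$.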

Paper Structure

This paper contains 27 sections, 18 theorems, 79 equations, 13 figures, and 5 tables.

Introduction
Problem formulation
Overview of results
Related Work and Comparisons
The Opinion Formation Game
Total Sample Complexity of Opinion Formation
Derivation of error bounds
Total sample complexity
Network Effects on Total Sample Complexity
Uniform influence factors
General influence factors
Experiments
Discussion and future work
Additional details on preliminaries
Missing proof in Section "Problem formulation"
...and 12 more sections

Key Result

Lemma 1

The unique Nash equilibrium $\pmb{\theta^{eq}} = (\theta_1^{eq}, \cdots, \theta_n^{eq})^T$ of the above game is $\pmb{\theta^{eq}} = W^{-1}\pmb{\Bar{\theta}}$, where $W = \mathcal{L} + I$ and $\pmb{\Bar{\theta}} = (\Bar{\theta}_1, \cdots, \Bar{\theta}_n)$. When all $v_{ij} = \alpha \ge 0$, $W$ …
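A minimal numerical check of the equilibrium formula $\pmb{\theta^{eq}} = W^{-1}\pmb{\bar{\theta}}$, using $W=\mathcal{L}+I$ as stated in the TL;DR (that is, unit influence factors) and scalar opinions on a three-node path. The per-agent cost $(\theta_i-\bar{\theta}_i)^2+\sum_{j\in N(i)}(\theta_i-\theta_j)^2$ used for the best-response check is the standard quadratic form and is assumed here, since the game definition is not part of this excerpt; the function name `equilibrium` is likewise hypothetical.

```python
import numpy as np

def equilibrium(adj, initial_opinions):
    """Solve W theta = theta_bar with W = L + I (unit influence factors assumed)."""
    adj = np.asarray(adj, dtype=float)
    laplacian = np.diag(adj.sum(axis=1)) - adj
    W = laplacian + np.eye(adj.shape[0])
    return np.linalg.solve(W, np.asarray(initial_opinions, dtype=float))

# Path graph 0 - 1 - 2 with initial (locally learned) opinions theta_bar.
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]], dtype=float)
theta_bar = np.array([0.0, 0.0, 8.0])
theta_eq = equilibrium(path, theta_bar)

# Under the assumed quadratic cost, agent i's best response is the average of its
# own initial opinion and its neighbors' current opinions.
degrees = path.sum(axis=1)
best_response = (theta_bar + path @ theta_eq) / (1.0 + degrees)

print(theta_eq)
print(np.allclose(best_response, theta_eq))
```

Solving the $3\times 3$ system by hand gives $\pmb{\theta^{eq}} = (1, 2, 5)^T$, and no agent can lower its assumed cost by deviating, which is the fixed-point property the lemma describes.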

Figures (13)

Figure 1: Relationship between sample distribution and degree for different networks when $\alpha = 1$. The x-axis is node degree and the y-axis is the average or variance of $N^d = \{\frac{\epsilon m_i^*}{k}: d_i = d, i \in [n]\}$ where $m_i^*, \forall i \in [n]$ is the solution of Equation \ref{eq:opt}.
Figure 2: Bound tightness for synthetic networks for random $v_{ij}$s. The x-axis is the relative errors of upper/lower bounds and the y-axis is the frequency.
Figure 3: Network Gain of RR. The x-axis is the square of degree and the y-axis is a lower bound of network gain.
Figure 4: Relationship between average samples and degree of SF for different $\alpha$.
Figure 5: Relationship between sample variance and degree of SF for different $\alpha$.
...and 8 more figures

Theorems & Definitions (37)

Lemma 1: Nash equilibrium of opinion formation
Theorem 2: Theorem 1, Proposition 2, and Equation 17 in Mourtada (2022)
Theorem 3: Bound on generalization error
Theorem 4: Bounds on $TSC$
Theorem 5: Sample allocation and degree distribution
Corollary 6
Theorem 7: TSC under general influence factors
Lemma 8
proof
proof
...and 27 more
