Sample Complexity of Linear Regression Models for Opinion Formation in Networks
Haolin Liu, Rajmohan Rajaraman, Ravi Sundaram, Anil Vullikanti, Omer Wasim, Haifeng Xu
TL;DR
This work studies the sample complexity of opinion convergence on networks by treating each agent's opinion as a data-derived model and analyzing the Nash equilibrium of an opinion formation game. It develops a linear-regression-based framework in which the equilibrium is $\boldsymbol{\theta}^{eq}=W^{-1}\boldsymbol{\bar{\theta}}$ with $W=\mathcal{L}+I$, and defines the total sample complexity $TSC(G,\boldsymbol{v},k,\epsilon)$ as the minimum total number of samples that ensures $L(\theta^{eq}_i)\le\epsilon$ for every agent $i$. The paper proves a near-tight bound $TSC(G,\boldsymbol{v},k,\epsilon)=\Theta(\sum_i m_i^* + nk)$ via a convex optimization formulation. It gives an explicit degree-based sampling rule $m_i^* \propto \frac{1}{\alpha d_i+1}$ for uniform influence, and more general bounds for arbitrary influence factors in terms of the matrices $W^{-1}$ and $B$ and the vector $\gamma$. It characterizes the network gain across graph classes (clique, star, hypercube, random $d$-regular, and expanders), showing substantial gains in well-connected networks and highlighting the roles of inverse-degree sampling and edge expansion. Empirical results on synthetic and real networks corroborate the theory and demonstrate the practical utility of the proposed optimization approach (SOCP) for allocating samples to minimize the generalization error at equilibrium. Overall, the work connects opinion dynamics, graph signal processing, and decentralized learning, and offers actionable guidance on allocating data to obtain accurate network-wide opinion models with limited resources.
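The equilibrium formula $\boldsymbol{\theta}^{eq}=W^{-1}\boldsymbol{\bar{\theta}}$ can be checked numerically on a toy graph. The following is a minimal sketch, not the paper's implementation. It assumes that $\mathcal{L}$ is the Laplacian of an unweighted graph, and the star graph, the random initial models, and the function name `equilibrium_models` are all made up for illustration. Because each row of $\mathcal{L}$ sums to zero, each row of $W^{-1}$ sums to one, so every equilibrium model is a weighted average of the initial models.

```python
import numpy as np

def equilibrium_models(adjacency, theta_bar):
    """Solve (L + I) theta_eq = theta_bar, where L is the graph Laplacian.

    theta_bar has one row per agent (its initial model), so the solve
    is applied to every model coordinate at once.
    """
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A          # unweighted Laplacian (assumption)
    W = L + np.eye(A.shape[0])
    return np.linalg.solve(W, np.asarray(theta_bar, dtype=float))

# Hypothetical star graph: node 0 is the hub, nodes 1..3 are leaves.
A_star = np.array([
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
], dtype=float)

rng = np.random.default_rng(0)
theta_bar = rng.normal(size=(4, 2))         # made-up initial models, n=4, k=2
theta_eq = equilibrium_models(A_star, theta_bar)

# Passing the identity as the right-hand side returns W^{-1} itself,
# i.e. the weights each agent places on every initial model.
mixing = equilibrium_models(A_star, np.eye(4))
print(theta_eq)
print(mixing.sum(axis=1))
```

Using `np.linalg.solve` rather than forming $W^{-1}$ explicitly is a routine numerical choice; the inverse is computed here only to inspect the averaging weights.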
Abstract
Consider public health officials aiming to spread awareness about a new vaccine in a community interconnected by a social network. How can they distribute information with minimal resources, so as to avoid polarization and ensure community-wide convergence of opinion? To tackle such challenges, we initiate the study of the sample complexity of opinion convergence in networks. Our framework is built on the well-established opinion formation game, where we regard the opinion of each agent as a data-derived model, unlike previous works that treat opinions as data-independent scalars. The opinion model of every agent is initially learned from its local samples and evolves game-theoretically as all agents communicate with their neighbors and revise their models towards an equilibrium. Our focus is on the sample complexity needed to ensure that the opinions converge to an equilibrium in which the final model of every agent has low generalization error. Our paper has two main technical results. First, we present a novel polynomial-time optimization framework to quantify the total sample complexity for arbitrary networks when the underlying learning problem is (generalized) linear regression. Second, we leverage this optimization to study the network gain, which measures the improvement in sample complexity when learning over a network compared to learning in isolation. To this end, we derive network gain bounds for various network classes, including cliques, star graphs, and random regular graphs. Additionally, our framework provides a method to study the distribution of samples within the network, suggesting that it is sufficient to allocate samples in inverse proportion to node degree. Empirical results on both synthetic and real-world networks strongly support our theoretical findings.
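As a rough illustration of allocating samples inversely to the degree, the sketch below splits a fixed sample budget with weights proportional to $\frac{1}{\alpha d_i+1}$. It shows only this proportional rule, not the paper's optimization. The budget, the value of $\alpha$, the star-graph degrees, and the function name `inverse_degree_allocation` are assumptions chosen for the example.

```python
import numpy as np

def inverse_degree_allocation(degrees, total_samples, alpha=1.0):
    """Split a sample budget with weights proportional to 1 / (alpha * d_i + 1).

    alpha and total_samples are illustrative parameters. Rounding to whole
    samples means the counts may not sum exactly to total_samples.
    """
    d = np.asarray(degrees, dtype=float)
    weights = 1.0 / (alpha * d + 1.0)
    shares = weights / weights.sum()        # normalize so the shares sum to 1
    return np.rint(shares * total_samples).astype(int)

# Hypothetical star graph: the hub has degree 3, each leaf has degree 1.
star_degrees = [3, 1, 1, 1]
allocation = inverse_degree_allocation(star_degrees, total_samples=700, alpha=1.0)
print(allocation)
```

Under this rule the high-degree hub receives fewer samples than each leaf, which matches the intuition that well-connected agents can rely more on their neighbors' models.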
