Online Minimization of Polarization and Disagreement via Low-Rank Matrix Bandits

Federico Cinus, Yuko Kuroki, Atsushi Miyauchi, Francesco Bonchi

TL;DR

The paper tackles online minimization of polarization and disagreement in Friedkin–Johnsen opinion dynamics when innate opinions are unknown, formulating it as regret minimization with scalar feedback after each intervention. It introduces a two-stage approach, OPD-Min-ESTR, that first estimates a low-dimensional opinion subspace via nuclear-norm regularization and then runs a linear bandit in a reduced space of dimension $k=2|V|-1$. The authors prove a regret bound of $\widetilde{O}(|V|\sqrt{T})$ under a Restricted Strong Convexity condition and validate the method experimentally against a full-dimensional baseline on synthetic and real networks, showing superior performance and efficiency. This work connects online social interventions to low-rank matrix bandits and provides a scalable framework for analyzing and mitigating polarization in large networks, with potential impact for platform interventions and policy design.
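
To make the scalar feedback concrete, the following minimal sketch assumes the standard Friedkin–Johnsen formulation, which this summary does not spell out: equilibrium opinions $z^* = (I+L)^{-1}s$ for a graph Laplacian $L$ and mean-centered innate opinions $s$, polarization $z^{*\top}z^*$, and disagreement $z^{*\top}Lz^*$. Under those assumptions the sum equals $\langle (I+L)^{-1}, ss^\top \rangle$, which is linear in the rank-one matrix $ss^\top$. The path graph, the opinion vector, and all function names are illustrative.

```python
import numpy as np

def laplacian(adj):
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    return np.diag(adj.sum(axis=1)) - adj

def fj_equilibrium(L, s):
    """Equilibrium opinions z* = (I + L)^{-1} s (assumed FJ formulation)."""
    n = L.shape[0]
    return np.linalg.solve(np.eye(n) + L, s)

def polarization_disagreement(L, s):
    """Return (polarization, disagreement) at equilibrium for mean-centered s."""
    s = s - s.mean()
    z = fj_equilibrium(L, s)
    polarization = float(z @ z)       # sum_i z_i^2 after centering
    disagreement = float(z @ L @ z)   # sum over edges of w_ij (z_i - z_j)^2
    return polarization, disagreement

# Illustrative path graph on 4 nodes with hypothetical innate opinions.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
L = laplacian(adj)
s = np.array([-1.0, -0.5, 0.5, 1.0])

p, d = polarization_disagreement(L, s)

# The same quantity written as a linear function of Theta = s s^T.
s_c = s - s.mean()
theta = np.outer(s_c, s_c)
linear_form = float(np.trace(np.linalg.inv(np.eye(4) + L) @ theta))
```

Here `p + d` and `linear_form` agree up to floating-point error, which is the property that lets an intervention on the graph be treated as a matrix-valued action with an unknown rank-one parameter.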

Abstract

We study the problem of minimizing polarization and disagreement in the Friedkin-Johnsen opinion dynamics model under incomplete information. Unlike prior work that assumes a static setting with full knowledge of users' innate opinions, we address the more realistic online setting where innate opinions are unknown and must be learned through sequential observations. This novel setting, which naturally mirrors periodic interventions on social media platforms, is formulated as a regret minimization problem, establishing a key connection between algorithmic interventions on social media platforms and the theory of multi-armed bandits. In our formulation, a learner observes only scalar feedback on the overall polarization and disagreement after an intervention. For this novel bandit problem, we propose a two-stage algorithm based on low-rank matrix bandits. The algorithm first performs subspace estimation to identify an underlying low-dimensional structure, and then employs a linear bandit algorithm within the compact representation derived from the estimated subspace. We prove that our algorithm achieves $\widetilde{O}(\sqrt{T})$ cumulative regret over any time horizon $T$. Empirical results validate that our algorithm significantly outperforms a linear bandit baseline in terms of both cumulative regret and running time.
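
The second stage can be illustrated with a small sketch. It assumes the rotation commonly used in the low-rank matrix bandit literature: an orthonormal basis is built from the singular vectors of an estimate $\widehat{\mathbf{\Theta}}$, and the first row and first column of the rotated matrix ($2|V|-1$ coordinates) carry almost all of the signal when the estimate is accurate. The perturbed estimate below is a hypothetical stand-in for the output of the first stage, and all names are illustrative rather than the authors' implementation.

```python
import numpy as np

def rotate_and_split(theta_hat, M):
    """Rotate M into the singular bases of theta_hat.

    Returns (head, tail): head holds the first row and first column of the
    rotated matrix (2n - 1 entries), tail holds the remaining (n - 1)^2 entries.
    """
    U, _, Vt = np.linalg.svd(theta_hat)          # full orthonormal bases
    R = U.T @ M @ Vt.T
    head = np.concatenate([R[0, :], R[1:, 0]])
    tail = R[1:, 1:].ravel()
    return head, tail

rng = np.random.default_rng(0)
s = np.array([1.0, -0.5, 0.3, -0.8, 0.6, 0.2])   # hypothetical innate opinions
n = s.size
theta_star = np.outer(s, s)

# Assumption: stage one returned a slightly perturbed estimate of theta_star.
theta_hat = theta_star + 0.01 * rng.standard_normal((n, n))

head_t, tail_t = rotate_and_split(theta_hat, theta_star)

# Any action matrix X is rotated the same way; inner products are preserved.
X = rng.standard_normal((n, n))
head_x, tail_x = rotate_and_split(theta_hat, X)
reward_full = float(np.sum(X * theta_star))
reward_rotated = float(head_x @ head_t + tail_x @ tail_t)
```

Because the rotation is orthogonal, `reward_full` and `reward_rotated` coincide, while `tail_t` is close to zero. A linear bandit can therefore concentrate on the `head` coordinates and penalize the `tail` heavily, which is the intuition behind working in dimension $2|V|-1$ rather than $|V|^2$.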

Paper Structure

This paper contains 45 sections, 12 theorems, 99 equations, 4 figures, 2 tables, 1 algorithm.

Key Result

Proposition 1

Let $\mathbf{\Theta}^* = \bm{s}\bm{s}^\top \in \mathbb{R}^{|V| \times |V|}$ be the true rank-one parameter matrix. Fix a confidence parameter $\delta\in(0,1)$. Define $\widehat{\mathbf{\Theta}}$ as any solution to the nuclear-norm regularized least squares problem (eq:nuclear_norm_minim) valid for $128 \tau^2_{T_1} \leq \kappa$.
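
As a rough illustration of the estimator in Proposition 1, the sketch below solves a nuclear-norm regularized least squares problem of the form $\min_{\mathbf{\Theta}} \frac{1}{2n}\sum_t (y_t - \langle X_t, \mathbf{\Theta}\rangle)^2 + \lambda \|\mathbf{\Theta}\|_*$ by proximal gradient descent with singular value soft-thresholding. The solver, the Gaussian measurement matrices, the noise level, and the value of $\lambda$ are all assumptions made for this example; the paper's exact objective, action set, and regularization level are not reproduced in this summary.

```python
import numpy as np

def svt(M, tau):
    """Singular value soft-thresholding: prox of tau * nuclear norm."""
    U, sv, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(sv - tau, 0.0)) @ Vt

def nuclear_norm_ls(X, y, lam, n_iter=500):
    """Proximal gradient for (1/2n) * sum (y_t - <X_t, Theta>)^2 + lam * ||Theta||_*."""
    n, d, _ = X.shape
    A = X.reshape(n, d * d)                      # each row is a vectorized X_t
    step = n / (np.linalg.norm(A, 2) ** 2)       # 1 / Lipschitz constant of the gradient
    theta = np.zeros((d, d))
    for _ in range(n_iter):
        resid = A @ theta.ravel() - y
        grad = (A.T @ resid).reshape(d, d) / n
        theta = svt(theta - step * grad, step * lam)
    return theta

rng = np.random.default_rng(0)
s = np.array([1.0, -0.5, 0.3, -0.8, 0.6])        # hypothetical innate opinions
d = s.size
theta_star = np.outer(s, s)                      # rank-one ground truth

n_obs = 200                                      # illustrative exploration length
X = rng.standard_normal((n_obs, d, d))           # assumed Gaussian measurements
y = np.einsum("tij,ij->t", X, theta_star) + 0.01 * rng.standard_normal(n_obs)

theta_hat = nuclear_norm_ls(X, y, lam=0.01)
rel_err = np.linalg.norm(theta_hat - theta_star) / np.linalg.norm(theta_star)
```

In this well-conditioned toy setting `rel_err` is small; the proposition concerns how such an error scales with the exploration length and the regularization level under the Restricted Strong Convexity condition.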

Figures (4)

  • Figure 1: Cumulative regret for Erdős–Rényi graphs (top) and homophilic Stochastic Block Model graphs (bottom) with $|V| \in \{8,16\}$. Runtime (mean $\pm$ std) over 100 repetitions is reported in the legend. For ER graphs the edge probability is $p=0.2$. For SBM graphs, two communities are generated with sizes $|V_1| \approx 0.75|V|$, $|V_2|=|V|-|V_1|$, intra-community edge probability $p=0.5$, and inter-community probability $p=0.07$.
  • Figure 2: Wall-clock time of OPD-Min as a function of the number of nodes $|V|$ on Erdős–Rényi graphs. Shaded regions indicate standard deviation across trials.
  • Figure 3: Cumulative regret on Erdős–Rényi graphs with polarized innate opinions ($\texttt{pol}=3$). Runtime (mean $\pm$ std) over 100 repetitions is reported in the legend. Edge probability is $p=0.2$.
  • Figure 4: Cumulative regret over $10{,}000$ iterations on four benchmark social networks (Florentine families, Davis southern women, Karate club, and Les Misérables), under two noise levels ($\sigma=0.01$ and $\sigma=1.0$). The left column corresponds to action set size $|\mathcal{X}|=10$, while the right column corresponds to $|\mathcal{X}|=1000$. Each curve shows the mean regret across runs, with shaded regions indicating 95% confidence intervals.
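
The synthetic graph settings described in the Figure 1 caption can be sketched as follows. This is a minimal generator under stated assumptions: the community size $|V_1|$ is obtained by rounding $0.75|V|$, graphs are undirected and unweighted, and the seed and function names are hypothetical. The authors' actual generation code is not part of this summary.

```python
import numpy as np

def erdos_renyi(n, p, rng):
    """Symmetric 0/1 adjacency matrix with independent edges of probability p."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)

def two_block_sbm(n, p_in, p_out, rng, frac=0.75):
    """Two-community stochastic block model; the first block has round(frac * n) nodes."""
    n1 = int(round(frac * n))
    labels = np.array([0] * n1 + [1] * (n - n1))
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p_in, p_out)          # edge probability per node pair
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return (upper | upper.T).astype(float), labels

rng = np.random.default_rng(0)                   # hypothetical seed
er_adj = erdos_renyi(8, 0.2, rng)                # ER setting from the caption
sbm_adj, sbm_labels = two_block_sbm(16, 0.5, 0.07, rng)
```

With $|V|=16$ this yields communities of 12 and 4 nodes; the resulting adjacency matrices can be turned into Laplacians for the opinion-dynamics computations.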

Theorems & Definitions (23)

  • Definition 1: Polarization at equilibrium
  • Definition 2: Disagreement at equilibrium
  • Remark 1
  • Proposition 1: Estimation Error Bound
  • Theorem 4.1: Regret Bound for OPD-Min-ESTR
  • Corollary 1: Regret Bound with a Lower Bound on Signal Strength
  • Remark 2
  • Definition 3: Subspaces for $\mathbf{\Theta}^* = \bm{s}\bm{s}^\top$
  • Proposition 2: cf. Prop. 9.13 of Wainwright (2019)
  • Definition 4: RSC Condition Restricted to the Cone
  • ...and 13 more