Steering the Herd: A Framework for LLM-based Control of Social Learning

Raghu Arghal, Kevin He, Shirin Saeedi Bidokhti, Saswati Sarkar

TL;DR

We study how an information-mediating planner, such as an LLM, can strategically control the precision of private signals in a sequential social-learning setting. The model embeds this control in a dynamic programming framework with Bayesian belief updates, analyzing altruistic versus biased planners and proving the convexity of the altruistic value function along with a structured, threshold-based policy characterization. The biased planner can, in some regimes, obfuscate signals to steer actions, with substantial welfare implications depending on alignment with agent goals. Empirical simulations using LLMs show that planners exhibit near-optimal strategic reasoning and emergent behavior consistent with the theory, while non-Bayesian agent biases can both distort learning and be mitigated by alignment-aware mediation. Overall, the work provides a tractable foundation for understanding and regulating LLM-based information mediators in social learning environments.

Abstract

Algorithms increasingly serve as information mediators--from social media feeds and targeted advertising to the increasing ubiquity of LLMs. This engenders a joint process where agents combine private, algorithmically-mediated signals with learning from peers to arrive at decisions. To study such settings, we introduce a model of controlled sequential social learning in which an information-mediating planner (e.g. an LLM) controls the information structure of agents while they also learn from the decisions of earlier agents. The planner may seek to improve social welfare (altruistic planner) or to induce a specific action the planner prefers (biased planner). Our framework presents a new optimization problem for social learning that combines dynamic programming with decentralized action choices and Bayesian belief updates. We prove the convexity of the value function and characterize the optimal policies of altruistic and biased planners, which attain desired tradeoffs between the costs they incur and the payoffs they earn from induced agent choices. Notably, in some regimes the biased planner intentionally obfuscates the agents' signals. Even under stringent transparency constraints--information parity with individuals, no lying or cherry-picking, and full observability--we show that information mediation can substantially shift social welfare in either direction. We complement our theory with simulations in which LLMs act as both planner and agents. Notably, the LLM planner in our simulations exhibits emergent strategic behavior in steering public opinion that broadly mirrors the trends predicted, though key deviations suggest the influence of non-Bayesian reasoning consistent with the cognitive patterns of both humans and LLMs trained on human-like data. Together, we establish our framework as a tractable basis for studying the impact and regulation of LLM information mediators.
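To make the interplay between mediated private signals and learning from earlier decisions concrete, here is a minimal sketch of one round of Bayesian updating. It is an illustration under stated assumptions, not the paper's exact model: the state is binary (G or B), each agent receives a symmetric binary signal whose accuracy `precision` is the quantity the planner is assumed to control, and an agent picks action G whenever its posterior is at least a hypothetical cutoff `threshold`. All function names are illustrative.

```python
def private_posterior(public_belief, signal, precision):
    """P(state = G) after one private signal ("g" or "b").

    Assumption: the signal matches the true state with probability `precision`.
    """
    like_g = precision if signal == "g" else 1 - precision
    like_b = 1 - precision if signal == "g" else precision
    num = public_belief * like_g
    return num / (num + (1 - public_belief) * like_b)


def choose_action(posterior, threshold=0.5):
    """Assumed decision rule: act G when the posterior reaches the cutoff."""
    return "G" if posterior >= threshold else "B"


def prob_action_g(public_belief, precision, state, threshold=0.5):
    """Probability that an agent plays G when the true state is `state`."""
    total = 0.0
    for signal in ("g", "b"):
        post = private_posterior(public_belief, signal, precision)
        if choose_action(post, threshold) == "G":
            matches = (signal == "g") == (state == "G")
            total += precision if matches else 1 - precision
    return total


def public_update(public_belief, observed_action, precision, threshold=0.5):
    """Bayes update of the public belief after observing one agent's action."""
    p_g = prob_action_g(public_belief, precision, "G", threshold)
    p_b = prob_action_g(public_belief, precision, "B", threshold)
    if observed_action == "B":
        p_g, p_b = 1 - p_g, 1 - p_b
    denom = public_belief * p_g + (1 - public_belief) * p_b
    # An action with zero probability under both states carries no information here.
    return public_belief if denom == 0 else public_belief * p_g / denom


# Informative signal at a neutral prior: the action reveals the signal.
print(public_update(0.5, "G", precision=0.7))
# Strong prior: both signals lead to action G, so the action teaches nothing (herding).
print(public_update(0.9, "G", precision=0.7))
```

In this sketch, raising `precision` makes actions more revealing to later agents, while lowering it toward 0.5 freezes the public belief; this is the lever the abstract describes the planner as controlling.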

Paper Structure

This paper contains 44 sections, 17 theorems, 85 equations, 5 figures, 1 table.

Key Result

Theorem 1

The optimal myopic altruistic policy $\pi_A^0$ is given as follows: $\pi^0_A(b) = $ where $t_M=$ (proof in the appendix).

Figures (5)

  • Figure 1: (a) A system diagram for our experiments. The instance parameters are color-coded according to which LLM role uses them. (b) The change in LLM (solid) and Bayesian (dashed) agents' beliefs for a given prior after a positive (black) or negative (red) signal.
  • Figure 2: (a) Example policies from the LLM planner (black) and the analytically optimal planner (red) in altruistic (solid) and biased (dashed) settings. (b) A histogram showing the distribution of the percentage deviation between the LLM and optimal policies. (c) Planner expenditure and social welfare change as a percent of the no-control baseline welfare. The true state was fixed to B. The left half shows a biased planner seeking action G, and the right shows an altruistic planner.
  • Figure 3: Belief evolution depicted as three binary trees. The trees begin with states $x_0$, $m_0$, and $n_0$, respectively, and level $i$ contains the $2^i$ possible states reachable after $i$ epochs. Left branches correspond to action $B$, leading to updated belief $\underline{x}_0$ (respectively, $\underline{m}_0$ and $\underline{n}_0$). Right branches correspond to action $G$, leading to updated belief $\bar{x}_0$ (respectively, $\bar{m}_0$ and $\bar{n}_0$). Each branch is labeled with the corresponding probability, and the bold states indicate an example path, such as $\lambda$. In the example path, $0\in\mathcal{G}$ and $1,k-1\in\mathcal{B}$. The bold states in each level of the three trees are counterparts.
  • Figure 4: The rate at which the LLM chose a 'buy' decision as a function of (a) its self-reported posterior belief and (b) its self-reported posterior and the price of buying.
  • Figure 5: Agent belief-updating trajectories, comparing the true Bayesian belief with the belief reported by the LLM agent, under analytically optimal policies for different planner objectives and true states of the world.

Theorems & Definitions (34)

  • Remark 1
  • Remark 2
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • proof
  • Lemma 7
  • ...and 24 more