Interacting Large Language Model Agents. Interpretable Models and Social Learning
Adit Jain, Vikram Krishnamurthy
TL;DR
The paper develops an interpretable, theory-driven framework for interacting Large Language Model Agents (LLMAs) that combines Bayesian inference, microeconomic rationality, and stochastic control. It first models a single LLMA as a rationally inattentive Bayesian utility maximizer (RIBUM) and shows how to reconstruct interpretable utilities from black-box LLMAs. It then extends the model to sequences of LLMAs performing Bayesian social learning and identifies conditions under which information cascades and herding occur. To counteract undesirable herding, the authors formulate two stochastic control settings, centrally controlled LLMAs and incentivized autonomous LLMAs, and prove that the optimal stopping policies have a threshold structure; a policy-gradient algorithm that operates without full model knowledge complements these results. Numerical experiments on hate speech detection and product quality identification with LLaMA and ChatGPT demonstrate interpretable state estimation and active control of information sharing. The work provides reproducible code and points to applications in finance, online moderation, and personalized recommendations, with future directions toward broader LLMA networks and human-in-the-loop integration.
Abstract
This paper discusses the theory and algorithms for interacting large language model agents (LLMAs) using methods from statistical signal processing and microeconomics. While both fields are mature, their application to decision-making involving interacting LLMAs remains unexplored. Motivated by Bayesian sentiment analysis on online platforms, we construct interpretable models and algorithms that enable LLMAs to interact and perform Bayesian inference. Because interacting LLMAs learn from both prior decisions and external inputs, they can exhibit bias and herding behavior. Thus, developing interpretable models and stochastic control algorithms is essential to understand and mitigate these behaviors. This paper has three main results. First, we show using Bayesian revealed preferences from microeconomics that an individual LLMA satisfies the necessary and sufficient conditions for rationally inattentive (bounded rationality) Bayesian utility maximization and that, given an observation, the LLMA chooses an action that maximizes a regularized utility. Second, we utilize Bayesian social learning to construct interpretable models for LLMAs that interact sequentially with each other and the environment while performing Bayesian inference. Our proposed models capture the herding behavior exhibited by interacting LLMAs. Third, we propose a stochastic control framework to delay herding and improve state estimation accuracy under two settings: (a) centrally controlled LLMAs and (b) autonomous LLMAs with incentives. We demonstrate the effectiveness of our methods on real datasets for hate speech classification and product quality assessment, using open-source models like LLaMA and closed-source models like ChatGPT. The main takeaway of this paper, based on empirical analysis and mathematical formalism, is that LLMAs act as rationally bounded Bayesian agents that exhibit social learning when interacting.
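To make the herding mechanism in the second result concrete, the following is a minimal sketch of the textbook Bayesian social learning recursion for a two-state problem. It is an illustration under stated assumptions, not the authors' code: the observation likelihoods `B`, the utilities `U`, and all function names are hypothetical, and each agent here is an idealized Bayesian rather than an actual LLMA. Each agent sees a private noisy observation and the public belief formed from earlier actions, picks the action with the highest expected utility, and later agents update the public belief from that action alone.

```python
import random

# Hypothetical model (assumed numbers, not taken from the paper).
# State x: 0 = "low quality", 1 = "high quality".
B = [[0.8, 0.2],   # B[x][y] = P(observation y | state x), row x = 0
     [0.3, 0.7]]   # row x = 1
U = [[1.0, 0.0],   # U[x][a] = utility of action a in state x
     [0.0, 1.0]]

def private_belief(public, y):
    """Bayes update of the public belief with a private observation y."""
    unnorm = [B[x][y] * public[x] for x in range(2)]
    z = sum(unnorm)
    return [p / z for p in unnorm]

def best_action(public, y):
    """Action maximizing expected utility under the private belief."""
    post = private_belief(public, y)
    expected = [sum(post[x] * U[x][a] for x in range(2)) for a in range(2)]
    return max(range(2), key=lambda a: expected[a])

def in_cascade(public):
    """True when the chosen action no longer depends on the observation."""
    return best_action(public, 0) == best_action(public, 1)

def update_public(public, a):
    """Update the public belief after observing only the action a.

    Assumes a is an action the agent could have taken, so the
    normalizing constant is nonzero.
    """
    # P(a | x, public): total probability of observations that lead to a.
    lik = [sum(B[x][y] for y in range(2) if best_action(public, y) == a)
           for x in range(2)]
    unnorm = [lik[x] * public[x] for x in range(2)]
    z = sum(unnorm)
    return [p / z for p in unnorm]

def simulate(true_state, prior, n_agents, seed=0):
    """Run agents sequentially; return (action, cascade flag) pairs and the final belief."""
    rng = random.Random(seed)
    public = list(prior)
    history = []
    for _ in range(n_agents):
        y = 0 if rng.random() < B[true_state][0] else 1
        a = best_action(public, y)
        history.append((a, in_cascade(public)))
        public = update_public(public, a)
    return history, public

if __name__ == "__main__":
    history, public = simulate(true_state=1, prior=[0.5, 0.5], n_agents=20)
    for k, (a, cascade) in enumerate(history):
        print(k, a, "cascade" if cascade else "informative")
    print("final public belief:", public)
```

Once the public belief is strong enough that both observations lead to the same action, that action carries no information about the private observation, so the public belief stops changing and every later agent repeats it. This is the information cascade that the stochastic control framework in the third result aims to delay.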
