Strategic Tradeoffs Between Humans and AI in Multi-Agent Bargaining
Crystal Qian, Kehang Zhu, John Horton, Benjamin S. Manning, Vivian Tsai, James Wexler, Nithum Thain
TL;DR
This work investigates how humans, LLM-based agents, and Bayesian models perform in dynamic, multi-agent bargaining under identical conditions, using a novel three-player chip-trading game and a Pareto-optimal bound computed via linear programming. The empirical study shows that Bayesian agents maximize surplus but lack social adaptability, humans rely on fairness norms, and LLMs adopt concessionary, high-acceptance strategies; overall, similar surplus can mask divergent decision processes and alignment. These findings emphasize that practical deployment of AI in real-world coordination must consider process, norms, and social dynamics, not just aggregate outcomes. The work also highlights the potential of hybrid architectures that combine planning and social reasoning to improve foresight and cooperative behavior. The results establish a baseline and a framework for assessing alignment and risk in human-AI negotiation settings, and motivate future studies with richer, more variable environments and communication channels.
Abstract
As large language models (LLMs) are increasingly embedded in collaborative human activities such as business negotiations and group coordination, it becomes critical to evaluate both the performance gains they can achieve and how they interact in dynamic, multi-agent environments. Unlike traditional statistical agents such as Bayesian models, which may excel under well-specified conditions, LLMs can generalize across diverse, real-world scenarios, raising new questions about how their strategies and behaviors compare to those of humans and other agent types. In this work, we compare outcomes and behavioral dynamics across humans (N = 216), LLMs (GPT-4o, Gemini 1.5 Pro), and Bayesian agents in a dynamic negotiation setting under identical conditions. Bayesian agents extract the highest surplus through aggressive optimization, at the cost of frequent trade rejections. Humans and LLMs achieve similar overall surplus, but through distinct behaviors: LLMs favor conservative, concessionary trades with few rejections, while humans employ more strategic, risk-taking, and fairness-oriented behaviors. Thus, we find that performance parity, a common benchmark in agent evaluation, can conceal fundamental differences in process and alignment, which are critical for practical deployment in real-world coordination tasks. By establishing foundational behavioral baselines under matched conditions, this work lays the groundwork for future studies in more applied, variable-rich environments.
