Strategic Tradeoffs Between Humans and AI in Multi-Agent Bargaining

Crystal Qian, Kehang Zhu, John Horton, Benjamin S. Manning, Vivian Tsai, James Wexler, Nithum Thain

TL;DR

This work investigates how humans, LLM-based agents, and Bayesian models perform in dynamic, multi-agent bargaining under identical conditions, using a novel three-player chip-trading game and a Pareto-optimal bound computed via linear programming. The empirical study shows that Bayesian agents maximize surplus but lack social adaptability, while humans rely on fairness norms and LLMs adopt concessionary, high-acceptance strategies; overall, similar surplus can mask divergent decision processes and alignment. These findings emphasize that practical deployment of AI in real-world coordination must consider process, norms, and social dynamics, not just aggregate outcomes, and establish a baseline for evaluating more complex, variable-rich environments. The work also highlights potential in hybrid architectures that combine planning and social reasoning to improve foresight and cooperative behavior. The results provide a framework for assessing alignment and risk in human-AI negotiation settings and motivate future studies with richer environments and communication channels.

Abstract

As large language models (LLMs) are increasingly embedded in collaborative human activities such as business negotiations and group coordination, it becomes critical to evaluate both the performance gains they can achieve and how they interact in dynamic, multi-agent environments. Unlike traditional statistical agents such as Bayesian models, which may excel under well-specified conditions, LLMs can generalize across diverse, real-world scenarios, raising new questions about how their strategies and behaviors compare to those of humans and other agent types. In this work, we compare outcomes and behavioral dynamics across humans (N = 216), LLMs (GPT-4o, Gemini 1.5 Pro), and Bayesian agents in a dynamic negotiation setting under identical conditions. Bayesian agents extract the highest surplus through aggressive optimization, at the cost of frequent trade rejections. Humans and LLMs achieve similar overall surplus, but through distinct behaviors: LLMs favor conservative, concessionary trades with few rejections, while humans employ more strategic, risk-taking, and fairness-oriented behaviors. Thus, we find that performance parity -- a common benchmark in agent evaluation -- can conceal fundamental differences in process and alignment, which are critical for practical deployment in real-world coordination tasks. By establishing behavioral baselines under matched conditions, this work provides a reference point for future studies in more applied, variable-rich environments.

Paper Structure

This paper contains 67 sections, 7 equations, 13 figures, 5 tables, 1 algorithm.

Figures (13)

  • Figure 1: A broad overview of the bargaining game. Left: Game setup with initial chip counts and valuations. Center: Simplified version of gameplay (offer proposals and acceptances). Right: The types of agents empirically evaluated and simulated in this study.
  • Figure 2: Surplus trajectories for human, LLM, and Bayesian-learning agent simulations across game complexities. Each unique game yields a blue line corresponding to the ratio of surplus achieved relative to the computed optimal allocation. Means and 95% confidence intervals are highlighted and listed in the Appendix summary metric table (tab:summary_metric_table). All populations generated positive aggregate surplus.
  • Figure 3: Trading patterns in the 3-chip game, visualizing (i) net surplus change for the proposer and (ii) trade ratio. Accepted trades are shown in green, rejected trades in red. Marginal distributions across each axis are adjacent to the plots. The vertical dashed line marks zero net surplus (no net value created), and the horizontal solid line marks balanced exchange (1:1 ratio of chips). Figure values are provided in the summary trade table (tab:summary_trade_table).
  • Figure 4: Example of player chip value trajectories over nine turns of a trading game (3-Chip). To avoid speculation, we do not analyze Player 1's proposal on Turn 4 (an unaccepted trade), or Player 3's acceptance on Turn 5 (accepting a negative surplus offer).
  • Figure 5: Counts of strategic actions across games. Bayesian agents performed more optimal, no-regret actions relative to other populations as game complexity increased. Declining a proposal would not cause forced regret, as this action does not commit any chips. The total number of actions naturally increases with game complexity, reflecting the larger solution space, not necessarily improved decision-making.
  • ...and 8 more figures
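
The two axes described for Figure 3 (net surplus change for the proposer, and trade ratio) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the chip colors and per-chip values, linear per-chip valuations, and the convention that trade ratio means chips received per chip given.

```python
from typing import Dict

Chips = Dict[str, int]      # chip color -> quantity (hypothetical representation)
Values = Dict[str, float]   # chip color -> proposer's per-chip value (assumed linear)


def bundle_value(chips: Chips, values: Values) -> float:
    """Total value of a bundle of chips under one player's valuations."""
    return sum(values[color] * qty for color, qty in chips.items())


def proposer_net_surplus(give: Chips, get: Chips, values: Values) -> float:
    """Value the proposer receives minus value the proposer gives up."""
    return bundle_value(get, values) - bundle_value(give, values)


def trade_ratio(give: Chips, get: Chips) -> float:
    """Chips received per chip given (assumed convention); 1.0 is a balanced exchange."""
    given = sum(give.values())
    if given == 0:
        raise ValueError("a trade must give at least one chip")
    return sum(get.values()) / given


# Hypothetical proposer valuations and offer: give 4 red, get 3 green.
values = {"red": 5.0, "green": 10.0, "blue": 2.0}
give = {"red": 4}
get = {"green": 3}

surplus = proposer_net_surplus(give, get, values)  # 3*10 - 4*5 = 10.0
ratio = trade_ratio(give, get)                     # 3 / 4 = 0.75
print(surplus, ratio)
```

Under these assumptions, this offer would sit to the right of the zero-net-surplus line (the proposer gains value) and below the 1:1 line (the proposer receives fewer chips than it gives).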