High Volatility and Action Bias Distinguish LLMs from Humans in Group Coordination

Sahaj Singh Maini, Robert L. Goldstone, Zoran Tiganj

Abstract

Humans exhibit remarkable abilities to coordinate in groups. As large language models (LLMs) become more capable, it remains an open question whether they can demonstrate comparable adaptive coordination and whether they use the same strategies as humans. To investigate this, we compare LLM and human performance on a common-interest game with imperfect monitoring: Group Binary Search. In this n-player game, participants need to coordinate their actions to achieve a common objective. Players independently submit numerical values in an effort to collectively sum to a randomly assigned target number. Without direct communication, they rely on group feedback to iteratively adjust their submissions until they reach the target number. Our findings show that, unlike humans, who adapt and stabilize their behavior over time, LLMs often fail to improve across games and exhibit excessive switching, which impairs group convergence. Moreover, richer feedback (e.g., numerical error magnitude) benefits humans substantially but has only small effects on LLMs. Taken together, by grounding the analysis in human baselines and mechanism-level metrics, including reactivity scaling, switching dynamics, and learning across games, we point to differences between human and LLM groups and provide a behaviorally grounded diagnostic for closing the coordination gap.
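
To make the game mechanics concrete, the following is a minimal sketch of the GBS loop in Python. The 15-round cap and the signed numerical feedback come from the paper; the guess range, target range, and the simple reactive agent heuristic are illustrative assumptions, not the paper's experimental setup.

```python
import random

def play_gbs(n_players=3, low=1, high=25, target_range=(10, 60),
             max_rounds=15, seed=0):
    """Simulate one Group Binary Search (GBS) game with simple reactive
    agents. Illustrative only: the guess range, target range, and agent
    heuristic are assumptions, not the paper's setup."""
    rng = random.Random(seed)
    target = rng.randint(*target_range)           # the mystery number
    guesses = [rng.randint(low, high) for _ in range(n_players)]
    for round_idx in range(1, max_rounds + 1):
        error = target - sum(guesses)             # shared numerical feedback
        if error == 0:
            return round_idx                      # group hit the target
        # Each agent nudges its own guess in the direction of the shared
        # error, independently and without seeing the other guesses.
        step = 1 if error > 0 else -1
        guesses = [min(high, max(low, g + step * rng.randint(0, 2)))
                   for g in guesses]
    return None                                   # no convergence in 15 rounds

if __name__ == "__main__":
    rounds = [play_gbs(seed=s) for s in range(100)]
    solved = [r for r in rounds if r is not None]
    if solved:
        print(f"solved {len(solved)}/100; mean rounds {sum(solved)/len(solved):.1f}")
    else:
        print("no games solved")
```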

Paper Structure

This paper contains 26 sections, 33 figures, and 9 tables.

Figures (33)

  • Figure 1: Schematic of a GBS game with three players and numerical feedback. The sum of the players' guesses is compared to the mystery number, and the players receive feedback about the difference between the two. They can then adjust their guesses (without communicating with each other or seeing the other players' guesses), and the game continues until the sum of guesses matches the mystery number or until 15 rounds have been played.
  • Figure 2: Example of coordination in 3-player games with numerical feedback and zero-shot prompts. The solid horizontal line indicates the mystery number, the other solid line indicates the sum of the group's guesses, and the dashed lines represent the decisions of each agent in the group.
  • Figure 3: Average number of rounds needed to finish the game with zero-shot CoT prompts under numerical feedback.
  • Figure 4: Group reaction to numerical feedback under zero-shot CoT prompting. Each dot denotes the aggregate adjustment made by a group after the previous round's feedback. The dotted line indicates the optimal collective correction, and the solid line shows the fitted group reaction. Human groups remain closer to stable underreaction, whereas LLM groups more often respond with steeper collective updates. A minimal sketch of this reaction-slope fit appears after the figure list.
  • Figure 5: Average proportion of players switching their guess from the previous round as the group approaches the end of the game (either by termination or by finding a solution), across different experimental conditions: small vs. medium vs. large groups, and zero-shot vs. zero-shot CoT prompts. Error bars represent standard deviation across games. A sketch of this switching-proportion computation also appears after the figure list.
  • ...and 28 more figures
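
For the metrics referenced in Figures 4 and 5, here is a minimal sketch, assuming a per-round record of the group sum and of individual guesses. The function names and the through-origin least-squares estimator are our assumptions, not necessarily the paper's implementation.

```python
import numpy as np

def reaction_slope(group_sums, target):
    """Fit the group reaction slope from Figure 4-style data. The
    collective adjustment (sum_t - sum_{t-1}) is regressed on the
    previous round's error (target - sum_{t-1}); a slope of 1 matches
    the optimal-correction line, below 1 indicates underreaction, and
    above 1 overreaction."""
    sums = np.asarray(group_sums, dtype=float)
    errors = target - sums[:-1]          # feedback shown after each round
    adjustments = np.diff(sums)          # group's change on the next round
    # Least squares through the origin: zero error warrants zero adjustment.
    return float(errors @ adjustments / (errors @ errors))

def switching_proportion(guesses):
    """Per-round proportion of players who changed their guess from the
    previous round (the quantity tracked in Figure 5), given a
    rounds-by-players array of guesses."""
    g = np.asarray(guesses)
    return (g[1:] != g[:-1]).mean(axis=1)

# A group that closes roughly half the remaining gap each round:
print(reaction_slope([30, 42, 49, 53], target=55))        # 0.5
# Two of three players switch after round 1, none after round 2:
print(switching_proportion([[10, 10, 10], [10, 12, 9], [10, 12, 9]]))  # [0.667 0.]
```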