Strategic Communication and Language Bias in Multi-Agent LLM Coordination

Alessio Buscemi, Daniele Proverbio, Alessandro Di Stefano, The Anh Han, German Castignani, Pietro Liò

TL;DR

This work investigates how linguistic framing and explicit inter-agent communication affect coordination in multi-agent LLM systems using the FAIRGAME framework. It extends FAIRGAME to include dialogue between agents and evaluates two LLMs (GPT-4o and Llama 4 Maverick) across English, Arabic, and Vietnamese in one-shot and repeated Prisoner's Dilemma and Battle of Sexes settings, with cooperative and selfish personalities. Results show that communication can both promote cooperation and disrupt alignment, depending on language, model, and game type: Prisoner's Dilemma games generally benefit from dialogue, while the effect on coordination in Battle of Sexes is more nuanced. An analysis of message length and vocabulary reveals how strategic signaling differs by horizon knowledge and language. The findings underscore communication as a central mechanism shaping AI coordination and bias, informing the design of safer, fairer, and more interpretable multi-agent systems in real-world deployments.

Abstract

Large Language Model (LLM)-based agents are increasingly deployed in multi-agent scenarios where coordination is crucial but not always assured. Research shows that the way strategic scenarios are framed linguistically can affect cooperation. This paper explores whether allowing agents to communicate amplifies these language-driven effects. Leveraging FAIRGAME, we simulate one-shot and repeated games across different languages and models, both with and without communication. Our experiments, conducted with two advanced LLMs (GPT-4o and Llama 4 Maverick), reveal that communication significantly influences agent behavior, though its impact varies by language, personality, and game structure. These findings underscore the dual role of communication in fostering coordination and reinforcing biases.

Paper Structure

This paper contains 11 sections, 1 equation, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Impact of communication in the Prisoner's Dilemma game across the considered LLMs. Each bar corresponds to the sum of the penalties obtained by both agents at the end of the game. Each plot corresponds to a combination of language (ar/en/vn) | type of game (repeated/one-shot) | awareness of the number of rounds (known/unknown). 95% confidence intervals are reported for each bar.
  • Figure 2: Strategy evolution over 10 rounds in the repeated Prisoner's Dilemma game, comparing LLMs with and without communication. Strategy values range from $+1$ (pure defection, Option A) to $-1$ (pure cooperation, Option B), averaged across games by personality type and communication setting.
  • Figure 3: Impact of communication in the Battle of Sexes game across the considered LLMs. Each bar corresponds to the sum of the penalties obtained by both agents at the end of the game. Each plot corresponds to a combination of language | type of game (repeated/one-shot) | awareness of the number of rounds (known/unknown). 95% confidence intervals are reported for each bar.
  • Figure 4: Average evolution of coordination in strategy choices across repeated rounds for all experiments of the Battle of Sexes games, for each LLM. Solid and dashed lines distinguish the settings with communication enabled and disabled. Values represent the alignment of the two agents' option selections in each round. The value 1 corresponds to a mismatch in strategies (one agent selects Option A, the other Option B), reflecting coordination failure or defective behavior, while -1 indicates alignment in choices (successful coordination or cooperative behavior).
  • Figure 5: Total message length, defined as number of characters, per round in repeated Prisoner's Dilemma games, across payoff types and LLMs.
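
The per-round values described in the captions of Figures 2 and 4 can be illustrated with a minimal sketch. It is not the FAIRGAME implementation: the function names and the input layout (a list of games, each a list of per-round choice pairs) are hypothetical, and averaging over both agents in the first function is an assumption made for brevity, whereas the paper averages by personality type and communication setting.

```python
from statistics import mean

# Encoding taken from the Figure 2 caption:
# Option A (defection) -> +1, Option B (cooperation) -> -1.
STRATEGY_VALUE = {"A": 1, "B": -1}


def mean_strategy_per_round(games):
    """Average strategy value in each round (Figure 2 style).

    games: list of games; each game is a list of rounds; each round is a
    (choice_agent_1, choice_agent_2) tuple of "A" or "B".
    Assumption: the average pools both agents and all games.
    """
    n_rounds = len(games[0])
    return [
        mean(STRATEGY_VALUE[choice] for game in games for choice in game[r])
        for r in range(n_rounds)
    ]


def mean_coordination_per_round(games):
    """Average coordination value in each round (Figure 4 style).

    A round scores +1 when the two agents pick different options
    (mismatch) and -1 when they pick the same option (alignment).
    """
    n_rounds = len(games[0])
    return [
        mean(1 if game[r][0] != game[r][1] else -1 for game in games)
        for r in range(n_rounds)
    ]


if __name__ == "__main__":
    # Two hypothetical two-round games, for illustration only.
    toy_games = [
        [("A", "A"), ("B", "B")],
        [("A", "B"), ("B", "B")],
    ]
    print(mean_strategy_per_round(toy_games))
    print(mean_coordination_per_round(toy_games))
```

Under this encoding, a curve that drifts toward -1 means more cooperation in the Prisoner's Dilemma (Figure 2) and more successful coordination in Battle of Sexes (Figure 4).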