Communication Enables Cooperation in LLM Agents: A Comparison with Curriculum-Based Approaches

Hachem Madmoun, Salem Lahlou

TL;DR

This study compares two approaches to eliciting cooperation among LLM agents in multi-agent settings: minimal direct communication through a one-word channel, and a curriculum-based in-context learning strategy. In a 4-player Stag Hunt, cheap talk raises cooperation in heterogeneous model groups from 0% to 48.3%, which points to communication as a robust coordination mechanism. Curriculum learning for social dilemmas, by contrast, proves highly design-sensitive: it reduced agent payoffs by 27.4% in an Iterated Public Goods Game with Punishment, and qualitative analysis identifies learned pessimism and heuristic over-fitting as failure modes. The results suggest that simple communication protocols may offer more reliable coordination than experience-based curricula, and that curriculum design requires careful alignment of strategic lessons with the target context.

Abstract

Eliciting cooperation in multi-agent LLM systems is critical for AI alignment. We investigate two approaches: direct communication and curriculum learning. In a 4-player Stag Hunt, a one-word "cheap talk" channel increases cooperation from 0% to 48.3%, demonstrating communication as a robust coordination mechanism. In contrast, we find that curriculum learning is highly sensitive to design choices: our pedagogical curriculum through progressively complex games reduced agent payoffs by 27.4% in an Iterated Public Goods Game with Punishment. Qualitative analysis reveals that curricula emphasizing defection-equilibrium games can induce "learned pessimism" in agents. These findings suggest that for coordination problems, simple communication protocols may be more reliable than experience-based training, and that curriculum design for social dilemmas requires careful attention to the strategic lessons embedded in game sequences.
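
To illustrate why a Stag Hunt is a coordination problem in which a one-word signal could matter, the following minimal sketch enumerates the pure-strategy Nash equilibria of a 4-player version. The payoff values (`STAG_PAYOFF`, `HARE_PAYOFF`) and the rule that the stag hunt succeeds only when all four players choose stag are assumptions made for illustration; the abstract does not state the payoff structure used in the experiments.

```python
from itertools import product

# Hypothetical payoffs, chosen only for illustration (not from the paper).
STAG_PAYOFF = 10  # each hunter's reward if every player hunts stag
HARE_PAYOFF = 4   # safe reward for hunting hare, regardless of others
N_PLAYERS = 4


def payoffs(actions):
    """Return per-player payoffs for a tuple of 'stag'/'hare' actions.

    Assumption: the stag hunt succeeds only if all players choose 'stag'.
    """
    all_stag = all(a == "stag" for a in actions)
    return [
        HARE_PAYOFF if a == "hare" else (STAG_PAYOFF if all_stag else 0)
        for a in actions
    ]


def is_nash(actions):
    """True if no single player gains by unilaterally switching actions."""
    base = payoffs(actions)
    for i, a in enumerate(actions):
        flipped = list(actions)
        flipped[i] = "hare" if a == "stag" else "stag"
        if payoffs(flipped)[i] > base[i]:
            return False
    return True


# Enumerate all 2^4 action profiles and keep the equilibria.
equilibria = [
    profile
    for profile in product(["stag", "hare"], repeat=N_PLAYERS)
    if is_nash(profile)
]
print(equilibria)
```

Under these assumed payoffs, both the all-stag and the all-hare profiles are equilibria, so agents that cannot signal their intent have no way to agree on the higher-paying one. That is the kind of gap a cheap-talk channel is meant to close.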

Paper Structure

This paper contains 44 sections, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Final stage performance comparison across curriculum conditions. Bar heights represent the mean, and error bars indicate the 95% confidence interval. The control group outperforms all curriculum conditions across all metrics.
  • Figure 2: Stag Hunt Analysis: Impact of Communication and Model Diversity Effects. The top panels compare cooperation rates (left) and average payoffs (right) across the four experimental conditions. Communication dramatically increases cooperation in the heterogeneous setting and stabilizes payoffs by reducing outcome variance (shown by error bars). The bottom panels break down payoffs by model family, revealing different strategic performances in the Heterogeneous (left) and Coalition (right) settings.
  • Figure 3: IPGG contribution trajectories by curriculum condition. The control group (orange) maintains the highest average contribution, while the full curriculum (green) shows the fastest decline. In the left panel, the shaded areas represent 95% confidence intervals. The right panel shows individual trial trajectories, illustrating the high variance in curriculum-trained conditions.
  • Figure 4: Learning progression analysis. Left: The contribution difference between the full curriculum and control groups, showing a persistent and worsening deficit. Middle: Average contribution in the first vs. last round, demonstrating collapse toward defection. Right: Behavioral consistency (standard deviation of contributions) over time.