Opponent Shaping in LLM Agents

Marta Emili Garcia Segura, Stephen Hailes, Mirco Musolesi

TL;DR

The paper presents ShapeLLM, a model-free opponent shaping (OS) framework for transformer-based LLM agents, and demonstrates that LLMs can both influence and be influenced by others through interaction in repeated 2×2 games. By adapting OS methods to the transformer setting and embedding history and context in prompts, ShapeLLM achieves exploitative shaping in the Iterated Prisoner's Dilemma (IPD), Iterated Matching Pennies (IMP), and Iterated Chicken Game (ICG), and cooperative shaping in the Iterated Stag Hunt (ISH) and a cooperative version of the IPD (C-IPD), often surpassing baseline independent learners. The work shows that LLMs can steer opponents toward exploitable equilibria or mutually beneficial outcomes, and that this behavior is robust to prompt variations and opponent initialization. It also highlights risks and directions for future work in more complex or realistic multi-agent environments. These findings establish opponent shaping as a fundamental dimension of multi-agent LLM research, with implications for both coordinated behavior and potential adversarial exploitation in real-world deployments.

Abstract

Large Language Models (LLMs) are increasingly being deployed as autonomous agents in real-world environments. As these deployments scale, multi-agent interactions become inevitable, making it essential to understand strategic behavior in such systems. A central open question is whether LLM agents, like reinforcement learning agents, can shape the learning dynamics and influence the behavior of others through interaction alone. In this paper, we present the first investigation of opponent shaping (OS) with LLM-based agents. Existing OS algorithms cannot be directly applied to LLMs, as they require higher-order derivatives, face scalability constraints, or depend on architectural components that are absent in transformers. To address this gap, we introduce ShapeLLM, an adaptation of model-free OS methods tailored for transformer-based agents. Using ShapeLLM, we examine whether LLM agents can influence co-players' learning dynamics across diverse game-theoretic environments. We demonstrate that LLM agents can successfully guide opponents toward exploitable equilibria in competitive games (Iterated Prisoner's Dilemma, Matching Pennies, and Chicken) and promote coordination and improve collective welfare in cooperative games (Iterated Stag Hunt and a cooperative version of the Prisoner's Dilemma). Our findings show that LLM agents can both shape and be shaped through interaction, establishing opponent shaping as a key dimension of multi-agent LLM research.

Paper Structure

This paper contains 30 sections, 2 equations, 15 figures, 15 tables.

Figures (15)

  • Figure 1: Schematic representation of a trial. Each box corresponds to an episode (a game played for $T$ rounds). Same-colored boxes represent episodes within the same parallel environment. Within each environment, episodes occur sequentially as indicated by the arrows. The shaper updates its parameters using the experience collected throughout the entire trial.
  • Figure 2: Average reward per step (top row) and state visitation (bottom row) during training for the shaping experiments across the IPD, IMP, and ICG. In the state visitation figures, the outcome "I" encompasses all transitions where either player chose $a_\text{null}$. Results are reported along with a 95% confidence interval over 5 random seeds.
  • Figure 3: Average reward per step (top row) and state visitation (bottom row) during training for the shaping experiments across the C-IPD and ISH. In the state visitation figures, the outcome "I" encompasses all transitions where either player chose $a_\text{null}$. All results are reported along with a 95% confidence interval over 5 random seeds.
  • Figure 4: Average reward per step (top row) and state visitation (bottom row) during training for the enriched observation baseline experiments across the IPD, IMP, and ICG. For the latter, two opponent configurations are presented: ICG and ICG (alt. opp.). They use $w_{a_1}=\text{S}, w_{a_2}=\text{G}$ and $w_{a_1}=\text{N}, w_{a_2}=\text{M}$ as the opponent's action labels respectively, and $w_{a_1}=\text{S}, w_{a_2}=\text{G}$ for the player with enriched observations. In the state visitation figures, the outcome "I" encompasses all transitions where either player chose $a_\text{null}$. The results are reported along with a 95% confidence interval over 5 random seeds.
  • Figure 5: Table-format prompt variation for the IPD. Instead of a textual description, the payoff matrix is presented in markdown table form, replicating the base model's formatting style.
  • ...and 10 more figures
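
The trial structure described in the Figure 1 caption can be illustrated with a minimal Python sketch. This is not the authors' implementation. The payoff values, the policy signatures, and the names `play_episode`, `run_trial`, `always_defect`, and `tit_for_tat` are all assumptions made for illustration. Fixed rule-based policies stand in for the LLM agents, and the parallel environments are simulated one after another.

```python
from typing import Callable, List, Tuple

# Illustrative Prisoner's Dilemma payoffs (assumed values, not taken from the paper).
# Keys are (shaper action, opponent action); values are (shaper reward, opponent reward).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

History = List[Tuple[str, str]]
Policy = Callable[[History], str]
Transition = Tuple[str, str, int, int]


def play_episode(shaper: Policy, opponent: Policy, T: int) -> List[Transition]:
    """Play one episode: the game repeated for T rounds."""
    history: History = []
    transitions: List[Transition] = []
    for _ in range(T):
        a_s = shaper(history)
        a_o = opponent(history)
        r_s, r_o = PAYOFFS[(a_s, a_o)]
        history.append((a_s, a_o))
        transitions.append((a_s, a_o, r_s, r_o))
    return transitions


def run_trial(
    shaper: Policy,
    make_opponent: Callable[[], Policy],
    num_envs: int,
    episodes_per_env: int,
    T: int,
) -> List[Transition]:
    """Collect the shaper's experience over a whole trial."""
    trial_experience: List[Transition] = []
    for _ in range(num_envs):  # parallel environments, simulated sequentially here
        opponent = make_opponent()
        for _ in range(episodes_per_env):  # episodes within one environment run in sequence
            trial_experience.extend(play_episode(shaper, opponent, T))
            # Hypothetical hook: any per-episode opponent update would go here.
    # Per the caption, the shaper would update only after the entire trial.
    return trial_experience


# Hypothetical stand-in policies for the two agents.
def always_defect(history: History) -> str:
    return "D"


def tit_for_tat(history: History) -> str:
    # Cooperate first, then copy the shaper's previous action.
    return "C" if not history else history[-1][0]


experience = run_trial(always_defect, lambda: tit_for_tat, num_envs=2, episodes_per_env=3, T=4)
print(len(experience), sum(t[2] for t in experience))
```

The sketch returns a single flat list so that, as in the caption, the shaper's update can draw on experience from every episode in every environment of the trial.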