Leading the Follower: Learning Persuasive Agents in Social Deduction Games

Zhang Zheng, Deheng Ye, Peilin Zhao, Hao Wang

TL;DR

The paper proposes a Stackelberg-based framework for turn-based dialogue in social deduction games, framing each speaking turn as a leader-follower interaction where the leader optimizes utterances to steer the follower's responses. It develops an RL training pipeline that uses an API-based backend to generate base utterances and an open-source Refiner to maximize persuasive impact, guided by a Measurer that estimates follower response probabilities; GRPO drives the refinement without requiring explicit human preference data. Across Werewolf, Avalon, and ONUW, the approach yields consistent gains over strong baselines and generalizes across different backend LLMs, indicating robust, model-agnostic persuasive capability. The work demonstrates a principled method to imbue AI agents with strategic social influence, with potential applications in any domain requiring persuasive, multi-turn communication under uncertainty.

Abstract

Large language model (LLM) agents have shown remarkable progress in social deduction games (SDGs). However, existing approaches primarily focus on information processing and strategy selection, overlooking the significance of persuasive communication in influencing other players' beliefs and responses. In SDGs, success depends not only on making correct deductions but also on convincing others to respond in alignment with one's intent. To address this limitation, we formalize turn-based dialogue in SDGs as a Stackelberg competition, where the current player acts as the leader who strategically influences the follower's response. Building on this theoretical foundation, we propose a reinforcement learning framework that trains agents to optimize utterances for persuasive impact. Through comprehensive experiments across three diverse SDGs, we demonstrate that our agents significantly outperform baselines. This work represents a significant step toward developing AI agents capable of strategic social influence, with implications extending to scenarios requiring persuasive communication.
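
As a reading aid, the leader-follower framing described above can be written in generic Stackelberg form. The notation is assumed for illustration and is not taken from the paper's own equations: $h_t$ is the dialogue history at turn $t$, $u_t$ the leader's utterance, $\pi_F$ the follower's response distribution, and $U_L$ the leader's utility.

$$
u_t^{*} \in \arg\max_{u_t} \; \mathbb{E}_{u_{t+1} \sim \pi_F(\cdot \mid h_t, u_t)} \big[ U_L(h_t, u_t, u_{t+1}) \big]
$$

Under this reading, the leader commits to $u_t$ first and the follower responds afterwards, so the leader's choice is evaluated through the follower's anticipated response rather than in isolation.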

Paper Structure

This paper contains 56 sections, 13 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Methodological paradigms. Existing methods primarily focus on processing environmental information (such as identifying other players' roles) and selecting strategies based on this information. In contrast, our method measures the next player's response distribution and optimizes utterances specifically for persuasive impact on subsequent player responses.
  • Figure 2: Stackelberg optimization process. First, the leader identifies their strategic intent by analyzing the current situation. Then, the leader measures the follower's response distribution to different leader actions. Finally, the leader optimizes their strategy to maximize their utility given the follower's response distribution.
  • Figure 3: The training framework of our agent. Dark blue arrows indicate the inference pipeline, while light blue arrows represent additional processes during training. In this instance, Player 1 acts as the leader while Player 2 acts as the follower. The backend LLM identifies desired and undesired target responses, then generates a base utterance $u_{\text{base}}$. The Refiner enhances $u_{\text{base}}$ for maximum persuasive impact. The Measurer computes rewards by measuring how different refined utterances $u_t$ affect the probabilities of generating $\hat{u}_{t+1}^+$ and $\hat{u}_{t+1}^-$. Multiple utterances $u_t$ are sampled for group relative advantage calculation during training, while only one is generated during inference. The backend uses an API-based LLM, while the Refiner and Measurer are two copies of the same open-source LLM, with the Measurer's parameters frozen.
  • Figure 4: Generalizability across different backend LLMs. We evaluate our approach on GPT-5 and Qwen3-14B without additional fine-tuning. Each method competes against ReAct under different team assignments, conducting 50 matches per setting.
  • Figure 5: The prompt used for intent identification.
  • ...and 6 more figures
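
The reward and advantage steps named in the Figure 3 caption can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes the reward is the log-probability margin between the desired and undesired follower responses, and it assumes the common GRPO-style normalization that subtracts the group mean and divides by the group standard deviation. The function names and all numbers are hypothetical.

```python
from statistics import mean, pstdev


def persuasion_reward(logp_desired: float, logp_undesired: float) -> float:
    """Hypothetical reward: how much more likely the desired follower
    response is than the undesired one, measured in log-probability.
    The paper's exact reward may differ."""
    return logp_desired - logp_undesired


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one sampled group of refined utterances.
    Population standard deviation is an assumption; eps avoids division
    by zero when every reward in the group is identical."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up Measurer outputs for three candidate utterances u_t:
# (log p of desired response, log p of undesired response).
measured = [(-2.0, -5.0), (-3.0, -3.0), (-6.0, -1.0)]

rewards = [persuasion_reward(pos, neg) for pos, neg in measured]
advantages = group_relative_advantages(rewards)

for (pos, neg), r, a in zip(measured, rewards, advantages):
    print(f"logp+={pos:5.1f} logp-={neg:5.1f} reward={r:5.1f} advantage={a:6.3f}")
```

Because advantages are computed relative to the other samples in the same group, no learned value function is needed in this sketch. A candidate is reinforced only to the extent that it shifts the measured follower response more favorably than its siblings do.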