DVM: Towards Controllable LLM Agents in Social Deduction Games
Zheng Zhang, Yihuai Lan, Yangsen Chen, Lei Wang, Xiang Wang, Hao Wang
TL;DR
The paper addresses the need to control the proficiency of LLM agents in social deduction games (SDGs). It introduces DVM, a framework with three components (Predictor, Decider, Discussor) trained with supervised learning and reinforcement learning. A win-rate-constrained reward and a decision-chain reward modulate performance, e.g., $r_t = sr_t + cr$ with $cr(DC) = \alpha (WR - 0.5)$. The Predictor informs the Decider about player relations via $P_t = \text{Predictor}(D_t, V_t)$. The Decider produces action logits through $\text{Logits}(a) = \text{Decider}(G_t, P_t, WR_{cons.})$ and converts them to probabilities with $\text{Prob}(a) = \text{Softmax}(\text{Logits}(a) - a_{mask} \times 10^9)$, so that masked actions receive effectively zero probability. The Discussor generates contextually relevant dialogue. Training proceeds in two steps: FanLang-9 supervised fine-tuning, followed by RL with PPO for the Decider and DPO for the Predictor. A combined reward framework and a tunable control mechanism keep actual win rates near the targeted levels. Experiments on Werewolf show that DVM outperforms existing methods, meets predefined win-rate targets, and benefits from the proposed decision-chain reward, which indicates that adaptive, fair, and balanced SDG agents are viable for practical applications.
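The action-masking step $\text{Prob}(a) = \text{Softmax}(\text{Logits}(a) - a_{mask} \times 10^9)$ can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `masked_softmax`, the example logits, and the convention that a mask value of 1 marks an unavailable action are assumptions made for illustration.

```python
import math

def masked_softmax(logits, invalid_mask, penalty=1e9):
    """Softmax over logits after pushing masked actions to a very low score.

    Assumption: invalid_mask[i] == 1 marks action i as unavailable,
    matching the `- a_mask * 10^9` term in the formula above.
    """
    # Subtract a large penalty from every masked action's logit.
    shifted = [l - m * penalty for l, m in zip(logits, invalid_mask)]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(shifted)
    exps = [math.exp(s - peak) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical example: four candidate actions, the last one is not allowed.
logits = [2.0, 1.0, 0.5, 3.0]
invalid_mask = [0, 0, 0, 1]
probs = masked_softmax(logits, invalid_mask)
print(probs)
```

Although the masked action has the highest raw logit in this example, the penalty drives its exponent to underflow, so all probability mass is shared among the three permitted actions.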
Abstract
Large Language Models (LLMs) have advanced the capability of game agents in social deduction games (SDGs). These games rely heavily on conversation-driven interactions and require agents to draw inferences, make decisions, and express themselves based on such information. While this progress leads to more sophisticated and strategic non-player characters (NPCs) in SDGs, there is a need to control the proficiency of these agents. This control not only ensures that NPCs can adapt to varying difficulty levels during gameplay, but also provides insights into the safety and fairness of LLM agents. In this paper, we present DVM, a novel framework for developing controllable LLM agents for SDGs, and demonstrate its implementation on one of the most popular SDGs, Werewolf. DVM comprises three main components: Predictor, Decider, and Discussor. By integrating reinforcement learning with a win rate-constrained decision chain reward mechanism, we enable agents to dynamically adjust their gameplay proficiency to achieve specified win rates. Experiments show that DVM not only outperforms existing methods in the Werewolf game, but also successfully modulates its performance levels to meet predefined win rate targets. These results pave the way for adaptive and balanced gameplay by LLM agents in SDGs, opening new avenues for research in controllable game agents.
