Playing the Werewolf game with artificial intelligence for language understanding
Hisaichi Shibata, Soichiro Miki, Yuta Nakamura
TL;DR
This work aims to enable AI to understand natural language and take part in the social deduction game Werewolf. It trains a Transformer-based value network on 3,840 per-view game logs to estimate the probability of winning given the game context and a candidate action, and builds an agent, Deep Wolf, that selects the action maximizing this probability. Evaluations against human players show that Deep Wolf is competitive in the villager and betrayer roles but underperforms as a werewolf or a seer, highlighting both the potential and the current limits of language models in deception-rich dialogue. The results suggest that modern language models can reason about statements, detect lies, and generate plausible dialogue within constrained Werewolf play, with implications for natural language understanding under deception.
Abstract
The Werewolf game is a social deduction game based on free natural language communication, in which players try to deceive others in order to survive. An important feature of this game is that a large portion of the conversation consists of false information, and the behavior of artificial intelligence (AI) in such a situation has not been widely investigated. The purpose of this study is to develop an AI agent that can play Werewolf through natural language conversations. First, we collected game logs from 15 human players. Next, we fine-tuned a Transformer-based pretrained language model to construct a value network that predicts the posterior probability of winning the game, given any phase of the game and a candidate for the next action. We then developed an AI agent that can interact with humans and choose the best voting target on the basis of the win probability output by the value network. Lastly, we evaluated the performance of the agent by having it play the game with human players. We found that our AI agent, Deep Wolf, could play Werewolf as competitively as average human players in a villager or a betrayer role, whereas it was inferior to human players in a werewolf or a seer role. These results suggest that current language models have the capability to doubt what others are saying, tell lies, and detect lies in conversations.
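The selection step described above (score each candidate action with the value network, then take the highest-scoring one) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `choose_action`, `ValueFn`, `toy_value_fn`, and `TOY_SCORES` are hypothetical, and the value function here is a fixed lookup table standing in for the fine-tuned Transformer.

```python
from typing import Callable, Sequence

# Assumed interface: (game context text, candidate action text) -> estimated win probability.
ValueFn = Callable[[str, str], float]


def choose_action(context: str, candidates: Sequence[str], value_fn: ValueFn) -> str:
    """Return the candidate with the highest estimated win probability."""
    if not candidates:
        raise ValueError("at least one candidate action is required")
    # max() keeps the first candidate when several share the top score.
    return max(candidates, key=lambda action: value_fn(context, action))


# Made-up scores for illustration only; a real agent would query the value network.
TOY_SCORES = {"vote Alice": 0.35, "vote Bob": 0.62, "vote Carol": 0.48}


def toy_value_fn(context: str, action: str) -> float:
    """Stand-in for the value network: ignores the context and looks up a fixed score."""
    return TOY_SCORES.get(action, 0.0)


best = choose_action("day 1 conversation log", list(TOY_SCORES), toy_value_fn)
print(best)
```

In a real agent the candidates would be enumerated from the current game state (for example, one vote per surviving player), and `value_fn` would wrap a forward pass of the fine-tuned model over the conversation so far.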
