Werewolf Arena: A Case Study in LLM Evaluation via Social Deduction
Suma Bailis, Jane Friedhoff, Feiyang Chen
TL;DR
Werewolf Arena is a dynamic, competitive benchmark that evaluates LLMs through the social deduction game Werewolf, using a bidding-based turn-taking mechanism to probe strategic communication. The framework pairs memory-augmented agents with a rules-based Game Master (GM) that orchestrates the Night/Day cycles, and it is used to compare Gemini and GPT-family models in intra-family and head-to-head tournaments. Key findings show that Gemini 1.5 Pro often edges out GPT-4 in villager-like roles and that speaking strategy and verbosity significantly affect both perceived deception and success. Seer behavior highlights the tension between disclosing information and personal risk. The work provides an open-source, scalable platform for studying multi-agent social reasoning in LLMs and sets the stage for richer, real-time evaluations of strategic communication under deception.
Abstract
This paper introduces Werewolf Arena, a novel framework for evaluating large language models (LLMs) through the lens of the classic social deduction game, Werewolf. In Werewolf Arena, LLMs compete against each other, navigating the game's complex dynamics of deception, deduction, and persuasion. The framework introduces a dynamic turn-taking system based on bidding, mirroring real-world discussions where individuals strategically choose when to speak. We demonstrate the framework's utility through an arena-style tournament featuring Gemini and GPT models. Our results reveal distinct strengths and weaknesses in the models' strategic reasoning and communication. These findings highlight Werewolf Arena's potential as a challenging and scalable LLM benchmark.
