An Implementation of Werewolf Agent That does not Truly Trust LLMs

Takehiro Sato, Shintaro Ozaki, Daisaku Yokoyama

TL;DR

The paper tackles the challenge of building an autonomous Werewolf agent capable of realistic dialogue under incomplete information. It introduces a hybrid architecture that couples an LLM with a rule-based controller to enable strategic refutation, timely termination of conversations, and consistent personas via prompt-based style transformation. Key contributions include a rule-based filtering mechanism (including Counter-CO and Closing Conversation rules), a talk-analysis pipeline to extract voting and divination signals, and a persona-driven utterance generation module. Qualitative evaluations indicate the hybrid agent appears more human-like and engaging than a vanilla LLM, though there are noticeable trade-offs in grammatical naturalness and consistency across long conversations. The work highlights the potential of combining rule-based reasoning with LLMs for game-theoretic dialogue tasks and outlines future directions such as scaling to more players and incorporating reinforcement learning for decision-making in larger settings.

Abstract

Werewolf is an incomplete-information game that poses several challenges for building a computer agent as a player, such as limited understanding of the situation and a lack of individuality in utterances (e.g., computer agents are not capable of characterful utterances or situational lying). We propose a werewolf agent that addresses some of these difficulties by combining a Large Language Model (LLM) with a rule-based algorithm. In particular, our agent uses a rule-based algorithm to select an output either from an LLM or from a template prepared beforehand, based on the results of analyzing the conversation history with an LLM. This allows the agent to refute in specific situations, identify when to end the conversation, and behave with a persona. As a result, this approach mitigated conversational inconsistencies and facilitated logical utterances. We also conducted a qualitative evaluation, in which our agent was perceived as more human-like than an unmodified LLM. The agent is freely available to help advance research on the Werewolf game.
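
The selection step described in the abstract can be illustrated with a minimal sketch. It assumes the talk analysis has already been reduced to two boolean flags. The names `TalkAnalysis`, `select_utterance`, the flag names, and the template wording are all hypothetical and are not taken from the authors' released agent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TalkAnalysis:
    """Hypothetical summary of what an LLM-based talk analysis extracted."""
    rival_role_claim: bool      # another player claimed the role this agent claimed
    conversation_stalled: bool  # the discussion has nothing new left to cover


# Hypothetical templates; the wording is illustrative, not from the paper.
COUNTER_CO_TEMPLATE = "I am the one telling the truth about my role; {rival} is lying."
CLOSING_TEMPLATE = "I have nothing more to add for today."


def select_utterance(analysis: TalkAnalysis, llm_output: str,
                     rival: Optional[str] = None) -> str:
    """Pick a prepared template when a rule fires, else pass the LLM text through."""
    # Rule 1 (assumed trigger): refute a competing role claim with a fixed template.
    if analysis.rival_role_claim and rival is not None:
        return COUNTER_CO_TEMPLATE.format(rival=rival)
    # Rule 2 (assumed trigger): close the conversation once it has stalled.
    if analysis.conversation_stalled:
        return CLOSING_TEMPLATE
    # No rule fired, so the LLM-generated utterance is used unchanged.
    return llm_output
```

In this sketch the rules take priority over the LLM: the generated text is only spoken when no rule matches, which is one way to read the "does not truly trust LLMs" idea in the title.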

Paper Structure (24 sections, 5 figures, 9 tables)

Figures (5)

  • Figure 1: An example of a problem with playing the Werewolf game using LLMs. Humans can tell a logical lie naturally, but an LLM can only deny it.
  • Figure 2: The list of five-person werewolf roles.
  • Figure 3: System overview. Our system comprises three modules: utterance generation, talk analysis, and a rule-based algorithm. Each module is described in its own section of the paper, and the required game status is described in the appendix.
  • Figure 4: An example of a prompt for style transformation. Text in <CAPITAL LETTERS> denotes a variable.
  • Figure 5: An example of a prompt for talk analysis that specifies the target. Text in <CAPITAL LETTERS> denotes a variable.
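
The captions for Figures 4 and 5 describe prompts whose variables are written in angle brackets and capital letters. A minimal sketch of how such placeholders might be filled is given below. The function `fill_prompt`, the placeholder names `PERSONA` and `UTTERANCE`, and the template text are assumptions made for illustration; the actual prompts appear only in the paper's figures.

```python
import re
from typing import Mapping

# Matches placeholders such as <PERSONA> or < UTTERANCE>: capital letters,
# underscores, and spaces between angle brackets.
PLACEHOLDER = re.compile(r"<([A-Z_ ]+)>")


def fill_prompt(template: str, values: Mapping[str, str]) -> str:
    """Replace every <NAME> placeholder in the template with values[NAME]."""
    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1).strip()
        if name not in values:
            # Fail loudly rather than send an unfilled prompt to the LLM.
            raise KeyError(f"no value supplied for placeholder <{name}>")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical style-transformation template, not the one shown in Figure 4.
STYLE_TEMPLATE = (
    "Rewrite the sentence below in the speaking style of <PERSONA>.\n"
    "Sentence: <UTTERANCE>"
)

prompt = fill_prompt(
    STYLE_TEMPLATE,
    {"PERSONA": "a cheerful villager", "UTTERANCE": "I will vote for Agent[02]."},
)
```

Raising an error on a missing value is a design choice in this sketch: it prevents a prompt with a literal placeholder from reaching the model.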