Exploring Large Language Models for Word Games: Who is the Spy?

Chentian Wei, Jiewei Chen, Jinzhu Xu

TL;DR

This work addresses the challenge of applying large language models to word games without task-specific training. It introduces a training-free framework built around Chain-of-Thought (CoT) scheduling for the game "Who is the Spy", comprising components such as Role Iterator, Describe Phase, Judge Phase, Vote, and Spy Disguise. Empirical results show that Judge CoT markedly improves reasoning and decision quality, Describe CoT reduces hallucinations in the output, and Spy CoT can increase civilian misvotes, although it may not significantly raise the spy's win rate. The findings demonstrate the potential of LLMs for situational reasoning and social interaction in structured, multi-agent environments. The code is publicly available to support future research.

Abstract

Word games hold significant research value for natural language processing (NLP), game theory, and related fields because they are rule-based and situational. This study explores how large language models (LLMs) can participate effectively in word games and proposes a training-free framework. "Shei Shi Wo Di" ("Who is the Spy" in English) is a classic word game. Using this game as an example, we introduce a Chain-of-Thought (CoT)-based scheduling framework that enables LLMs to perform well in tasks such as inferring role words and disguising their identities. We evaluate the framework by game success rates and by the accuracy of the LLM agents' analytical results. Experimental results affirm the framework's effectiveness, demonstrating notable improvements in LLM performance across multiple datasets. This work highlights the potential of LLMs in mastering situational reasoning and social interactions within structured game environments. Our code is publicly available at https://github.com/ct-wei/Who-is-The-Spy.
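
To make the describe-then-vote flow named in the abstract concrete, the following is a minimal sketch of one game round, not the authors' implementation. Every name in it (Player, play_round, winner, toy_describe, toy_vote) is hypothetical, the LLM calls are replaced by deterministic stand-in functions, and the win condition (the spy wins once civilians no longer outnumber spies) is an assumed common house rule rather than a detail taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Player:
    name: str
    word: str          # role word: civilians share one, the spy gets a similar one
    is_spy: bool
    alive: bool = True


# Hypothetical callback types; in a real system these would wrap LLM prompts.
DescribeFn = Callable[[Player, List[str]], str]
VoteFn = Callable[[Player, Dict[str, str]], str]


def play_round(players: List[Player], describe: DescribeFn, vote: VoteFn) -> str:
    """Run one describe phase and one vote; return the eliminated player's name."""
    alive = [p for p in players if p.alive]
    descriptions: Dict[str, str] = {}
    for p in alive:  # iterate over roles in seating order
        descriptions[p.name] = describe(p, list(descriptions.values()))
    ballots = Counter(vote(p, descriptions) for p in alive)
    eliminated, _ = ballots.most_common(1)[0]
    for p in players:
        if p.name == eliminated:
            p.alive = False
    return eliminated


def winner(players: List[Player]) -> Optional[str]:
    """Assumed rule: civilians win when no spy is left; the spy wins once
    civilians no longer outnumber spies. Returns None while the game continues."""
    alive = [p for p in players if p.alive]
    spies = sum(p.is_spy for p in alive)
    if spies == 0:
        return "civilians"
    if len(alive) - spies <= spies:
        return "spy"
    return None


def toy_describe(player: Player, earlier: List[str]) -> str:
    # Stand-in for an LLM description. It leaks the word, which real play forbids.
    return f"a hint about {player.word}"


def toy_vote(player: Player, descriptions: Dict[str, str]) -> str:
    # Stand-in for an LLM judgement: suspect anyone whose hint omits my own word.
    others = [n for n in descriptions if n != player.name]
    suspects = [n for n in others if player.word not in descriptions[n]]
    return (suspects or others)[0]


if __name__ == "__main__":
    table = [
        Player("A", "apple", False),
        Player("B", "apple", False),
        Player("C", "pear", True),
        Player("D", "apple", False),
    ]
    print(play_round(table, toy_describe, toy_vote), winner(table))
```

In this sketch the describe and vote callbacks are the points where prompt-based reasoning steps would plug in; the toy versions exist only so that the loop can run without a model.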

Paper Structure

This paper contains 21 sections, 11 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Our Framework
  • Figure 2: The game process of Who is the Spy
  • Figure 3: Overview of Game Flow
  • Figure 4: CoT of Description
  • Figure 5: CoT of Judgement
  • ...and 2 more figures