PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games

Qinglin Zhu, Runcong Zhao, Bin Liang, Jinhua Du, Lin Gui, Yulan He

TL;DR

This work tackles the challenge of enabling LLM-based agents to reason and interact effectively in Murder Mystery Games (MMGs) by introducing the WellPlay dataset and the PLAYER* framework. WellPlay provides a benchmark of 1,482 inferential questions across 12 MMGs to assess agents' understanding of objectives, reasoning, and character relationships in multi-agent social settings. PLAYER* combines a sensor-based state representation with information-theoretic questioning and a pruning mechanism to narrow the suspect space efficiently, achieving higher reasoning accuracy, faster interaction, and better human-agent engagement than strong baselines. The results demonstrate the value of integrating structured state representations, entropy-guided questioning, and memory-aware search for complex social tasks, with practical implications for AI agents in narrative-rich, interactive environments.

Abstract

We introduce WellPlay, a reasoning dataset for multi-agent conversational inference in Murder Mystery Games (MMGs). WellPlay comprises 1,482 inferential questions across 12 games, spanning objectives, reasoning, and relationship understanding, and establishes a systematic benchmark for evaluating agent reasoning abilities in complex social settings. Building on this foundation, we present PLAYER*, a novel framework for Large Language Model (LLM)-based agents in MMGs. MMGs pose unique challenges, including undefined state spaces, absent intermediate rewards, and the need for strategic reasoning through natural language. PLAYER* addresses these challenges with a sensor-based state representation and an information-driven strategy that optimises questioning and suspect pruning. Experiments show that PLAYER* outperforms existing methods in reasoning accuracy, efficiency, and agent-human interaction, advancing reasoning agents for complex social scenarios.
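
The abstract does not spell out how the information-driven strategy is computed. As a rough illustration only, the sketch below shows one generic way to combine entropy-guided question selection with suspect pruning: pick the question whose answers are expected to leave the least uncertainty over suspects, then drop suspects whose probability falls below a threshold. All names here (`entropy`, `expected_entropy_after`, `best_question`, `prune`, the likelihood tables, and the threshold) are hypothetical and are not taken from PLAYER*.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a distribution over suspects."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def expected_entropy_after(probs, answer_likelihoods):
    """Expected posterior entropy over suspects after asking one question.

    answer_likelihoods[answer][suspect] is an assumed P(answer | suspect is
    the culprit); in practice these values would have to be estimated.
    """
    expected = 0.0
    for likelihood in answer_likelihoods.values():
        joint = {s: probs[s] * likelihood[s] for s in probs}
        p_answer = sum(joint.values())
        if p_answer == 0:
            continue  # this answer cannot occur under the current belief
        posterior = {s: j / p_answer for s, j in joint.items()}
        expected += p_answer * entropy(posterior)
    return expected


def best_question(probs, questions):
    """Greedy choice: the question with the lowest expected posterior entropy."""
    return min(questions, key=lambda q: expected_entropy_after(probs, questions[q]))


def prune(probs, threshold):
    """Drop suspects below the threshold and renormalise the remainder."""
    kept = {s: p for s, p in probs.items() if p >= threshold}
    if not kept:
        return dict(probs)  # never prune every suspect
    total = sum(kept.values())
    return {s: p / total for s, p in kept.items()}
```

Under these assumptions, a question that splits the suspects evenly is preferred over one that singles out a single suspect, because it is expected to remove more uncertainty per turn.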

Paper Structure

This paper contains 33 sections, 8 equations, 7 figures, 7 tables, 1 algorithm.

Figures (7)

  • Figure 1: Search and Approximate. PLAYER* generates questions based on character states, selecting agents to question based on past observations of critical information and the likelihood of uncovering more. The goal is to minimise the suspect list.
  • Figure 2: Comparison of the performance of agents with other multi-agent algorithms designed for multiplayer deduction games. The Personal Perspective (PP) baseline represents the starting point for searching, where agents rely solely on their own knowledge. The Omniscient Perspective (OP) measures performance when agents have full access to all other agents’ scripts, representing the ideal search endpoint.
  • Figure 3: Comparison of Dialogue Strategies. PLAYER* significantly enhances story progression by eliciting key clues (clothing, movement, direction) that guide the investigation. It demonstrates superior questioning by targeting specific details, which leads to richer responses. In contrast, Werewolf provides minimal advancement, ThinkThrice adds vague auditory clues, and O-CoT is overly honest, revealing information that should be kept secret. Overall, PLAYER* outperforms the others by designing dialogue for better narrative engagement.
  • Figure 4: Results of Agent-vs-Human Evaluation using Human-Centric metrics. The detailed data can be found in the human-survey table (tab:human-survey).
  • Figure 5: Comparison of time (hours) and costs ($) for calling OpenAI API across multi-agent algorithms in MMG settings.
  • ...and 2 more figures