PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games

Qinglin Zhu; Runcong Zhao; Bin Liang; Jinhua Du; Lin Gui; Yulan He

TL;DR

This work tackles the challenge of enabling LLM-based agents to reason and interact effectively in Murder Mystery Games (MMGs) by introducing the WellPlay dataset and the PLAYER* framework. WellPlay provides a benchmark of 1,482 inferential questions across 12 MMGs to assess agents' understanding of objectives, reasoning, and relationships in multi-agent social settings. PLAYER* combines a sensor-based state representation with information-theoretic questioning and a pruning mechanism to narrow the suspect space efficiently, achieving higher reasoning accuracy, faster interaction, and better human-agent engagement than strong baselines. The results demonstrate the value of integrating structured state representations, entropy-guided questioning, and memory-aware search for complex social tasks, with practical implications for AI agents in narrative-rich, interactive environments.

Abstract

We introduce WellPlay, a reasoning dataset for multi-agent conversational inference in Murder Mystery Games (MMGs). WellPlay comprises 1,482 inferential questions across 12 games, spanning objectives, reasoning, and relationship understanding, and establishes a systematic benchmark for evaluating agent reasoning abilities in complex social settings. Building on this foundation, we present PLAYER*, a novel framework for Large Language Model (LLM)-based agents in MMGs. MMGs pose unique challenges, including undefined state spaces, absent intermediate rewards, and the need for strategic reasoning through natural language. PLAYER* addresses these challenges with a sensor-based state representation and an information-driven strategy that optimises questioning and suspect pruning. Experiments show that PLAYER* outperforms existing methods in reasoning accuracy, efficiency, and agent-human interaction, advancing reasoning agents for complex social scenarios.
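The abstract describes an information-driven strategy that optimises questioning and suspect pruning but gives no formulas here. The following is only a minimal sketch of that general idea, not the PLAYER* algorithm: it assumes a probability distribution over suspects and a hand-written table of answer likelihoods per question. All names (`belief`, `questions`, `expected_information_gain`, `prune`) and all numbers are hypothetical illustrations.

```python
import math

def entropy(belief):
    """Shannon entropy in bits of a dict mapping suspect -> probability."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)

def posterior(belief, likelihood, answer):
    """Bayes update, where likelihood[s][answer] = P(answer | s is the culprit)."""
    unnorm = {s: p * likelihood[s].get(answer, 0.0) for s, p in belief.items()}
    total = sum(unnorm.values())
    if total == 0:
        return dict(belief)  # answer impossible under the model: keep the prior
    return {s: v / total for s, v in unnorm.items()}

def expected_information_gain(belief, likelihood):
    """Expected entropy reduction over suspects from asking one question."""
    answers = {a for dist in likelihood.values() for a in dist}
    gain = entropy(belief)
    for a in answers:
        p_a = sum(belief[s] * likelihood[s].get(a, 0.0) for s in belief)
        if p_a > 0:
            gain -= p_a * entropy(posterior(belief, likelihood, a))
    return gain

def prune(belief, threshold):
    """Drop suspects below the threshold and renormalise the rest."""
    kept = {s: p for s, p in belief.items() if p >= threshold}
    if not kept:
        return dict(belief)  # never prune every suspect
    total = sum(kept.values())
    return {s: p / total for s, p in kept.items()}

# Hypothetical toy game: three suspects, two candidate questions.
belief = {"Alice": 1 / 3, "Bob": 1 / 3, "Carol": 1 / 3}
questions = {
    # Every suspect would answer "yes", so this question reveals nothing.
    "alibi": {s: {"yes": 1.0} for s in belief},
    # This answer depends strongly on who the culprit is.
    "weapon": {
        "Alice": {"yes": 0.9, "no": 0.1},
        "Bob": {"yes": 0.1, "no": 0.9},
        "Carol": {"yes": 0.1, "no": 0.9},
    },
}

gains = {q: expected_information_gain(belief, lik) for q, lik in questions.items()}
best_question = max(gains, key=gains.get)

# Suppose the observed answer is "yes": update the belief, then prune.
updated = posterior(belief, questions[best_question], "yes")
remaining = prune(updated, threshold=0.1)
```

In this toy setup the uninformative question has zero expected gain, so the informative one is selected, and pruning after the observed answer shrinks the suspect list. The real framework works over natural-language dialogue, where such likelihoods are not available in closed form.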

Paper Structure (33 sections, 8 equations, 7 figures, 7 tables, 1 algorithm)

The WellPlay Dataset
Dataset Setting
Dataset
PLAYER*
Search via Sensor-based State Matching
Approximation with a pruner
Experiments
Experimental Setup
Evaluation Metrics
Win Rate:
Question Accuracy:
Baselines
Agent-vs-Agent Evaluation
Agent-vs-Human Evaluation
Efficiency and Cost Analysis
...and 18 more sections

Figures (7)

Figure 1: Search and Approximate. PLAYER* generates questions based on character states, selecting agents to question based on past observations of critical information and the likelihood of uncovering more. The goal is to minimise the suspect list.
Figure 2: Comparison of the performance of agents with other multi-agent algorithms designed for multiplayer deduction games. The Personal Perspective (PP) baseline represents the starting point for searching, where agents rely solely on their own knowledge. The Omniscient Perspective (OP) measures performance when agents have full access to all other agents’ scripts, representing the ideal search endpoint.
Figure 3: Comparison of Dialogue Strategies. PLAYER* significantly enhances story progression by eliciting key clues (clothing, movement, direction) that guide the investigation. It demonstrates superior questioning by targeting specific details, leading to richer responses. In contrast, Werewolf provides minimal advancement, ThinkThrice adds vague auditory clues, and O-CoT is overly honest, revealing information that should be kept secret. Overall, PLAYER* outperforms the others by designing dialogue for better narrative engagement.
Figure 4: Results of Agent-vs-Human Evaluation using Human-Centric metrics. Detailed data are in the paper's human-survey table (label tab:human-survey).
Figure 5: Comparison of time (hours) and cost ($) of OpenAI API calls across multi-agent algorithms in MMG settings.
...and 2 more figures
