AMONGAGENTS: Evaluating Large Language Models in the Interactive Text-Based Social Deduction Game

Yizhou Chi, Lingjun Mao, Zineng Tang

TL;DR

This work addresses how large language models can operate in goal-directed, incomplete-information settings by introducing AmongAgents, a fully text-based environment inspired by Among Us. It presents a framework in which LLM-based agents with memory, planning, and personality prompts play a social deduction game with two-phase gameplay (Task and Meeting), multiple action spaces, and environment-driven observations. Through controlled and end-to-end evaluations, the study shows that LLMs can grasp the game rules and basic strategies but struggle with deception, with outcomes varying by planner presence and personality configuration. The findings highlight the potential and limitations of LLMs in socially complex, strategic domains and provide a foundation for future research into memory, planning, and persona-driven behavior in interactive AI systems.

Abstract

Strategic social deduction games serve as valuable testbeds for evaluating the understanding and inference skills of language models, offering crucial insights into social science, artificial intelligence, and strategic gaming. This paper focuses on creating proxies of human behavior in simulated environments, with Among Us utilized as a tool for studying simulated human behavior. The study introduces a text-based game environment, named AmongAgents, that mirrors the dynamics of Among Us. Players act as crew members aboard a spaceship, tasked with identifying impostors who are sabotaging the ship and eliminating the crew. Within this environment, the behavior of simulated language agents is analyzed. The experiments involve diverse game sequences featuring different configurations of Crewmate and Impostor personality archetypes. Our work demonstrates that state-of-the-art large language models (LLMs) can effectively grasp the game rules and make decisions based on the current context. This work aims to promote further exploration of LLMs in goal-oriented games with incomplete information and complex action spaces, as these settings offer valuable opportunities to assess language model performance in socially driven scenarios.

Paper Structure (52 sections, 10 equations, 9 figures, 2 tables)

Figures (9)

  • Figure 1: Examples of Agents' conversations during the meeting phase
  • Figure 2: An example diagram illustrating an Impostor's process of information-handling and action-planning.
  • Figure 3: Illustration of what actions Crewmates and Impostors generally do in the task phase and the meeting phase
  • Figure 4: 1) Crewmate persona and winning result counts. 2) Impostor persona and winning result counts. 3) Crewmate persona and action choice count. 4) Impostor persona and action choice count. The stronger the color, the higher the count.
  • Figure 5: Comparison of average scores by category and role, illustrating the performance differences between Crewmates and Impostors across various cognitive and strategic dimensions such as Self-Awareness, Memory, Planning, Reasoning, and Reflection.
  • ...and 4 more figures