Tachikuma: Understanding Complex Interactions with Multi-Character and Novel Objects by Large Language Models

Yuanzhi Liang, Linchao Zhu, Yi Yang

TL;DR

Tachikuma introduces the MOE task and a dataset built from real-time TRPG game logs to advance AI agents' understanding of complex, multi-character interactions with novel objects. The authors propose a three-step Think Before Speak prompting baseline that guides LLMs to identify characters, infer their intentions, and select appropriate skill checks, emulating a human GM. Objective metrics (CP/CR/SP/SR and F-scores) and subjective human evaluations indicate that the task is solvable and that the baseline improves the realism, factual accuracy, and grounding of GM-style responses, though room for improvement remains. By providing long-context, grounded interaction data across diverse TRPG rules, Tachikuma aims to spur the development of more capable, grounded AI agents for complex natural-language interactions.

Abstract

Recent advancements in natural language and Large Language Models (LLMs) have enabled AI agents to simulate human-like interactions within virtual worlds. However, these interactions still face limitations in complexity and flexibility, particularly in scenarios involving multiple characters and novel objects. Pre-defining all interactable objects in the agent's world model presents challenges, and conveying implicit intentions to multiple characters through complex interactions remains difficult. To address these issues, we propose integrating virtual Game Masters (GMs) into the agent's world model, drawing inspiration from Tabletop Role-Playing Games (TRPGs). GMs play a crucial role in overseeing information, estimating players' intentions, providing environment descriptions, and offering feedback, compensating for current world model deficiencies. To facilitate future exploration of complex interactions, we introduce a benchmark named Tachikuma, comprising a Multiple character and novel Object based interaction Estimation (MOE) task and a supporting dataset. MOE challenges models to understand characters' intentions and accurately determine their actions within intricate contexts involving multi-character and novel object interactions. In addition, the dataset captures log data from real-time communications during gameplay, providing diverse, grounded, and complex interactions for further exploration. Finally, we present a simple prompting baseline and evaluate its performance, demonstrating its effectiveness in enhancing interaction understanding. We hope that our dataset and task will inspire further research in complex interactions with natural language, fostering the development of more advanced AI agents.

Paper Structure (15 sections, 1 equation, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Examples of different tasks and datasets based on TRPG game logs. Our MOE and MOD tasks focus on understanding long, complex interactions with long contexts.
  • Figure 2: Example of MOE. In the given context, three players find themselves facing a formidable brown bear in combat. Every character takes part in the battle except Bill, who observes from the safety of a carriage. During the encounter, Zem casts a spell; however, the skill check for this spell was already performed after Turn 4 and explained by the DM in Turn 10. Consequently, the only character currently requiring a skill check is Maurice. Although he intends to escape from the bear, the DND rules do not include a specific "escape" skill. Instead, Maurice must use his strength to resist the bear's attempt to grapple him, so the DM advises him to perform a strength check in accordance with the DND rules. We also present the results predicted by GPT-3.5 using template prompts. These results show a lack of effective context comprehension and highlight the challenges of understanding complex interactions among agents.
  • Figure 3: Distribution of the number of characters in MOE labels.
  • Figure 4: Distribution of skill names in MOE labels for contexts under the DND rules. Abbreviations: initiative (ini), intelligence (int), perception (per), arcana (arc), insight (ins).
  • Figure 5: Subjective evaluation by volunteers. With MOE labels or predictions from our method, LLMs generate responses that are closer to those of real humans on all three evaluation factors.
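
The objective metrics named in the TL;DR (CP/CR/SP/SR and F-scores) can be illustrated on the Figure 2 example. The sketch below is not the authors' evaluation code. It assumes that CP/CR stand for character precision/recall and SP/SR for skill precision/recall (the expansions are not given in this section), and that a skill counts as correct only when both the character and the skill name match the label. The function names and the predicted output are hypothetical, chosen for illustration only.

```python
from typing import Dict, Tuple


def precision_recall_f1(n_correct: int, n_pred: int, n_gold: int) -> Tuple[float, float, float]:
    """Standard precision, recall, and F1, returning 0.0 when a denominator is zero."""
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def moe_scores(gold: Dict[str, str], pred: Dict[str, str]) -> Dict[str, float]:
    """Score a prediction against a label; both map character name -> skill name.

    Assumption: a character is correct if it appears in the label, and a skill
    is correct only if the same character is labeled with the same skill.
    """
    char_hits = len(gold.keys() & pred.keys())
    skill_hits = sum(1 for name, skill in pred.items() if gold.get(name) == skill)
    cp, cr, cf = precision_recall_f1(char_hits, len(pred), len(gold))
    sp, sr, sf = precision_recall_f1(skill_hits, len(pred), len(gold))
    return {"CP": cp, "CR": cr, "CF": cf, "SP": sp, "SR": sr, "SF": sf}


# Label from the Figure 2 caption: only Maurice needs a check, and it is strength.
gold = {"Maurice": "strength"}
# Hypothetical model output (not taken from the paper): it picks the wrong skill
# for Maurice and also flags Zem, whose spell check was already resolved.
pred = {"Maurice": "athletics", "Zem": "arcana"}

scores = moe_scores(gold, pred)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the hypothetical prediction finds the right character but not the right skill, so the character scores are non-zero while the skill scores are zero. This is the kind of gap between identifying who acts and identifying which check applies that the MOE task is meant to expose.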