MIMIC: Integrating Diverse Personality Traits for Better Game Testing Using Large Language Model

Yifei Chen, Sarra Habchi, Lili Wei

TL;DR

MIMIC tackles the limitation of homogeneous strategies in automated game testing by embedding diverse human-like playstyles into a large language model framework. It combines a personality-driven Hybrid Planner, a memory system, and a reflective summarizer to generate and execute varied task strategies across multiple games, including Minecraft. Empirical results show superior code- and interaction-level coverage compared with baselines, and higher task success and behavioral diversity than a state-of-the-art Minecraft agent on a comprehensive task suite. The work demonstrates that personality-aware automation can yield richer testing trajectories, better edge-case discovery, and broader applicability beyond gaming to automated UI testing and HCI contexts.

Abstract

Modern video games pose significant challenges for traditional automated testing algorithms, yet intensive testing is crucial to ensure game quality. To address these challenges, researchers designed gaming agents using Reinforcement Learning, Imitation Learning, or Large Language Models. However, these agents often neglect the diverse strategies employed by human players due to their different personalities, resulting in repetitive solutions in similar situations. Without mimicking varied gaming strategies, these agents struggle to trigger diverse in-game interactions or uncover edge cases. In this paper, we present MIMIC, a novel framework that integrates diverse personality traits into gaming agents, enabling them to adopt different gaming strategies for similar situations. By mimicking different playstyles, MIMIC can achieve higher test coverage and richer in-game interactions across different games. It also outperforms state-of-the-art agents in Minecraft by achieving a higher task completion rate and providing more diverse solutions. These results highlight MIMIC's significant potential for effective game testing.

Paper Structure

This paper contains 33 sections, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Overview of the MIMIC framework, comprising three LLM-based components: LLM Planner, LLM Action Executor, and LLM Action Summarizer, alongside a non-LLM-based component, the Memory System.
  • Figure 2: Screenshots of Dungeon Adventures (left) and Shattered Pixel Dungeon (right).
  • Figure 3: Code coverage (left) and branch coverage (right) for Dungeon Adventures (DA). The shaded areas represent the range across three runs, while the solid lines indicate the average coverage over time. The human player completed only one run (no shaded area).
  • Figure 4: Code coverage (left) and branch coverage (right) for Shattered Pixel Dungeon (SPD) over time. MIMIC-P, MIMIC-P+S, and Human completed only one complete run (no shaded area).
  • Figure 5: Code coverage (left) and branch coverage (right) for Shattered Pixel Dungeon (SPD) over action iterations.
  • ...and 2 more figures
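
The Figure 3 caption describes each coverage curve as an average over three runs, with a shaded band spanning the range across those runs. A minimal sketch of that aggregation is shown below. It assumes every run is sampled at the same time steps. The function name `aggregate_runs` and the numbers are hypothetical placeholders, not the authors' tooling or results.

```python
from statistics import mean


def aggregate_runs(runs):
    """Return one (average, low, high) tuple per time step across runs.

    `runs` is a list of coverage series, one per run, all sampled at the
    same time steps (an assumption of this sketch).
    """
    if not runs:
        raise ValueError("need at least one run")
    length = len(runs[0])
    if any(len(series) != length for series in runs):
        raise ValueError("all runs must have the same number of samples")
    # zip(*runs) groups the values recorded at the same time step.
    return [(mean(step), min(step), max(step)) for step in zip(*runs)]


# Placeholder coverage percentages for three runs (NOT data from the paper).
runs = [
    [10.0, 20.0, 30.0],
    [12.0, 26.0, 33.0],
    [14.0, 23.0, 36.0],
]
bands = aggregate_runs(runs)

# A single run collapses the band (low == high), which matches the
# "no shaded area" note for configurations that were run only once.
single = aggregate_runs([[1.0, 2.0]])
```

The solid line would plot the first element of each tuple, and the shaded area would fill between the second and third.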