Robust Imitation Learning for Automated Game Testing

Pierluigi Vito Amadori; Timothy Bradley; Ryan Spick; Guy Moss

TL;DR

EVOLUTE targets the high cost of manual game testing with a two-stream imitation learning framework that splits the agent's actions into discrete and continuous components: FeedForward-BC (FF-BC) handles discrete actions, and EnergyControlled-BC (EC-BC), an energy-based form of BC, handles continuous actions. The discrete stream is trained with standard BCE-based behavioural cloning, while the continuous stream learns an energy function $E_{\theta}(\mathbf{s},\mathbf{a}_c)$ with a contrastive (InfoNCE) objective and selects low-energy actions at inference time via Grid-Search or No-Grad inference. In a shooting-and-driving game (Hardware Rivals), EVOLUTE generalises and explores better than a pure BC baseline, achieving more kills and longer survival, and it remains effective with limited data and even without depth information. These results suggest the approach can reduce reliance on exhaustive human playtesting in game quality assurance.
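The continuous stream described above scores state-action pairs with an energy function and picks low-energy actions. The following is a minimal sketch of that idea, not the paper's implementation: the quadratic `energy` function, its scalar parameter `theta`, the hand-picked negatives, and the uniform action grid are all illustrative assumptions standing in for the learned network and the actual sampling scheme.

```python
import math

def energy(state, action, theta):
    """Toy stand-in for E_theta(s, a_c): low when action is close to theta * state."""
    return (action - theta * state) ** 2

def info_nce_loss(state, expert_action, negatives, theta):
    """InfoNCE-style loss: negative log-softmax of -E for the expert action
    against a set of negative (counter-example) actions."""
    logits = [-energy(state, a, theta) for a in [expert_action] + negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_z)

def grid_search_action(state, theta, low=-1.0, high=1.0, n=201):
    """Grid-search inference: evaluate the energy on a uniform grid of
    candidate actions and return the one with the lowest energy."""
    grid = [low + (high - low) * i / (n - 1) for i in range(n)]
    return min(grid, key=lambda a: energy(state, a, theta))

# Parameters that place low energy on the expert action give a lower loss.
good = info_nce_loss(0.5, 0.5, [-1.0, 0.0, 1.0], theta=1.0)
bad = info_nce_loss(0.5, 0.5, [-1.0, 0.0, 1.0], theta=-1.0)
best_action = grid_search_action(0.5, theta=1.0)
```

Minimising this loss pushes the expert action's energy below that of the negatives, which is what makes a lowest-energy search at inference time meaningful. Grid search is only practical for low-dimensional continuous actions, since the number of candidates grows exponentially with the number of action dimensions.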

Abstract

Game development is a long process that involves many stages before a product is ready for the market. Human play testing is among the most time-consuming, as testers are required to repeatedly perform tasks in search of errors in the code. Automated testing is therefore seen as a key technology for the gaming industry, as it would dramatically reduce development costs and improve efficiency. Toward this end, we propose EVOLUTE, a novel imitation learning-based architecture that combines behavioural cloning (BC) with energy-based models (EBMs). EVOLUTE is a two-stream ensemble model that splits the action space of autonomous agents into continuous and discrete tasks. The EBM stream handles the continuous tasks, to provide more refined and adaptive control, while the BC stream handles discrete actions, to ease training. We evaluate the performance of EVOLUTE in a shooting-and-driving game, where the agent is required to navigate and continuously identify targets to attack. The proposed model generalises better than standard BC approaches, showing a wider range of behaviours and higher performance. EVOLUTE is also easier to train than a pure end-to-end EBM model, because discrete tasks can be quite sparse in the dataset and force an end-to-end model to explore a much wider set of possible actions during training.
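The split between a BC stream for discrete actions and an EBM stream for continuous actions can be illustrated with a short sketch. It assumes the discrete stream outputs one logit per button and is trained with binary cross-entropy, as the TL;DR states. The function names, the 0.5 threshold, and the dictionary layout of the combined action are hypothetical choices for illustration only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_loss(logits, targets):
    """Mean binary cross-entropy between button logits and expert presses (0 or 1)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(logits)

def combine_streams(button_logits, continuous_action, threshold=0.5):
    """Merge both streams into a single action for the game environment.
    button_logits comes from the discrete (BC) stream; continuous_action is
    whatever the energy-based stream selected (e.g. steering, throttle)."""
    buttons = [sigmoid(z) > threshold for z in button_logits]
    return {"discrete": buttons, "continuous": list(continuous_action)}

# Hypothetical example: two buttons and two continuous axes.
action = combine_streams([2.0, -2.0], [0.3, -0.1])
```

Keeping the sparse button presses out of the energy model means the contrastive objective only has to cover the continuous axes, which is the training benefit the abstract points to.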

Paper Structure (16 sections, 11 equations, 6 figures, 1 table, 3 algorithms)

Introduction
Related Works
Proposed Architecture
Problem Formulation
FeedForward-BC (FF-BC)
EnergyControlled-BC (EC-BC)
Energy Controlled BC (EC-BC)
Training
Inference
Grid-Search inference
No-Grad inference
Results
Data Information
Playing Performance
Exploration Performance
...and 1 more sections

Figures (6)

Figure 1: Diagram of EVOLUTE. Standard FF-based BC (brown) handles discrete actions, while EBM-based BC (green) controls continuous actions. Once the estimates of the two sets of actions are computed, they are combined (blue) and sent to the gaming environment.
Figure 2: Diagram of EVOLUTE.
Figure 3: (a) The game interface and (b)-(c) the visual state information collected.
Figure 4: Time alive. The time alive is normalised to $[0, 1]$, where $1$ indicates that the agent drove within the environment without any critical crash for the full duration of a $2$-minute match. EVOLUTE outperforms standard FF-BC, even when depth is not provided as input.
Figure 5: Kill count. The PKR is computed over $20$ matches, with EVOLUTE consistently outperforming standard FF-BC, even when depth is not provided as input.
...and 1 more figures
