A Benchmark Environment for Offline Reinforcement Learning in Racing Games
Girolamo Macaluso, Alessandro Sestini, Andrew D. Bagdanov
TL;DR
The paper tackles the high sample cost of reinforcement learning in modern games by introducing OfflineMania, a TrackMania-inspired racing environment built in Unity with a Gymnasium interface and a suite of offline datasets. It benchmarks Online RL, Offline RL (ORL), and offline-to-online approaches. The results show that online PPO achieves strong performance but requires extensive environment interaction, that IQL often outperforms other ORL methods, and that offline-to-online learning offers robust gains on several datasets. The work demonstrates both the promise and the limitations of current ORL techniques in gaming contexts and highlights distributional shift as a key challenge during online fine-tuning. Overall, OfflineMania provides a practical, data-driven platform for evaluating and accelerating the integration of offline data into game AI development workflows.
Abstract
Offline Reinforcement Learning (ORL) is a promising approach to reduce the high sample complexity of traditional Reinforcement Learning (RL) by eliminating the need for continuous environmental interactions. ORL exploits a dataset of pre-collected transitions and thus expands the range of applications of RL to tasks in which excessive environment queries increase training time and decrease efficiency, such as modern AAA games. This paper introduces OfflineMania, a novel environment for ORL research. It is inspired by the iconic TrackMania series and developed using the Unity 3D game engine. The environment simulates a single-agent racing game in which the objective is to complete the track through optimal navigation. We provide a variety of datasets to assess ORL performance. These datasets, collected from policies of varying ability and provided in different sizes, aim to offer a challenging testbed for algorithm development and evaluation. We further establish a set of baselines for a range of Online RL, ORL, and hybrid Offline-to-Online RL approaches using our environment.
