A Continual Offline Reinforcement Learning Benchmark for Navigation Tasks
Anthony Kobanda, Odalric-Ambrym Maillard, Rémy Portelas
TL;DR
This work addresses the challenge of continual offline reinforcement learning (offline CRL) for navigation in video-game-style environments. It introduces Continual NavBench, a Godot-based benchmark with standardized offline datasets derived from human gameplay, diverse task streams, and evaluation protocols that capture performance, forgetting, and efficiency. A hierarchical imitation learning backbone (HGCBC) is evaluated together with a broad suite of continual baselines spanning naive, replay-based, regularization, and architectural methods. Results show that architectural and hierarchical approaches can preserve and transfer knowledge across sequential tasks but often incur higher memory and compute costs, highlighting practical trade-offs for production deployment and reproducible research in offline CRL for navigation.
Abstract
Autonomous agents operating in domains such as robotics or video game simulations must adapt to changing tasks without forgetting previous ones. This setting, called Continual Reinforcement Learning (CRL), poses non-trivial difficulties, from preventing catastrophic forgetting to ensuring the scalability of the approaches considered. Building on recent advances, we introduce a benchmark providing a suite of video-game navigation scenarios, filling a gap in the literature and capturing key challenges: catastrophic forgetting, task adaptation, and memory efficiency. We define a diverse set of tasks and datasets, evaluation protocols, and metrics to assess the performance of algorithms, including state-of-the-art baselines. Our benchmark is designed not only to foster reproducible research and accelerate progress in continual reinforcement learning for gaming, but also to provide a reproducible framework for production pipelines, helping practitioners identify and apply effective approaches.
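Performance and forgetting in continual learning are commonly computed from a task-by-task score matrix. The sketch below assumes one such common definition: R[i][j] is the score on task j after training on the i-th task of the stream, average performance is the mean final score, and average forgetting is the mean drop from each task's best score (measured after it was learned) to its final score. The names `R`, `average_performance` and `average_forgetting` are hypothetical, and the benchmark's exact metric definitions may differ.

```python
from typing import List

# Hypothetical layout: Matrix[i][j] = score on task j after training on task i.
Matrix = List[List[float]]


def average_performance(R: Matrix) -> float:
    """Mean score over all tasks after training on the last task of the stream."""
    final = R[-1]
    return sum(final) / len(final)


def average_forgetting(R: Matrix) -> float:
    """Mean drop from each task's best earlier score to its final score.

    Only tasks 0..N-2 are included, since the last task cannot have been
    forgotten yet. The "best earlier score" is taken over the rows from when
    task j was learned (row j) up to, but excluding, the final row.
    """
    n = len(R)
    if n < 2:
        return 0.0  # a single task: nothing can be forgotten
    drops = []
    for j in range(n - 1):
        best_before = max(R[i][j] for i in range(j, n - 1))
        drops.append(best_before - R[-1][j])
    return sum(drops) / len(drops)


if __name__ == "__main__":
    # Illustrative numbers only, not results from the benchmark.
    R = [
        [0.9, 0.1, 0.0],
        [0.7, 0.8, 0.2],
        [0.6, 0.7, 0.9],
    ]
    print(average_performance(R))
    print(average_forgetting(R))
```

With the illustrative matrix, the final average score is about 0.733 and the average forgetting is about 0.2 (task 0 drops from 0.9 to 0.6, task 1 from 0.8 to 0.7). Restricting the maximum to rows at or after task j avoids counting pre-training scores on a task that has not been learned yet.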
