A Continual Offline Reinforcement Learning Benchmark for Navigation Tasks

Anthony Kobanda, Odalric-Ambrym Maillard, Rémy Portelas

TL;DR

This work addresses the challenge of continual offline reinforcement learning for navigation in video-game-style environments. It introduces Continual NavBench, a Godot-based benchmark with standardized offline datasets derived from human gameplay, diverse task streams, and evaluation protocols that capture performance, forgetting, and efficiency. The benchmark evaluates a hierarchical imitation learning backbone (HGCBC) alongside a broad suite of continual baselines spanning naive, replay-based, regularization, and architectural methods. Results show that architectural and hierarchical approaches can preserve and transfer knowledge across sequential tasks but often incur higher memory and compute costs, highlighting practical trade-offs for production deployment and reproducible research in offline CRL for navigation.

Abstract

Autonomous agents operating in domains such as robotics or video game simulations must adapt to changing tasks without forgetting previous ones. This process, called Continual Reinforcement Learning, poses non-trivial difficulties, from preventing catastrophic forgetting to ensuring the scalability of the approaches considered. Building on recent advances, we introduce a benchmark providing a suite of video-game navigation scenarios, thus filling a gap in the literature and capturing key challenges: catastrophic forgetting, task adaptation, and memory efficiency. We define a diverse set of tasks and datasets, evaluation protocols, and metrics to assess the performance of algorithms, including state-of-the-art baselines. Our benchmark is designed not only to foster reproducible research and accelerate progress in continual reinforcement learning for gaming, but also to provide a reproducible framework for production pipelines, helping practitioners identify and apply effective approaches.

Paper Structure

This paper contains 24 sections, 2 equations, 10 figures, and 3 tables.

Figures (10)

  • Figure 1: Visualization of an environment and a human player: (a) Human Player Interaction. A third-person perspective of a human playing within a large maze; (b) Overview of a Small Map. A top-down view of a small maze, showcasing its simple layout; (c) Overview of a Large Map. A top-down view of a larger maze, highlighting a complex layout that demands more careful planning and action.
  • Figure 2: SimpleTown mazes are relatively simple, with a size of $20\times20$ meters. Start positions are randomly sampled on one side, and goal positions lie on the other side. AmazeVille mazes, of $60\times60$ meters, are more challenging. They have a finite set of start and goal positions and come in two subsets of maps: some with high blocks, i.e., non-jumpable obstacles, and others with low blocks, i.e., jumpable ones. The task naming convention uses a prefix for the maze family (S for SimpleTown, A for AmazeVille); the subsequent characters encode key layout features, where “O” and “X” denote open and closed doors respectively, and “H” and “L” denote high and low blocks.
  • Figure 3: Visualization of the generated trajectories
  • Figure 4: Hindsight Experience Replay (HER) Illustration.
  • Figure 5: Backward and Forward Transfer Metrics. Architectural or separate‐policy methods (PNN, HiSPO, SCN, FTN) typically balance old‐task retention and new‐task adaptation, whereas single‐policy or regularized methods (SC1, FRZ, EWC, L2) risk higher forgetting or reduced forward transfer in complex navigation scenarios.
  • ...and 5 more figures
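Figure 4 refers to Hindsight Experience Replay (HER). As a minimal sketch of the general HER idea, not the paper's implementation, the snippet below relabels each transition of a navigation trajectory with goals the agent actually reached later in the same episode (the "future" strategy from the original HER work). The 3D position observations, the discrete action ids, the 1-meter success tolerance, and the names `Transition`, `reached`, and `her_relabel` are all hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, float, float]  # assumed: agent position (x, y, z) in meters, reused as goal


@dataclass
class Transition:
    obs: Vec        # position before the action (hypothetical observation format)
    action: int     # discrete action id (assumed)
    next_obs: Vec   # position after the action
    goal: Vec       # goal this transition is labeled with
    reward: float   # sparse reward: 1.0 if next_obs reaches goal, else 0.0


def reached(pos: Vec, goal: Vec, tol: float = 1.0) -> bool:
    """Assumed success test: Euclidean distance to the goal within `tol` meters."""
    return sum((p - g) ** 2 for p, g in zip(pos, goal)) ** 0.5 <= tol


def her_relabel(
    traj: Sequence[Tuple[Vec, int, Vec]],
    k: int = 4,
    tol: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[Transition]:
    """HER 'future' strategy: relabel each step with k goals reached at or after it."""
    rng = rng if rng is not None else random.Random(0)
    relabeled: List[Transition] = []
    last = len(traj) - 1
    for t, (obs, action, next_obs) in enumerate(traj):
        for _ in range(k):
            # Pick a step t' >= t and use the position reached after it as the goal.
            future = rng.randint(t, last)
            goal = traj[future][2]
            reward = 1.0 if reached(next_obs, goal, tol) else 0.0
            relabeled.append(Transition(obs, action, next_obs, goal, reward))
    return relabeled
```

Because every relabeled goal is a position the agent really reached, even a trajectory that misses its original target still yields valid goal-conditioned examples; this is one way hindsight relabeling can turn fixed offline trajectories into training data for a goal-conditioned policy.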
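Figure 5 reports backward and forward transfer. The benchmark's exact formulas are not given in this summary, so the following is a sketch under one common convention, which may differ from the paper's. Given a matrix `R` where `R[i][j]` is the score (e.g., success rate) on task j after training sequentially on tasks 0..i, backward transfer averages how much each earlier task's score changed by the end of the stream (negative values indicate forgetting). Forward transfer here compares the score on each task right after learning it against a reference `ref[j]` from a model trained on that task alone. The function name `crl_metrics` and this choice of reference are assumptions.

```python
from typing import Dict, Sequence


def crl_metrics(R: Sequence[Sequence[float]], ref: Sequence[float]) -> Dict[str, float]:
    """Summarize a continual-learning run under one common (assumed) convention.

    R[i][j]: score on task j after training sequentially on tasks 0..i.
    ref[j]:  score of a (hypothetical) reference model trained on task j alone.
    """
    n = len(R)
    if n < 2 or len(ref) != n or any(len(row) != n for row in R):
        raise ValueError("R must be an n x n matrix with n >= 2, and ref must have length n")

    final = R[n - 1]  # scores on every task after the last training stage

    # Average performance over all tasks at the end of the stream.
    average = sum(final) / n

    # Backward transfer: change on earlier tasks between "just learned" and "end".
    # Negative values mean the agent forgot part of what it had learned.
    backward = sum(final[j] - R[j][j] for j in range(n - 1)) / (n - 1)

    # Forward transfer: benefit of prior tasks when learning task j,
    # measured against a model trained on task j from scratch.
    forward = sum(R[j][j] - ref[j] for j in range(n)) / n

    return {"average": average, "backward_transfer": backward, "forward_transfer": forward}
```

Under this convention, a method that retains old tasks well keeps backward transfer near zero, while a method that reuses earlier knowledge effectively shows positive forward transfer; the trade-off described in the Figure 5 caption is a tension between these two quantities.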