OCAtari: Object-Centric Atari 2600 Reinforcement Learning Environments
Quentin Delfosse, Jannis Blüml, Bjarne Gregori, Sebastian Sztwiertnia, Kristian Kersting
TL;DR
OCAtari provides a practical object-centric reinforcement learning benchmark for Atari 2600 games. It introduces two object extraction methods, one RAM-based (REM) and one vision-based (VEM), together with the Object-centric Dataset for Atari (ODA). REM achieves high object-detection performance and runs substantially faster than VEM, and the extracted object-centric states are sufficient to train simple object-centric RL agents. The framework also allows new challenges to be generated by manipulating the RAM, and it is released as a public, MIT-licensed toolbox so that researchers can compare object-centric methods against traditional pixel-based approaches and AtariARI baselines. The work thus advances interpretable, data-efficient object-centric RL in a mainstream RL domain and invites further development of object-centric representations and algorithms.
Abstract
Cognitive science and psychology suggest that object-centric representations of complex scenes are a promising step towards enabling efficient abstract reasoning from low-level perceptual features. Yet, most deep reinforcement learning approaches rely only on pixel-based representations that do not capture the compositional properties of natural scenes. To address this, we need environments and datasets that allow us to develop and evaluate object-centric approaches. In our work, we extend the Atari Learning Environments, the most-used evaluation framework for deep RL approaches, by introducing OCAtari, which performs resource-efficient extraction of the object-centric states of these games. Our framework allows for object discovery, object representation learning, as well as object-centric RL. We evaluate OCAtari's detection capabilities and resource efficiency. Our source code is available at github.com/k4ntz/OC_Atari.
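To make the idea of a RAM-based object-centric state concrete, the following minimal Python sketch reads object positions from the 128 bytes of Atari 2600 RAM and returns them as a list of objects. It is an illustration under stated assumptions, not OCAtari's implementation: the `GameObject` class, the `extract_objects` function, the RAM addresses, the object sizes, and the rule that zeroed coordinates mean "not on screen" are all hypothetical.

```python
from dataclasses import dataclass

RAM_SIZE = 128  # the Atari 2600 exposes 128 bytes of RAM


@dataclass
class GameObject:
    """One entry of an object-centric state (hypothetical structure)."""
    category: str
    x: int
    y: int
    w: int
    h: int


# HYPOTHETICAL layout for a Pong-like game:
# category -> (x address, y address, width, height).
# These addresses and sizes are invented for illustration only.
HYPOTHETICAL_LAYOUT = {
    "Player": (4, 51, 4, 15),
    "Ball": (49, 54, 2, 4),
    "Enemy": (5, 50, 4, 15),
}


def extract_objects(ram):
    """Build a list of objects by reading positions directly from RAM."""
    if len(ram) != RAM_SIZE:
        raise ValueError(f"expected {RAM_SIZE} bytes of RAM, got {len(ram)}")
    objects = []
    for category, (x_addr, y_addr, w, h) in HYPOTHETICAL_LAYOUT.items():
        x, y = ram[x_addr], ram[y_addr]
        # Assumption: zeroed coordinates mean the object is not on screen.
        if x == 0 and y == 0:
            continue
        objects.append(GameObject(category, x, y, w, h))
    return objects


# Usage: a fake RAM snapshot with a player and a ball, but no enemy.
ram = bytearray(RAM_SIZE)
ram[4], ram[51] = 140, 60    # player position
ram[49], ram[54] = 70, 100   # ball position
for obj in extract_objects(ram):
    print(obj)
```

Because such an extractor only indexes a small byte array, it avoids any image processing, which is consistent with the resource efficiency claimed for RAM-based extraction. A vision-based extractor would instead have to locate the same objects in the rendered frame.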
