
OCAtari: Object-Centric Atari 2600 Reinforcement Learning Environments

Quentin Delfosse, Jannis Blüml, Bjarne Gregori, Sebastian Sztwiertnia, Kristian Kersting

TL;DR

OCAtari provides a practical object-centric (OC) reinforcement learning benchmark for Atari 2600 games. It introduces two object extraction methods, the RAM Extraction Method (REM) and the Vision Extraction Method (VEM), plus the Object-centric Dataset for Atari (ODA). The work demonstrates high object-detection performance with REM, a substantial speed advantage of REM over VEM, and the feasibility of training simple object-centric RL agents. The framework enables generating new challenges via RAM manipulation and offers a public, MIT-licensed toolbox for researchers to compare object-centric methods against traditional pixel-based approaches and AtariARI baselines. This work thus advances interpretable, data-efficient OC RL in a mainstream RL domain and invites further development of OC representations and algorithms.

Abstract

Cognitive science and psychology suggest that object-centric representations of complex scenes are a promising step towards enabling efficient abstract reasoning from low-level perceptual features. Yet, most deep reinforcement learning approaches rely only on pixel-based representations that do not capture the compositional properties of natural scenes. To address this, we need environments and datasets that allow us to work with and evaluate object-centric approaches. In our work, we extend the Atari Learning Environments, the most-used evaluation framework for deep RL approaches, by introducing OCAtari, which performs resource-efficient extraction of the object-centric states of these games. Our framework allows for object discovery, object representation learning, as well as object-centric RL. We evaluate OCAtari's detection capabilities and resource efficiency. Our source code is available at github.com/k4ntz/OC_Atari.
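
To make the idea of RAM-based extraction concrete, here is a minimal sketch in Python. It is not OCAtari's implementation or API: the `GameObject` class, the `extract_objects` function, and the RAM addresses in `HYPOTHETICAL_LAYOUT` are all assumptions invented for illustration. The only hardware fact it relies on is that the Atari 2600 exposes 128 bytes of RAM.

```python
from dataclasses import dataclass

# Hypothetical RAM layout for a Pong-like game. These addresses and sizes
# are illustrative assumptions, not the offsets OCAtari actually uses.
HYPOTHETICAL_LAYOUT = {
    "Player": {"x": 4, "y": 5, "w": 4, "h": 15},
    "Ball": {"x": 10, "y": 11, "w": 2, "h": 4},
}


@dataclass
class GameObject:
    category: str
    x: int
    y: int
    w: int
    h: int


def extract_objects(ram, layout=HYPOTHETICAL_LAYOUT):
    """Build a list of objects from a 128-byte RAM snapshot."""
    if len(ram) != 128:
        raise ValueError("Atari 2600 RAM snapshots are 128 bytes long")
    objects = []
    for category, spec in layout.items():
        x, y = ram[spec["x"]], ram[spec["y"]]
        # Assumption for this sketch: a (0, 0) position means "not on screen".
        if x == 0 and y == 0:
            continue
        objects.append(GameObject(category, x, y, spec["w"], spec["h"]))
    return objects


ram = bytearray(128)
ram[4], ram[5] = 140, 60    # place the player paddle
ram[10], ram[11] = 80, 100  # place the ball
objects = extract_objects(ram)

# RAM manipulation: overwriting the ball's bytes changes the extracted state.
ram[10], ram[11] = 0, 0
objects_without_ball = extract_objects(ram)
```

Because the object list is read directly from a few bytes rather than computed from the rendered frame, this style of extraction needs no image processing, which is consistent with the resource efficiency described above. The last two lines hint at how writing to RAM could be used to alter a game state.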

Paper Structure (60 sections, 7 figures, 47 tables)

Figures (7)

  • Figure 1: RL research needs Object-Centric Atari environments. The Atari Learning Environments (ALE) is, by far, the most used RL benchmark among those listed on paperswithcode.com (left). Publications using ALE are increasing, together with the number of papers concerned with object-centric RL. However, as no object-centric ALE is available yet, the number of papers concerned with object-centric approaches in Atari is negligible. Data queried using dimensions.ai, based on keyword occurrence in title and abstract (center) or in full text (left and right).
  • Figure 2: OCAtari can extract object-centric descriptions using two methods: the RAM Extraction method (REM) and the Vision Extraction method (VEM).
  • Figure 3: Qualitative evaluation of OCAtari's REM. Frames from our OCAtari framework on $5$ environments (Pong, Skiing, SpaceInvaders, MsPacman, FishingDerby). Bounding boxes surround the detected objects. REM automatically detects blinking (MsPacman) and occluded (FishingDerby) objects, and ignores, e.g., exploded objects (SpaceInvaders) that vision methods can falsely pick up.
  • Figure 4: OCAtari (REM) permits learning of object-centric (OC) RL agents. The OC PPO agents perform similarly to the pixel-based PPO (Deep) agents and humans on $5$ Atari games.
  • Figure 5: OCAtari: The object-centric Atari benchmark. OCAtari maintains a list of existing objects by processing information from the RAM. Our framework enables training and evaluating object discovery methods and object-centric RL algorithms on the widely used Atari Learning Environments benchmark.
  • ...and 2 more figures