
Obstacle Tower: A Generalization Challenge in Vision, Control, and Planning

Arthur Juliani, Ahmed Khalifa, Vincent-Pierre Berges, Jonathan Harper, Ervin Teng, Hunter Henry, Adam Crespi, Julian Togelius, Danny Lange

TL;DR

Obstacle Tower introduces a high-fidelity, 3D, procedurally generated benchmark to push generalization across vision, control, and planning from pixel inputs. Built on Unity/ML-Agents, it offers up to 100 floors with varied visual themes and two reward configurations, emphasizing evaluation on unseen instances (weak/strong generalization). Preliminary results show current Deep RL baselines (PPO, Rainbow) underperforming humans, with Rainbow exhibiting limited generalization in varied training settings. The work argues for fundamental advances in representation, memory, and planning to tackle this challenging, multi-axis benchmark, and outlines plans for open-source releases and expanded configurations to broaden its utility across robotics and navigation research.
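Evaluation on unseen instances can be sketched as a reproducible held-out split of tower seeds. The snippet below is a minimal illustration, not the benchmark's own tooling. `TowerConfig`, `split_seeds`, and `evaluation_configs` are hypothetical names. It assumes that weak generalization means evaluating on tower seeds never used in training, and that strong generalization additionally renders those towers in visual themes withheld from training. The theme names come from the figure captions; how the real benchmark assigns themes to towers is an assumption.

```python
import random
from dataclasses import dataclass

# Theme names are taken from the figure captions; the full theme set is an assumption.
THEMES = ("Ancient", "Moorish", "Industrial")


@dataclass(frozen=True)
class TowerConfig:
    """Hypothetical description of one evaluation tower (not a real env API)."""
    seed: int
    theme: str


def split_seeds(num_seeds, num_test, rng_seed=0):
    """Shuffle seeds 0..num_seeds-1 reproducibly and hold out num_test for evaluation."""
    if not 0 < num_test < num_seeds:
        raise ValueError("num_test must be between 1 and num_seeds - 1")
    seeds = list(range(num_seeds))
    random.Random(rng_seed).shuffle(seeds)
    return sorted(seeds[num_test:]), sorted(seeds[:num_test])


def evaluation_configs(test_seeds, train_themes, condition):
    """Build held-out towers for the (assumed) weak or strong condition.

    weak:   unseen seeds, rendered only in themes used during training.
    strong: unseen seeds, rendered only in themes withheld from training.
    """
    if condition == "weak":
        themes = [t for t in THEMES if t in train_themes]
    elif condition == "strong":
        themes = [t for t in THEMES if t not in train_themes]
    else:
        raise ValueError(f"unknown condition: {condition!r}")
    return [TowerConfig(seed, theme) for seed in test_seeds for theme in themes]


# Example: 10 seeds, 3 held out; training assumed to use only the Ancient theme.
train_seeds, test_seeds = split_seeds(num_seeds=10, num_test=3)
print(evaluation_configs(test_seeds, train_themes={"Ancient"}, condition="strong"))
```

Seeding the split keeps the held-out set fixed across runs, so scores reported for different agents refer to the same unseen towers.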

Abstract

The rapid pace of recent research in AI has been driven in part by the presence of fast and challenging simulation environments. These environments often take the form of games, with tasks ranging from simple board games to competitive video games. We propose a new benchmark, Obstacle Tower: a high-fidelity, 3D, third-person, procedurally generated environment. An agent playing Obstacle Tower must learn to solve both low-level control and high-level planning problems in tandem while learning from pixels and a sparse reward signal. Unlike other benchmarks such as the Arcade Learning Environment, evaluation of agent performance in Obstacle Tower is based on an agent's ability to perform well on unseen instances of the environment. In this paper we outline the environment and provide a set of baseline results produced by current state-of-the-art Deep RL methods as well as human players. These algorithms fail to produce agents capable of performing at near-human level.
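The difference between the sparse reward signal and the denser of the two reward configurations can be sketched as a per-step reward function. This is a minimal sketch under stated assumptions. From our reading of the benchmark description, the sparse configuration pays +1 only for reaching the next floor, and the dense configuration adds +0.1 for sub-goals such as opening a door, picking up a key, or solving a puzzle. Treat these values as assumptions to be checked against the paper. `StepEvents` and `step_reward` are hypothetical helpers, not part of the Obstacle Tower or ML-Agents API.

```python
from dataclasses import dataclass

# Assumed reward values (check against the paper): +1 per floor in both
# configurations, +0.1 per sub-goal event in the dense configuration only.
FLOOR_REWARD = 1.0
SUBGOAL_REWARD = 0.1


@dataclass(frozen=True)
class StepEvents:
    """Hypothetical per-step event flags; not part of any real environment API."""
    reached_next_floor: bool = False
    opened_door: bool = False
    picked_up_key: bool = False
    solved_puzzle: bool = False


def step_reward(events: StepEvents, dense: bool) -> float:
    """Reward for one step under the sparse (dense=False) or dense configuration."""
    reward = FLOOR_REWARD if events.reached_next_floor else 0.0
    if dense:
        # Count how many sub-goal events fired on this step (bools sum as 0/1).
        subgoals = sum((events.opened_door, events.picked_up_key, events.solved_puzzle))
        reward += SUBGOAL_REWARD * subgoals
    return reward


# Opening a door earns nothing under the sparse configuration, but does under the dense one.
print(step_reward(StepEvents(opened_door=True), dense=False))
print(step_reward(StepEvents(opened_door=True), dense=True))
```

Under the sparse configuration, every intermediate step of a floor returns zero, which is what makes credit assignment across long door, key, and puzzle sequences difficult.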

Paper Structure

This paper contains 32 sections, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Examples of agent observations in the Obstacle Tower at different floor levels. [Left] Early floor is rendered in the Ancient theme. [Middle] Intermediate floor is rendered in the Moorish theme. [Right] Later floor is rendered in the Industrial theme.
  • Figure 2: Examples of floor layouts in the Obstacle Tower at different floor levels. [Left] Early floor is rendered in the Ancient theme. [Middle] Intermediate floor is rendered in the Moorish theme. [Right] Later floor is rendered in the Industrial theme.
  • Figure 3: Four examples of Obstacle Tower mission graph rules.
  • Figure 4: Mean episodic reward received during training by agents trained using OpenAI Baselines PPO (PPO) and Dopamine Rainbow (RNB) in the Fixed and Varied training conditions.
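The Fixed and Varied training conditions in Figure 4 can be read as two ways of choosing the tower seed for each training episode. The sketch below assumes that Fixed means every episode reuses a single seed, and that Varied means each episode samples uniformly from a pool of training seeds. `make_seed_sampler` is a hypothetical helper, and the pool size of 100 in the example is an arbitrary illustration, not the number used in the paper.

```python
import random


def make_seed_sampler(condition, train_seeds, rng_seed=0):
    """Return a zero-argument callable giving the tower seed for the next episode.

    fixed:  every episode reuses the first training seed (assumed reading).
    varied: each episode draws uniformly at random from all training seeds.
    """
    pool = list(train_seeds)
    if not pool:
        raise ValueError("train_seeds must be non-empty")
    if condition == "fixed":
        fixed_seed = pool[0]
        return lambda: fixed_seed
    if condition == "varied":
        rng = random.Random(rng_seed)  # private RNG so runs are reproducible
        return lambda: rng.choice(pool)
    raise ValueError(f"unknown condition: {condition!r}")


# Hypothetical training-loop usage: draw a seed, then reset the environment with it.
# The pool of 100 seeds is an arbitrary illustration.
next_seed = make_seed_sampler("varied", train_seeds=range(100))
episode_seeds = [next_seed() for _ in range(5)]
print(episode_seeds)
```

Under this reading, an agent trained in the Fixed condition may overfit to a single tower layout, so the Varied condition is the more informative comparison for evaluation on held-out towers.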