Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction

Anthony GX-Chen; Kenneth Marino; Rob Fergus

TL;DR

The paper proposes an object-centric Ab-MDP to structurally decompose RL problems into a semantic space of items and attributes, enabling efficient learning and long-horizon planning. It introduces MEAD, a discriminative, model-based approach that learns a forward model predicting successful item-attribute changes and uses MCTS for exploration and Dijkstra planning for goal-reaching. Empirical results in 2D crafting and MiniHack show strong sample efficiency, robust zero-shot and few-shot transfer across object types, and interpretable, extensible world graphs. The work also demonstrates learning of low-level policies and an object-centric map, and provides thorough ablations highlighting the advantages of discriminative modelling and count-based exploration. Overall, Ab-MDP with MEAD offers a scalable framework for semantically grounded exploration and planning in complex environments with object-centric structure, with potential for unsupervised discovery and broader hierarchical RL integration.
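To make the goal-reaching step concrete, here is a minimal sketch of Dijkstra planning over a learned world graph. It is not the authors' implementation: the graph layout, the function name `plan_to_goal`, and the choice of `-log p` as the edge cost (so that the shortest path is the one most likely to succeed) are all assumptions made for illustration.

```python
import heapq
import math


def plan_to_goal(graph, start, goal):
    """Return (behaviours, success_probability) for the most reliable path, or None.

    graph: dict mapping abstract state -> list of (behaviour, next_state, p_success).
    Edge cost is -log(p_success); this cost is an assumption of the sketch.
    """
    frontier = [(0.0, 0, start)]  # (cost so far, tie-breaker, state)
    best_cost = {start: 0.0}
    parent = {}  # state -> (previous state, behaviour taken)
    tie = 1
    while frontier:
        cost, _, state = heapq.heappop(frontier)
        if state == goal:
            break
        if cost > best_cost.get(state, math.inf):
            continue  # stale queue entry
        for behaviour, nxt, p in graph.get(state, []):
            if p <= 0.0:
                continue  # edge the model believes can never succeed
            new_cost = cost - math.log(p)
            if new_cost < best_cost.get(nxt, math.inf):
                best_cost[nxt] = new_cost
                parent[nxt] = (state, behaviour)
                heapq.heappush(frontier, (new_cost, tie, nxt))
                tie += 1
    if goal not in best_cost:
        return None
    # Walk back from the goal to recover the behaviour sequence.
    plan = []
    state = goal
    while state != start:
        state, behaviour = parent[state]
        plan.append(behaviour)
    plan.reverse()
    return plan, math.exp(-best_cost[goal])
```

With a toy graph in which a direct edge succeeds with probability 0.1 and a two-step detour succeeds with probabilities 0.9 and 0.8, this sketch returns the detour, since its joint success probability (0.72) is higher.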

Abstract

In the face of difficult exploration problems in reinforcement learning, we study whether giving an agent an object-centric mapping (describing a set of items and their attributes) allows for more efficient learning. We find that this problem is best solved hierarchically, by modelling items at a higher level of state abstraction than pixels, and attribute changes at a higher level of temporal abstraction than primitive actions. This abstraction simplifies the transition dynamics by making specific future states easier to predict. We use this to propose a fully model-based algorithm that learns a discriminative world model, plans to explore efficiently with only a count-based intrinsic reward, and can subsequently plan to reach any discovered (abstract) state. We demonstrate the model's ability to (i) efficiently solve single tasks, (ii) transfer zero-shot and few-shot across item types and environments, and (iii) plan across long horizons. Across a suite of 2D crafting and MiniHack environments, we empirically show that our model significantly outperforms state-of-the-art low-level methods (without abstraction), as well as performant model-free and model-based methods using the same abstraction. Finally, we show how to learn low-level object-perturbing policies via reinforcement learning, and the object mapping itself via supervised learning.
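As an illustration of what a count-based intrinsic reward can look like, the sketch below keeps visit counts over (abstract state, behaviour) pairs and pays a bonus that shrinks as a pair is tried more often. The class name `CountBonus`, the keying on state-behaviour pairs, and the `1 / sqrt(n + 1)` form are assumptions of this sketch; the abstract does not specify the exact bonus.

```python
import math
from collections import Counter


class CountBonus:
    """Hypothetical count-based exploration bonus over (state, behaviour) pairs."""

    def __init__(self):
        self.counts = Counter()

    def visit(self, abstract_state, behaviour):
        # Record one attempt of `behaviour` from `abstract_state`.
        self.counts[(abstract_state, behaviour)] += 1

    def reward(self, abstract_state, behaviour):
        # Unseen pairs get the maximum bonus of 1.0; the bonus decays with visits.
        n = self.counts[(abstract_state, behaviour)]
        return 1.0 / math.sqrt(n + 1)
```

Abstract states must be hashable here, for example a `frozenset` of (item, attribute) pairs.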

Paper Structure (77 sections, 13 equations, 36 figures, 10 tables, 1 algorithm)

Introduction
Problem Setting
Abstract states and behaviours
Abstracted Item-Attribute MDP
Methods
Forward model
Discriminative modelling
Forward Prediction
Model Learning
Planning for exploration
Planning for goal
Results
Learning from scratch in single environments
Transfer and Compositions
Sandbox transfer
...and 62 more sections

Figures (36)

Figure 1: An example state transition in an Ab-MDP (here defined in MiniHack). Abstract states are sets of (item identity, item attribute) pairs, and a behaviour such as $b_1$ has the structure (item, new attribute). An abstract state can correspond to multiple low-level states, and an abstract behaviour to multiple primitive actions. We provide legends for the item identities and attributes illustrated (left rectangles).
Figure 2: The forward model $f_\theta$ predicts the probability $p$ that the behaviour $b_t$ is successful from state $X_t$. The next state distribution is modelled as a binary distribution (Equation \ref{eq:next-X-from-Xb}).
Figure 3: Planning within the model's imagination, both to explore and to reach any goal state.
Figure 4: Results in games trained from scratch in Ab-MDP. Triangles and stars denote low-level only methods. For MiniHack, we also show the final performance of state-of-the-art exploration methods in the low level MDP (star/triangle), from henaff2022e3b. X-axis on log scale.
Figure 5: Transfer experiments and a difficult compositional environment. Dotted lines in \ref{fig:sandbox_pretrain_finetune} and \ref{fig:freeze_pretrain_finetune} are the same curves as in Figure \ref{fig:minihack_scratch_5_games}, with only the means shown for clarity.
...and 31 more figures
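
The captions for Figures 1 and 2 describe abstract states as sets of (item identity, item attribute) pairs, behaviours as (item, new attribute) pairs, and a binary next-state distribution. A minimal sketch of that structure follows. The function names are hypothetical, `p_success` stands in for the output of the forward model $f_\theta$, and two simplifications are assumptions of this sketch: each item appears at most once in a state, and a failed behaviour leaves the state unchanged.

```python
def apply_behaviour(state, behaviour):
    """Successful outcome: the targeted item takes its new attribute."""
    item, new_attr = behaviour
    return frozenset((i, new_attr if i == item else a) for i, a in state)


def next_state_distribution(state, behaviour, p_success):
    """Two-outcome distribution over next abstract states.

    Assumption: on failure the abstract state stays as it was.
    """
    changed = apply_behaviour(state, behaviour)
    if changed == state:
        return {state: 1.0}  # behaviour would not alter the state
    return {changed: p_success, state: 1.0 - p_success}
```

Because only one candidate next state needs to be scored per behaviour, the model's job reduces to estimating a single success probability rather than generating a full next state.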

Theorems & Definitions (3)

Definition B.1
Definition B.2
Remark B.3
