Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction
Anthony GX-Chen, Kenneth Marino, Rob Fergus
TL;DR
The paper proposes an object-centric Ab-MDP to structurally decompose RL problems into a semantic space of items and attributes, enabling efficient learning and long-horizon planning. It introduces MEAD, a discriminative, model-based approach that learns a forward model predicting successful item-attribute changes and uses MCTS for exploration and Dijkstra planning for goal-reaching. Empirical results in 2D crafting and MiniHack show strong sample efficiency, robust zero-shot and few-shot transfer across object types, and interpretable, extensible world graphs. The work also demonstrates learning of low-level policies and an object-centric map, and provides thorough ablations highlighting the advantages of discriminative modelling and count-based exploration. Overall, Ab-MDP with MEAD offers a scalable framework for semantically grounded exploration and planning in complex environments with object-centric structure, with potential for unsupervised discovery and broader hierarchical RL integration.
Abstract
In the face of difficult exploration problems in reinforcement learning, we study whether giving an agent an object-centric mapping (describing a set of items and their attributes) allows for more efficient learning. We find this problem is best solved hierarchically, by modelling items at a higher level of state abstraction than pixels, and attribute changes at a higher level of temporal abstraction than primitive actions. This abstraction simplifies the transition dynamics by making specific future states easier to predict. We make use of this to propose a fully model-based algorithm that learns a discriminative world model, plans to explore efficiently with only a count-based intrinsic reward, and can subsequently plan to reach any discovered (abstract) state. We demonstrate the model's ability to (i) efficiently solve single tasks, (ii) transfer zero-shot and few-shot across item types and environments, and (iii) plan across long horizons. Across a suite of 2D crafting and MiniHack environments, we empirically show our model significantly outperforms state-of-the-art low-level methods (without abstraction), as well as performant model-free and model-based methods using the same abstraction. Finally, we show how to learn low-level object-perturbing policies via reinforcement learning, and the object mapping itself via supervised learning.
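To make the two planning ingredients named above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes abstract states are plain hashable labels, that the count-based bonus takes the common form 1/sqrt(N(s) + 1), and that edges of the discovered abstract graph are weighted by the negative log of a predicted success probability so that Dijkstra's algorithm returns the most reliable route. The names `count_bonus` and `plan_to_goal`, and the toy crafting graph in the usage example, are hypothetical.

```python
import heapq
import itertools
import math
from collections import defaultdict


def count_bonus(counts, state):
    """Count-based intrinsic reward (assumed form): 1 / sqrt(N(s) + 1)."""
    return 1.0 / math.sqrt(counts[state] + 1)


def plan_to_goal(graph, start, goal):
    """Dijkstra over a graph of discovered abstract states.

    graph maps a state to a list of (next_state, success_prob) pairs.
    Edge cost is -log(success_prob) (an assumption of this sketch), so the
    cheapest path is the one with the highest product of probabilities.
    Returns the list of states from start to goal, or None if unreachable.
    """
    tie = itertools.count()  # tie-breaker so the heap never compares states
    frontier = [(0.0, next(tie), start, [start])]
    best = {start: 0.0}
    while frontier:
        cost, _, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        if cost > best.get(state, math.inf):
            continue  # stale heap entry
        for nxt, prob in graph.get(state, []):
            if prob <= 0.0:
                continue  # transition predicted to never succeed
            new_cost = cost - math.log(prob)
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, next(tie), nxt, path + [nxt]))
    return None


# Usage on a hypothetical crafting graph.
counts = defaultdict(int)
counts["has_wood"] = 3
toy_graph = {
    "no_items": [("has_wood", 0.9)],
    "has_wood": [("has_plank", 0.8), ("has_table", 0.1)],
    "has_plank": [("has_table", 0.9)],
}
route = plan_to_goal(toy_graph, "no_items", "has_table")
```

In this toy graph the planner prefers the two-step route through `has_plank` over the direct but unreliable edge from `has_wood` to `has_table`, and a never-visited state receives a larger bonus than one visited three times.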
