COBRA: Data-Efficient Model-Based RL through Unsupervised Object Discovery and Curiosity-Driven Exploration
Nicholas Watters, Loic Matthey, Matko Bosnjak, Christopher P. Burgess, Alexander Lerchner
TL;DR
COBRA tackles data efficiency and robustness in continuous control with a two-phase pipeline that combines unsupervised object-centric representation learning, curiosity-driven exploration, and model-based RL. In a reward-free exploration phase it learns object slots and their dynamics. It then freezes these components and, for each downstream task, uses a reward predictor for 1-step model-based planning. In Spriteworld the approach is highly data-efficient and robust to task-irrelevant perturbations, outperforming model-free baselines, and the cost of pretraining is amortized across tasks. This work suggests that structured, object-centric world models combined with intrinsic curiosity can enable scalable, robust transfer to diverse control tasks.
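The 1-step model-based planning step can be sketched as follows. The agent scores a batch of candidate actions by predicting the next object slots with the frozen transition model, rates each prediction with the reward predictor, and acts greedily. This is a minimal sketch under stated assumptions, not the authors' implementation: the slot and action types, the random sampling of candidate actions, and the `toy_transition` and `toy_reward` functions are hypothetical stand-ins for COBRA's learned networks.

```python
import random
from typing import Callable, List, Optional, Sequence

# Hypothetical types: a scene is a list of object slots (one feature vector per
# object), and an action is a continuous vector.
Slots = List[List[float]]
Action = List[float]


def one_step_search(
    slots: Slots,
    transition_model: Callable[[Slots, Action], Slots],
    reward_predictor: Callable[[Slots], float],
    candidate_actions: Sequence[Action],
) -> Optional[Action]:
    """Return the candidate whose predicted next state has the highest predicted reward."""
    best_action: Optional[Action] = None
    best_reward = float("-inf")
    for action in candidate_actions:
        next_slots = transition_model(slots, action)  # frozen after exploration
        reward = reward_predictor(next_slots)         # learned per downstream task
        if reward > best_reward:
            best_action, best_reward = action, reward
    return best_action


# Toy stand-ins for the learned networks (illustrative assumptions only).
def toy_transition(slots: Slots, action: Action) -> Slots:
    # Shift every object by the action's (dx, dy) displacement.
    return [[s[0] + action[0], s[1] + action[1]] for s in slots]


TARGET = (0.5, 0.5)


def toy_reward(slots: Slots) -> float:
    # Negative squared distance from the first object to a fixed target position.
    x, y = slots[0]
    return -((x - TARGET[0]) ** 2 + (y - TARGET[1]) ** 2)


rng = random.Random(0)
candidates = [[rng.uniform(-0.3, 0.3), rng.uniform(-0.3, 0.3)] for _ in range(256)]
scene = [[0.3, 0.6]]
chosen = one_step_search(scene, toy_transition, toy_reward, candidates)
print(chosen)
```

Because only the reward predictor changes between tasks, the same frozen transition model is reused for every new objective, which is where the pretraining cost is amortized.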
Abstract
Data efficiency and robustness to task-irrelevant perturbations are long-standing challenges for deep reinforcement learning algorithms. Here we introduce a modular approach to addressing these challenges in a continuous control environment, without using hand-crafted or supervised information. Our Curious Object-Based seaRch Agent (COBRA) uses task-free intrinsically motivated exploration and unsupervised learning to build object-based models of its environment and action space. Subsequently, it can learn a variety of tasks through model-based search in very few steps and excel on structured hold-out tests of policy robustness.
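One common way to turn curiosity into an exploration signal, assumed here rather than taken from the abstract, is to reward the agent for transitions that its learned transition model predicts poorly. An exploration policy maximizing this reward is drawn toward interactions the model has not yet learned, and the reward fades as the model improves. The sketch below shows only that signal; `curiosity_reward`, `ToyTransitionModel`, and `env_step` are hypothetical, and the single learned "gain" parameter is a toy simplification of an object-based dynamics model.

```python
from typing import List

Slots = List[List[float]]
Action = List[float]


def curiosity_reward(predicted: Slots, observed: Slots) -> float:
    """Intrinsic reward: mean squared prediction error over all slot features (assumed form)."""
    errors = [(p - o) ** 2
              for p_slot, o_slot in zip(predicted, observed)
              for p, o in zip(p_slot, o_slot)]
    return sum(errors) / len(errors)


class ToyTransitionModel:
    """Hypothetical model: next position = position + gain * action, with `gain` learned."""

    def __init__(self, gain: float = 0.0, lr: float = 5.0) -> None:
        self.gain = gain
        self.lr = lr

    def predict(self, slots: Slots, action: Action) -> Slots:
        return [[s[0] + self.gain * action[0], s[1] + self.gain * action[1]] for s in slots]

    def update(self, slots: Slots, action: Action, observed: Slots) -> None:
        # One gradient-descent step on the summed squared error with respect to `gain`.
        predicted = self.predict(slots, action)
        grad = sum(2.0 * (p - o) * a
                   for p_slot, o_slot in zip(predicted, observed)
                   for p, o, a in zip(p_slot, o_slot, action))
        self.gain -= self.lr * grad


def env_step(slots: Slots, action: Action) -> Slots:
    # Toy "true" dynamics: objects move by exactly the action displacement.
    return [[s[0] + action[0], s[1] + action[1]] for s in slots]


model = ToyTransitionModel()
scene, action = [[0.0, 0.0]], [0.2, 0.1]
curiosities = []
for _ in range(10):
    observed = env_step(scene, action)
    # Record the intrinsic reward before the model learns from this transition.
    curiosities.append(curiosity_reward(model.predict(scene, action), observed))
    model.update(scene, action, observed)
print(curiosities[0], curiosities[-1])
```

Repeating the same transition drives the intrinsic reward toward zero, so a curiosity-maximizing explorer would move on to interactions the model still predicts badly.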
