Planning from Pixels using Inverse Dynamics Models
Keiran Paster, Sheila A. McIlraith, Jimmy Ba
TL;DR
This work tackles planning from high-dimensional pixel observations by learning task-conditioned latent world models that predict the action sequences needed to reach a goal. It introduces GLAMOR, which jointly learns an inverse dynamics model and an action prior; together they factor the planning problem in a latent space and supply a heuristic that guides search via random shooting. The approach shows strong performance and sample efficiency on diverse visual goal completion tasks in Atari and the DeepMind Control Suite, outperforming prior model-free methods in many settings. The findings highlight the value of a latent, goal-conditioned planning framework for fast adaptation to new tasks with sparse rewards, and suggest avenues for extending the method to general rewards and stochastic environments.
Abstract
Learning task-agnostic dynamics models in high-dimensional observation spaces can be challenging for model-based RL agents. We propose a novel way to learn latent world models by learning to predict sequences of future actions conditioned on task completion. These task-conditioned models adaptively focus modeling capacity on task-relevant dynamics, while simultaneously serving as an effective heuristic for planning with sparse rewards. We evaluate our method on challenging visual goal completion tasks and show a substantial increase in performance compared to prior model-free approaches.
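To make the planning idea concrete, here is a minimal sketch of random shooting guided by an inverse dynamics model and an action prior. The scoring rule is an assumption that this summary does not spell out: by Bayes' rule, p(goal | s, a) is proportional to p(a | s, goal) / p(a | s) for a fixed state and goal, so candidate action sequences are ranked by that log-ratio. The function names and the two toy probability tables are hypothetical stand-ins for learned networks; they ignore the state and goal entirely and are not the authors' implementation.

```python
import numpy as np


def plan_random_shooting(log_p_inverse, log_p_prior, num_actions, horizon,
                         num_candidates=512, seed=0):
    """Return the sampled action sequence with the highest log-ratio score.

    log_p_inverse(actions) stands for log p(actions | state, goal) and
    log_p_prior(actions) for log p(actions | state); both are assumed to be
    already bound to the current state and goal.
    """
    rng = np.random.default_rng(seed)
    # Candidate plans: uniformly random action sequences, shape (N, horizon).
    candidates = rng.integers(0, num_actions, size=(num_candidates, horizon))
    # Score each plan by log p(a | s, g) - log p(a | s).
    scores = np.array([log_p_inverse(c) - log_p_prior(c) for c in candidates])
    best = int(np.argmax(scores))
    return candidates[best], float(scores[best])


# Hypothetical toy models with per-step action probabilities.
# Action 0 is likely under the inverse model mainly because it is common
# under the prior; dividing by the prior exposes action 2 as the one that
# is actually informative about reaching the goal.
INVERSE_PROBS = np.array([0.5, 0.1, 0.4])  # toy p(a_t | s, g)
PRIOR_PROBS = np.array([0.8, 0.1, 0.1])    # toy p(a_t | s)


def toy_log_p_inverse(actions):
    return float(np.log(INVERSE_PROBS[actions]).sum())


def toy_log_p_prior(actions):
    return float(np.log(PRIOR_PROBS[actions]).sum())


plan, score = plan_random_shooting(
    toy_log_p_inverse, toy_log_p_prior, num_actions=3, horizon=3
)
print(plan, score)
```

In this toy setup the inverse model alone would favor action 0, while the ratio favors action 2. That illustrates why an action prior is learned alongside the inverse dynamics model: it corrects for actions that are frequent regardless of the goal.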
