Planning from Pixels using Inverse Dynamics Models

Keiran Paster; Sheila A. McIlraith; Jimmy Ba

TL;DR

This work tackles the challenge of planning from high-dimensional pixel observations by learning task-conditioned latent world models that predict action sequences to achieve goals. It introduces GLAMOR, which simultaneously learns an inverse dynamics model and an action prior to factor planning in a latent space, enabling efficient, heuristic-guided search via random shooting. The approach demonstrates strong performance and sample efficiency on diverse visual goal tasks in Atari and the DeepMind Control Suite, outperforming prior model-free methods in many settings. The findings highlight the value of a latent, goal-conditioned planning framework for fast adaptation to new tasks with sparse rewards and suggest avenues for extending to general rewards and stochastic environments.

Abstract

Learning task-agnostic dynamics models in high-dimensional observation spaces can be challenging for model-based RL agents. We propose a novel way to learn latent world models by learning to predict sequences of future actions conditioned on task completion. These task-conditioned models adaptively focus modeling capacity on task-relevant dynamics, while simultaneously serving as an effective heuristic for planning with sparse rewards. We evaluate our method on challenging visual goal completion tasks and show a substantial increase in performance compared to prior model-free approaches.
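The planning idea summarized above (sample candidate action sequences, then rank them with a goal-conditioned inverse dynamics model and an action prior) can be illustrated with a small toy sketch. This is not the authors' implementation. The scoring rule used here, the log-probability under the inverse model minus the log-probability under the prior, is an assumption about how the two models are combined. The two-action set, the uniform prior, the hand-written inverse model, and all function names are hypothetical. The sketch also ignores pixels, latent states, and explicit conditioning on the current state and goal.

```python
import math
import random

ACTIONS = (0, 1)  # hypothetical discrete action set


def sample_prior(horizon, rng):
    """Sample an action sequence from a toy uniform action prior."""
    return tuple(rng.choice(ACTIONS) for _ in range(horizon))


def log_prior(plan):
    """Log-probability of a plan under the toy uniform prior."""
    return len(plan) * math.log(1.0 / len(ACTIONS))


def log_inverse(plan):
    """Toy stand-in for a goal-conditioned inverse dynamics model.

    Assumption: trajectories that reach the goal take action 1 with
    probability 0.9 at every step.
    """
    return sum(math.log(0.9 if a == 1 else 0.1) for a in plan)


def random_shooting_plan(horizon=3, num_candidates=500, seed=0):
    """Random shooting: sample plans from the prior, keep the best-scoring one."""
    rng = random.Random(seed)
    best_plan, best_score = None, -math.inf
    for _ in range(num_candidates):
        plan = sample_prior(horizon, rng)
        # Assumed heuristic: how much more likely the plan is given that
        # the goal is reached than it is under the prior alone.
        score = log_inverse(plan) - log_prior(plan)
        if score > best_score:
            best_plan, best_score = plan, score
    return best_plan, best_score


plan, score = random_shooting_plan()
print(plan, round(score, 3))
```

In this toy setting the highest-scoring plan is the one that repeats the action the inverse model favors. In a learned system both log-probabilities would come from neural networks conditioned on the encoded current observation and goal.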
