Latent Diffusion Planning for Imitation Learning

Amber Xie, Oleh Rybkin, Dorsa Sadigh, Chelsea Finn

TL;DR

Latent Diffusion Planning tackles data efficiency in imitation learning for visuomotor robotics by decoupling planning and action prediction and operating in a learned latent space. It uses a $\beta$-VAE to create latent embeddings and trains two diffusion models—a latent-space planner and an inverse dynamics model—to forecast latent states and extract actions, respectively. This modular design enables leveraging action-free and suboptimal data, achieving strong performance in simulated tasks and a real-robot Lift task, often outperforming state-of-the-art methods that cannot utilize such data. The approach offers scalable, closed-loop planning with dense latent forecasts, enabling robust policies in settings with limited expert demonstrations and abundant heterogeneous data.
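For reference, the standard $\beta$-VAE objective is sketched below. The notation is an assumption for illustration and does not come from the paper: $q_\phi(z \mid x)$ is the encoder, $p_\theta(x \mid z)$ the decoder, and $p(z) = \mathcal{N}(0, I)$ the prior. The exact loss used by LDP may differ.

$$
\mathcal{L}_{\beta\text{-VAE}}(\theta, \phi; x) = -\,\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] + \beta \, D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big)
$$

With $\beta = 1$ this reduces to the usual negative evidence lower bound. Other values of $\beta$ trade reconstruction quality against how strongly the latent distribution is pulled toward the prior.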

Abstract

Recent progress in imitation learning has been enabled by policy architectures that scale to complex visuomotor tasks, multimodal distributions, and large datasets. However, these methods often rely on learning from large amounts of expert demonstrations. To address this shortcoming, we propose Latent Diffusion Planning (LDP), a modular approach consisting of a planner which can leverage action-free demonstrations, and an inverse dynamics model which can leverage suboptimal data, that both operate over a learned latent space. First, we learn a compact latent space through a variational autoencoder, enabling effective forecasting of future states in image-based domains. Then, we train a planner and an inverse dynamics model with diffusion objectives. By separating planning from action prediction, LDP can benefit from the denser supervision signals of suboptimal and action-free data. On simulated visual robotic manipulation tasks, LDP outperforms state-of-the-art imitation learning approaches, as they cannot leverage such additional data.

Paper Structure

This paper contains 23 sections, 3 equations, 6 figures, 10 tables, 1 algorithm.

Figures (6)

  • Figure 1: Latent Diffusion Planning. Left: LDP separates the control problem into forecasting future states with a diffusion-based planner, and extracting actions with a diffusion-based inverse dynamics model (IDM). This design enables training on heterogeneous sources of data, including suboptimal data and action-free data. Right: Unlike action imitation methods such as diffusion policy, LDP is based on forecasting a dense temporal sequence of latent states as well as actions. Using powerful diffusion models for both of these objectives enables LDP to achieve performance competitive with state-of-the-art imitation learning. Further, unlike prior work on forecasting subgoals, LDP predicts a dense temporal sequence of latent states, which enables scalable closed-loop planning.
  • Figure 2: After training the encoder, Latent Diffusion Planning trains two diffusion models. Top: We train an inverse dynamics model (IDM) with a diffusion objective to directly extract the actions that will be used for control from pairs of latent states. Bottom: We train a powerful latent diffusion model to forecast a chunk of future latent states. The planner and the IDM are used together to produce an action chunk, similar to Diffusion Policy (Chi et al., 2023).
  • Figure 3: Visualizations of Generated Plans. LDP produces dense, closed-loop plans. Here, we visualize decoded latents selected from an LDP trajectory for the Lift, Can, Square, ALOHA Cube, and Franka Lift tasks.
  • Figure : Inference with Latent Diffusion Planning
  • Figure :
  • ...and 1 more figure
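
The data flow described in the Figure 2 caption (encode the observation, forecast a chunk of future latents, then extract one action per pair of consecutive latents) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. All names, dimensions, and weights (`encode`, `plan_latents`, `idm_actions`, `toy_denoise`, `LATENT_DIM`, and so on) are hypothetical, and the "denoisers" are toy refinement loops with fixed random weights rather than trained diffusion models.

```python
import numpy as np

LATENT_DIM = 16     # assumed latent size
ACTION_DIM = 7      # assumed action size
HORIZON = 8         # assumed number of forecast latent states per plan
DENOISE_STEPS = 10  # assumed number of refinement steps

rng = np.random.default_rng(0)
# Fixed random weights stand in for trained networks (hypothetical).
W_ENC = rng.normal(size=(32, LATENT_DIM)) * 0.1
W_PLAN = rng.normal(size=(LATENT_DIM, HORIZON * LATENT_DIM)) * 0.1
W_IDM = rng.normal(size=(2 * LATENT_DIM, ACTION_DIM)) * 0.1


def encode(obs):
    """Stand-in for the VAE encoder: observation -> latent vector."""
    return obs @ W_ENC


def toy_denoise(cond_target, shape, steps=DENOISE_STEPS):
    """Toy refinement from Gaussian noise toward a conditional target.

    A real diffusion sampler would query a learned denoising network under
    a noise schedule; this only mimics the start-from-noise iteration.
    """
    x = rng.normal(size=shape)
    for _ in range(steps):
        x = x + 0.5 * (cond_target - x)
    return x


def plan_latents(z0):
    """Planner stand-in: forecast HORIZON future latents conditioned on z0."""
    target = (z0 @ W_PLAN).reshape(HORIZON, LATENT_DIM)
    return toy_denoise(target, (HORIZON, LATENT_DIM))


def idm_actions(latents):
    """IDM stand-in: one action per consecutive latent pair (z_t, z_{t+1})."""
    pairs = np.concatenate([latents[:-1], latents[1:]], axis=-1)
    target = pairs @ W_IDM
    return toy_denoise(target, target.shape)


def act(obs):
    """One closed-loop step: encode, plan, then extract an action chunk."""
    z0 = encode(obs)
    future = plan_latents(z0)
    trajectory = np.concatenate([z0[None], future], axis=0)  # (HORIZON + 1, D)
    return idm_actions(trajectory)                           # (HORIZON, A)


obs = rng.normal(size=32)  # placeholder observation features
actions = act(obs)
print(actions.shape)  # (8, 7)
```

In a closed-loop rollout, `act` would be called again on the next observation after executing some or all of the returned action chunk, so the plan is refreshed as the scene changes.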