
Imitation Learning with Limited Actions via Diffusion Planners and Deep Koopman Controllers

Jianxin Bi, Kelvin Lim, Kaiqi Chen, Yifei Huang, Harold Soh

TL;DR

Problem: data-efficient imitation learning from state observations under limited action labels. Approach: KOAP combines a diffusion planner for future-state planning with a Deep Koopman Operator that lifts the dynamics via observables $g_\theta(x)$ into a latent space whose linear dynamics are governed by a learned matrix $\mathcal{K}_\theta$; a latent-action predictor $f_\theta$ infers latent actions, which a linear decoder $d_\phi$ maps to real actions. Key contributions: regularized latent-action learning via linear forward dynamics, effective action prediction with minimal supervision from action-labeled data $\mathcal{D}_a$, and strong performance on the D3IL benchmark plus a real-robot scooping task. Significance: enables scalable imitation from observation by reducing the need for action labeling while supporting continuous-action policies in robotics.
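The idea of regularizing latent actions with linear latent dynamics can be illustrated with a toy sketch. Unlike the paper, it assumes a fixed hand-written observable `g` in place of the learned network $g_\theta$, a given matrix `K` in place of the learned $\mathcal{K}_\theta$, and an additive latent action, $z_{t+1} = \mathcal{K} z_t + u_t$; the paper's exact parameterization may differ, and all names here are hypothetical.

```python
# Toy sketch (assumptions, not the paper's code): a fixed observable g instead
# of a learned g_theta, a given Koopman matrix K instead of a learned one, and
# an additive latent action:  z_{t+1} = K z_t + u_t,  with z_t = g(x_t).

def g(x):
    """Hypothetical observable lifting a scalar state x into R^2."""
    return [x, 1.0]

# Hypothetical Koopman matrix for the latent linear dynamics.
K = [[0.9, 0.0],
     [0.0, 1.0]]

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def latent_action(x_t, x_next):
    """Residual of the linear latent model: u_t = g(x_{t+1}) - K g(x_t).

    Only two consecutive states are needed, so this works on
    observation-only trajectories (no action labels)."""
    pred = matvec(K, g(x_t))
    return [zn - zp for zn, zp in zip(g(x_next), pred)]

def forward_loss(x_t, x_next, u_hat):
    """Squared error of the linear forward model when a predictor
    (standing in for f_theta) proposes the latent action u_hat."""
    pred = [p + u for p, u in zip(matvec(K, g(x_t)), u_hat)]
    return sum((zn - p) ** 2 for zn, p in zip(g(x_next), pred))

# Observation-only trajectory of x_{t+1} = 0.9 x_t + a_t with hidden a_t.
hidden_actions = [0.5, -0.2, 0.1]
xs = [1.0]
for a in hidden_actions:
    xs.append(0.9 * xs[-1] + a)

us = [latent_action(xs[t], xs[t + 1]) for t in range(len(hidden_actions))]
```

In this toy system the first latent coordinate recovers the hidden action (up to floating-point error) and the second is zero, so a linear decoder is enough to map latent actions back to real ones; a predictor whose output disagrees with the residual is penalized by `forward_loss`.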

Abstract

Recent advances in diffusion-based robot policies have demonstrated significant potential in imitating multi-modal behaviors. However, these approaches typically require large quantities of demonstration data paired with corresponding robot action labels, creating a substantial data collection burden. In this work, we propose a plan-then-control framework aimed at improving the action-data efficiency of inverse dynamics controllers by leveraging observational demonstration data. Specifically, we adopt a Deep Koopman Operator framework to model the dynamical system and utilize observation-only trajectories to learn a latent action representation. This latent representation can then be effectively mapped to real high-dimensional continuous actions using a linear action decoder, requiring minimal action-labeled data. Through experiments on simulated robot manipulation tasks and a real robot experiment with multi-modal expert demonstrations, we demonstrate that our approach significantly enhances action-data efficiency and achieves high task success rates with limited action data.
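Because the action decoder is linear, fitting it reduces to an ordinary least-squares problem, which helps explain why only a small amount of action-labeled data can suffice. The sketch below assumes latent actions have already been inferred (hypothetically, by a trained controller) and fits $a \approx W\,[u; 1]$ with a tiny ridge term; `fit_linear_decoder`, `decode`, and the ground-truth map are hypothetical, and pure Python is used only to stay dependency-free.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_linear_decoder(latents, actions, ridge=1e-8):
    """Fit one weight row per action dimension: a_k ~ w_k . [u, 1]."""
    X = [list(u) + [1.0] for u in latents]  # append a bias feature
    d = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    W = []
    for k in range(len(actions[0])):
        Xty = [sum(x[i] * a[k] for x, a in zip(X, actions)) for i in range(d)]
        W.append(solve(XtX, Xty))  # normal equations with ridge
    return W

def decode(W, u):
    """Map a latent action u to a real action with the fitted decoder."""
    x = list(u) + [1.0]
    return [sum(wi * xi for wi, xi in zip(w, x)) for w in W]

# Five labeled pairs from a hypothetical ground-truth map a = A u + c.
A_true, c_true = [[2.0, 0.0], [1.0, -1.0]], [0.5, 0.0]
latents = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
actions = [[sum(w * ui for w, ui in zip(row, lat)) + c
            for row, c in zip(A_true, c_true)] for lat in latents]
W = fit_linear_decoder(latents, actions)
print(decode(W, [0.3, 0.7]))  # approximately [1.1, -0.4]
```

Five labeled pairs determine the three weights per action dimension here; a nonlinear decoder would generally need more labels to fit reliably.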

Paper Structure

This paper contains 13 sections, 7 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: We adopt a plan-then-control scheme: a diffusion-based planner generates future states from current and past states, and an inverse dynamics controller generates an action sequence that follows the target trajectory. We propose KOAP, a method that leverages action-free trajectories to improve controller learning with limited action data. KOAP uses Deep Koopman Operators to lift the nonlinear target system into a linear latent space, which regularizes latent action learning. Real actions are decoded by a simple (linear) action decoder learned from the action-labeled data.
  • Figure 2: Original D3IL Tasks. The robot learns from a dataset containing partially labeled multi-modal expert demonstrations. The goal is to manipulate object(s) to reach target positions or poses, adhering to task-specific rules (e.g., collision-free or color-based sorting).
  • Figure 3: Average success rate of each method versus the relative amount of action data used.
  • Figure 4: KOAP's performance increases with observation data.
  • Figure 5: Real robot experiment setup: The robot starts by holding a spoon in a random initial position near the bowl. A camera provides a third-person view of the scene. Using pixel observations, the robot attempts to scoop the chocolate into the target container. We use two evaluation metrics: rim, which indicates the robot was able to push the chocolate to the rim of the bowl, and success, which indicates the chocolate was successfully scooped into the target container.
  • ...and 1 more figure
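The plan-then-control scheme in Figure 1 can be sketched as a loop in which a planner proposes the next state and an inverse dynamics controller turns the (current, planned) state pair into an action. This is a toy illustration under strong assumptions: the planner is a deterministic stand-in that moves halfway toward a goal, not a diffusion model, and `K`, `W`, `g`, `plan_next_state`, `controller`, and `env_step` are all hypothetical.

```python
# Toy plan-then-control loop (all names hypothetical, not the paper's code).

K = [[0.9, 0.0], [0.0, 1.0]]  # assumed latent linear dynamics matrix
W = [1.0, 0.0]                # assumed linear decoder weights (d_phi)

def g(x):
    """Hypothetical observable lifting the scalar state."""
    return [x, 1.0]

def plan_next_state(x, goal):
    """Stand-in for the diffusion planner: propose the next target state."""
    return x + 0.5 * (goal - x)

def controller(x, x_target):
    """Inverse dynamics: latent action u = g(x_target) - K g(x), then decode."""
    z, z_tgt = g(x), g(x_target)
    u = [zt - sum(k * zi for k, zi in zip(row, z)) for zt, row in zip(z_tgt, K)]
    return sum(w * ui for w, ui in zip(W, u))

def env_step(x, a):
    """True dynamics of the toy system (unknown to the learner)."""
    return 0.9 * x + a

x, goal = 0.0, 1.0
for _ in range(20):
    x = env_step(x, controller(x, plan_next_state(x, goal)))
```

Because the controller reaches each planned state in one step, the distance to the goal halves every iteration; in the real system the quality of the plan and of the learned controller jointly determine success.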