Geometry-aware Policy Imitation

Yiming Li, Nael Darwiche, Amirreza Razmjoo, Sichao Liu, Yilun Du, Auke Ijspeert, Sylvain Calinon

TL;DR

Geometry-Aware Policy Imitation (GPI) reframes imitation learning by treating demonstrations as geometric curves that induce distance fields in the actuated robot subspace. It derives two complementary flows—progression along the demonstrations and attraction toward them—whose superposition yields a controllable, non-parametric vector field for policy synthesis. By decoupling metric learning from policy synthesis, GPI supports modular latent representations and multimodal demonstrations, enabling efficient composition and fast inference even with high-dimensional observations. Across simulation and real-robot experiments, GPI achieves higher performance with substantially lower memory and computation than diffusion-based policies, while maintaining interpretability and robustness to perturbations. These properties position GPI as an efficient, scalable alternative to generative approaches for robotic imitation learning.

Abstract

We propose a Geometry-aware Policy Imitation (GPI) approach that rethinks imitation learning by treating demonstrations as geometric curves rather than collections of state-action samples. From these curves, GPI derives distance fields that give rise to two complementary control primitives: a progression flow that advances along expert trajectories and an attraction flow that corrects deviations. Their combination defines a controllable, non-parametric vector field that directly guides robot behavior. This formulation decouples metric learning from policy synthesis, enabling modular adaptation across low-dimensional robot states and high-dimensional perceptual inputs. GPI naturally supports multimodality by preserving distinct demonstrations as separate models and allows efficient composition of new demonstrations through simple additions to the distance field. We evaluate GPI in simulation and on real robots across diverse tasks. Experiments show that GPI achieves higher success rates than diffusion-based policies while running 20 times faster, requiring less memory, and remaining robust to perturbations. These results establish GPI as an efficient, interpretable, and scalable alternative to generative approaches for robotic imitation learning. Project website: https://yimingli1998.github.io/projects/GPI/
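The two flows described above can be illustrated in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. Demonstrations are assumed to be time-ordered arrays of states sampled at a fixed step `demo_dt`. The distance is plain Euclidean distance to the nearest demonstration sample, and the progression term is the finite-difference demonstration velocity at that sample. The action follows the form $\bm{u}=\lambda_{1}\dot{\bm{x}}-\lambda_{2}\nabla_{\bm{x}} d$ given in the Figure 3 caption, and the robot is assumed to be a single integrator ($\dot{\bm{x}}=\bm{u}$). The helper names (`nearest_on_demos`, `gpi_action`, `rollout`) are hypothetical. Taking the nearest sample over all demonstrations corresponds to a union (minimum) of the per-demonstration distance fields, so the state follows whichever demonstration is closest.

```python
import numpy as np


def nearest_on_demos(x, demos, demo_dt=0.01):
    """Distance, nearest sample, and demonstrated velocity at that sample.

    Minimising over every demonstration is the union (min) of their
    distance fields, so the closest demonstration wins.
    """
    best_d, best_p, best_v = np.inf, None, None
    for demo in demos:  # demo: (T, n) array of states ordered in time
        dists = np.linalg.norm(demo - x, axis=1)
        i = int(np.argmin(dists))
        if dists[i] < best_d:
            if i + 1 < len(demo):
                # Forward difference approximates the demonstrated velocity x_dot.
                v = (demo[i + 1] - demo[i]) / demo_dt
            else:
                # Last sample: no further progression (goal reached).
                v = np.zeros_like(x)
            best_d, best_p, best_v = dists[i], demo[i], v
    return best_d, best_p, best_v


def gpi_action(x, demos, lam1=1.0, lam2=1.0, demo_dt=0.01):
    """u = lam1 * x_dot (progression) - lam2 * grad d (attraction)."""
    d, p, v = nearest_on_demos(x, demos, demo_dt)
    # The gradient of the Euclidean distance to the nearest sample is the unit
    # vector pointing away from it; it is undefined on the curve itself.
    grad_d = (x - p) / d if d > 1e-9 else np.zeros_like(x)
    return lam1 * v - lam2 * grad_d


def rollout(x0, demos, dt=0.01, steps=200, **kwargs):
    """Explicit Euler integration of x_{k+1} = x_k + dt * u_k (assumes x_dot = u)."""
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(steps):
        x = x + dt * gpi_action(x, demos, **kwargs)
        path.append(x.copy())
    return np.array(path)


# Two parallel straight-line demonstrations in a 2-D state space.
t = np.linspace(0.0, 1.0, 101)
demo_low = np.stack([t, np.zeros_like(t)], axis=1)
demo_high = np.stack([t, np.ones_like(t)], axis=1)

path = rollout([0.0, 0.3], [demo_low, demo_high])
print(path[-1])  # ends near the end of demo_low, approximately (1, 0)
```

With two parallel demonstrations, a start state at $(0, 0.3)$ is pulled onto the lower one and carried to its endpoint, while a start at $(0, 0.8)$ would follow the upper one. Because $\nabla_{\bm{x}} d$ has unit norm, the attraction term has constant magnitude and can chatter around the curve in discrete time with an amplitude on the order of $\lambda_2$ times `dt`; scaling or saturating it near the demonstration is one possible remedy, not something the text above specifies.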

Paper Structure

This paper contains 53 sections, 32 equations, 13 figures, 4 tables, 1 algorithm.

Figures (13)

  • Figure 1: Overview of Geometry-Aware Policy Imitation (GPI). GPI treats demonstrations as geometric curves that induce distance fields in the full state space. (Top) The state space is projected onto the robot’s actuated subspace, where control is applied. The projected distance field gives rise to two complementary flows: an attraction flow from the negative gradient (red arrow) and a progression flow from trajectory tangents (yellow arrow). Together, they define a dynamical system that reduces the distance to demonstrations and advances along them, thus imitating expert behavior. The resulting action $\bm{u}$ is executed through the system’s dynamics, yielding state evolution $\int f(\bm{x},\bm{u})\,dt$ in the full state space. Multiple demonstrations can be composed naturally via Boolean operations on distance fields. Despite unknown system dynamics, the resulting trajectory aligns closely with the most similar demonstration as determined by the distance metric. (Bottom) On the PushT benchmark, GPI achieves multimodal imitation with higher reward, runs $20$--$100\times$ faster than diffusion policies (DDIM with 10 steps), and requires substantially less memory.
  • Figure 2: Typical ways to obtain a latent embedding $\bm{z}$ from raw inputs $\bm{x}$: (i) train a task-specific lightweight model to capture task-relevant features; (ii) use a VAE to learn task-agnostic features; or (iii) apply a pretrained model to obtain features without additional training.
  • Figure 3: From demonstrations to policy flows. (a) Demonstrations. (b) Energy from composed distances. (c) Progression-only flow $\bm{u}=\dot{\bm{x}}$ may drift off the demonstrations. (d) Adding attraction $\bm{u}=\lambda_{1}\dot{\bm{x}}-\lambda_{2}\nabla_{\bm{x}} d$ pulls states toward the demonstrations and along them, ensuring convergence.
  • Figure 4: Robustness to action horizons.
  • Figure 5: Robustness of GPI with respect to demonstrations, $K$ (neighbors), and state representations.
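Figure 1 states that multiple demonstrations compose via Boolean operations on distance fields, and Figure 3(b) shows an energy built from composed distances. The sketch below illustrates the standard union operation (pointwise minimum) on unsigned distance fields sampled on a grid, together with a smooth log-sum-exp union. The smooth variant, its `temperature` parameter, and all function names (`curve_distance`, `union`, `smooth_union`) are hypothetical choices for illustration; the text above does not specify which composition operator GPI uses.

```python
import numpy as np


def curve_distance(points, demo):
    """Unsigned distance from query points (N, n) to the nearest sample of a demo (T, n)."""
    diff = points[:, None, :] - demo[None, :, :]
    return np.linalg.norm(diff, axis=-1).min(axis=1)


def union(*fields):
    """Boolean union of distance fields: pointwise minimum."""
    return np.min(np.stack(fields), axis=0)


def smooth_union(*fields, temperature=0.05):
    """Differentiable soft-min, -T * log(sum_i exp(-d_i / T)), computed stably.

    Lies at most T * log(k) below union() for k fields and tends to it as T -> 0.
    """
    f = np.stack(fields)
    m = f.min(axis=0)
    return m - temperature * np.log(np.exp(-(f - m) / temperature).sum(axis=0))


# Query grid over [-0.5, 1.5]^2 and two demonstrations.
xs = np.linspace(-0.5, 1.5, 21)
grid = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)
s = np.linspace(0.0, 1.0, 101)
demo_a = np.stack([s, np.zeros_like(s)], axis=1)  # horizontal line
demo_b = np.stack([s, s], axis=1)                 # diagonal line

d_a = curve_distance(grid, demo_a)
d_b = curve_distance(grid, demo_b)
d_ab = union(d_a, d_b)           # field induced by both demonstrations
d_soft = smooth_union(d_a, d_b)  # smooth approximation of the same union
```

Under this sketch, adding a newly recorded demonstration only requires computing its field and passing it as one more argument to `union`, leaving the existing fields untouched. The hard minimum keeps each demonstration's field exact away from the boundary where two fields meet, whereas the soft-min trades a bounded offset for a gradient that is smooth everywhere.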