PUMA: Deep Metric Imitation Learning for Stable Motion Primitives

Rodrigo Pérez-Dattari; Cosimo Della Santina; Jens Kober

TL;DR

PUMA addresses the challenge of guaranteeing stable reaching motions learned via imitation learning across both Euclidean and non-Euclidean state spaces. It introduces a Triplet Stability Loss that leverages deep metric learning to enforce global asymptotic stability to a goal without constraining the DNN architecture, enabling flexible latent representations and geometry-aware optimization. The framework unifies latent-space stability with task-space dynamics through a pair of surrogate stability conditions derived from Lyapunov theory, and demonstrates competitive accuracy and strong stability across classic Euclidean datasets (e.g., LASA, LAIR) as well as non-Euclidean manifolds (e.g., $\mathcal{S}^2$), with validation on real robots in greenhouse and hammer manipulation tasks. These results highlight PUMA’s practical impact for robust, geometry-aware motion primitives that can be learned from demonstrations and deployed on real systems, while pointing to avenues for scalability, primitive composition, and obstacle avoidance.

Abstract

Imitation Learning (IL) is a powerful technique for intuitive robotic programming. However, ensuring the reliability of learned behaviors remains a challenge. In the context of reaching motions, a robot should consistently reach its goal, regardless of its initial conditions. To meet this requirement, IL methods often employ specialized function approximators that guarantee this property by construction. Although effective, these approaches come with a set of limitations: 1) they are unable to fully exploit the capabilities of modern Deep Neural Network (DNN) architectures, 2) some are restricted in the family of motions they can model, resulting in suboptimal IL capabilities, and 3) they require explicit extensions to account for the geometry of motions that consider orientations. To address these challenges, we introduce a novel stability loss function, drawing inspiration from the triplet loss used in the deep metric learning literature. This loss does not constrain the DNN's architecture and enables learning policies that yield accurate results. Furthermore, it is not restricted to a specific state space geometry; therefore, it can easily incorporate the geometry of the robot's state space. We provide a proof of the stability properties induced by this loss and empirically validate our method in various settings. These settings include Euclidean and non-Euclidean state spaces, as well as first-order and second-order motions, both in simulation and with real robots. More details about the experimental results can be found in: https://youtu.be/ZWKLGntCI6w.
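The idea behind a triplet-style stability loss can be illustrated with a minimal sketch. It assumes a hinge form in which a goal point acts as the anchor, the later state as the positive sample, and the earlier state as the negative sample, so the penalty vanishes only when the state moves closer to the goal. The function name `triplet_stability_loss`, the `margin` value, and the Euclidean distance are illustrative assumptions, not the authors' implementation; the distance function is a parameter so that a geometry-aware metric could be swapped in.

```python
import math


def euclidean(a, b):
    # Plain Euclidean distance; a different metric could be passed in
    # for non-Euclidean state spaces (assumption for illustration).
    return math.dist(a, b)


def triplet_stability_loss(y_goal, y_t, y_next, margin=1e-2, dist=euclidean):
    # Hypothetical hinge-style penalty: zero only when y_next is closer
    # to y_goal than y_t by at least `margin`.
    return max(0.0, margin + dist(y_goal, y_next) - dist(y_goal, y_t))


y_goal = (0.0, 0.0)
# Step that moves toward the goal: no penalty.
loss_ok = triplet_stability_loss(y_goal, (1.0, 0.0), (0.5, 0.0))
# Step that moves away from the goal: positive penalty.
loss_bad = triplet_stability_loss(y_goal, (0.5, 0.0), (1.0, 0.0))
print(loss_ok, loss_bad)
```

In this sketch, a transition is penalized in proportion to how much it fails to approach the goal, which mirrors the property described for Figure 5: each later state should end up closer to the goal than its predecessor.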

Paper Structure (56 sections, 18 theorems, 26 equations, 15 figures, 4 tables)

Introduction
Related Works
Stability in Euclidean State Spaces
Stability in Non-Euclidean State Spaces
Preliminaries
Dynamical Systems for Reaching Tasks
Problem Formulation
Stability Conditions
Deep Metric Learning: the Triplet Loss
Stability Analysis through Comparison Functions
Methodology
Behavioral Cloning
Triplet Stability Loss
Reformulating the Stability Conditions
Surrogate Stability Conditions
...and 41 more sections

Key Result

Theorem 1

Let $f_{\theta}^{\mathcal{T}}$, $f_{\theta}^{\mathcal{T} \to \mathcal{L}}$ and $f^{\mathcal{L}}$ be the introduced dynamical systems. Then, in the region $\mathcal{T}$, $x_{\mathrm{g}}$ is a globally asymptotically stable equilibrium of $f_{\theta}^{\mathcal{T}}$ if, $\forall x_{t} \in \mathcal{T}$,

Figures (15)

Figure 1: Motion learned using the proposed framework. The blue trajectory in the task space $\mathcal{T}$ shows the evolution of the robot's end-effector state $x_{t}$, represented on a spherical manifold. This trajectory is governed by the dynamical system $\dot{x}_{t}=\phi_{\theta}(\psi_{\theta}(x_{t}))$, depicted as a vector field of red arrows over the rest of the space. Through Deep Metric Learning, this system is stabilized by deriving a simpler representation in the latent space $\mathcal{L}$.
Figure 2: Example of trajectories generated by simulating the systems $f^{\mathcal{T}}_{\theta}$, $f^{\mathcal{T}\to\mathcal{L}}_{\theta}$ and $f^{\mathcal{L}}$ for different time instants. The stability conditions are not met in this case, as $f^{\mathcal{T}\to\mathcal{L}}_{\theta}$ differs from $f^{\mathcal{L}}$.
Figure 3: Illustration of the behavioral cloning loss computation. Starting from an initial condition $x_{0}$, the system $f^{\mathcal{T}}_{\theta}$ evolves to various time instants via $\Phi^{x}_{\theta}$. At each instant, the estimated state is compared with a demonstrated state. The red arrows show the gradient path used to update the DNN's weights using BPTT.
Figure 4: Time evolution of two instances of the function $\delta$, $\delta_{1}$ and $\delta_{2}$, starting from different initial conditions $y_{0}$ but with the same $d_{0}$. Both satisfy the surrogate stability conditions with ${\Delta t = 2}$. Additionally, the values of $\delta^{\text{max}}$ and $\beta$, computed using these functions, are shown. In this representation, ${\delta = e^{-a\cdot t}\left(\sin^{2}\left(\omega t\right) + d_{0} \cdot \cos^{2}\left(\omega t\right)\right)}$ with $a = 0.75$, $d_{0} = 0.2$, and $\omega \in \{\pi, \pi/2\}$.
Figure 5: Left: Illustration of the effect of optimizing $\ell_{\text{stable}}$. The red arrow depicts $y_{t}$ and $y_{t+\Delta t}$ being modified to fulfill the triplet inequality. Right: Example of trajectories generated with $f^{\mathcal{T}}_{\theta}$ and $f^{\mathcal{T}\to\mathcal{L}}_{\theta}$ post-training. Each $y_{t+\Delta t}$ is closer to $y_{\mathrm{g}}$ than its predecessor $y_{t}$.
...and 10 more figures
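The function in the Figure 4 caption can be checked numerically. The sketch below evaluates $\delta$ with the stated constants and tests two things on a time grid: whether $\delta$ decreases over every window of length $\Delta t = 2$, and whether it decreases monotonically. The grid, the variable names, and the reading of the surrogate condition as "$\delta(t+\Delta t) < \delta(t)$" are assumptions made for illustration.

```python
import math

A, D0, DT = 0.75, 0.2, 2.0  # a, d_0 and Delta t from the Figure 4 caption


def delta(t, omega):
    # delta = exp(-a t) * (sin^2(omega t) + d_0 cos^2(omega t))
    return math.exp(-A * t) * (
        math.sin(omega * t) ** 2 + D0 * math.cos(omega * t) ** 2
    )


times = [0.05 * k for k in range(81)]  # assumed grid on t in [0, 4]
results = {}
for name, omega in (("omega=pi", math.pi), ("omega=pi/2", math.pi / 2)):
    values = [delta(t, omega) for t in times]
    # Does delta shrink over every window of length DT?
    shrinks_over_dt = all(delta(t + DT, omega) < delta(t, omega) for t in times)
    # Is delta non-increasing from one grid point to the next?
    monotone = all(b <= a for a, b in zip(values, values[1:]))
    results[name] = (shrinks_over_dt, monotone)
print(results)
```

For both frequencies the squared sine and cosine terms repeat with a period that divides 2, so $\delta(t+2) = e^{-1.5}\,\delta(t)$. The function therefore shrinks over every window of length 2 even though it oscillates in between, which is the behavior the caption attributes to the surrogate conditions.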

Theorems & Definitions (36)

Theorem 1: Stability conditions: v1
Definition 1: class-$\mathcal{K}$ function
Definition 2: class-$\mathcal{L}$ function
Definition 3: class-$\mathcal{KL}$ function
Theorem 2: Global asymptotic stability with class-$\mathcal{KL}$ functions
Theorem 3: Stability conditions: v2
Theorem 4: Surrogate stability conditions
proof
Proposition 1: Existence of class-$\mathcal{KL}$ function
proof
...and 26 more
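For orientation, the comparison-function classes named in Definitions 1 to 3 are commonly stated as follows in the nonlinear control literature. These are standard textbook forms given under that assumption; the paper's own statements may differ in minor details such as domains or strictness, and the norm below stands in for whichever distance the paper uses.

```latex
\begin{align*}
\alpha \in \mathcal{K} &\iff \alpha : [0, a) \to [0, \infty) \text{ is continuous, strictly increasing, and } \alpha(0) = 0, \\
\sigma \in \mathcal{L} &\iff \sigma : [0, \infty) \to (0, \infty) \text{ is continuous, strictly decreasing, and } \lim_{s \to \infty} \sigma(s) = 0, \\
\beta \in \mathcal{KL} &\iff \beta(\cdot, s) \in \mathcal{K} \ \text{for each fixed } s \geq 0
  \ \text{and} \ \beta(r, \cdot) \in \mathcal{L} \ \text{for each fixed } r > 0.
\end{align*}
% Typical use: an equilibrium x_g is globally asymptotically stable if
% some class-KL function bounds the distance to x_g along every trajectory.
\[
\lVert x_{t} - x_{\mathrm{g}} \rVert \leq \beta\left(\lVert x_{0} - x_{\mathrm{g}} \rVert, t\right)
\quad \forall t \geq 0, \ \forall x_{0}.
\]
```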
