Globally Stable Neural Imitation Policies

Amin Abyaneh; Mariana Sosa Guzmán; Hsiu-Chin Lin

TL;DR

This work tackles safety and reliability gaps in imitation learning by enforcing global stability through a Lyapunov-based framework. It introduces Stable Neural Dynamical Systems (SNDS), where a neural policy and a convex Lyapunov candidate are trained jointly, with a projection that guarantees global asymptotic stability and convergence to a target state. A differentiable SRVF-informed loss guides trajectory-consistent imitation, enabling efficient gradient-based optimization even when rollouts are simulated via forward Euler steps. Empirical results in both 2D handwriting tasks and high-dimensional SE(3) control demonstrate robust stability, competitive imitation accuracy, and successful sim-to-real transfer on a robotic arm, highlighting SNDS as a scalable, stable alternative for complex planning problems. The approach offers practical impact by delivering stable, predictable policies with reduced data requirements and computational overhead relative to prior safe imitation learning methods, while outlining avenues for extending stability guarantees to constrained and obstacle-rich environments.
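The projection and forward Euler rollout mentioned above can be illustrated with a minimal sketch; it is not the paper's implementation. It assumes a quadratic stand-in $V(\mathbf{x}) = \|\mathbf{x} - \mathbf{x}^*\|^2$ for the convex Lyapunov candidate, a hand-written unstable linear map in place of the neural policy, and a ReLU-style projection that removes the part of the nominal velocity violating $\dot{V} \le -\alpha V$. All names (`nominal_policy`, `projected_policy`, `alpha`, `rollout`) are hypothetical.

```python
import numpy as np

def nominal_policy(d):
    # Hypothetical stand-in for an unconstrained neural policy: rotation plus
    # an outward push, which is unstable on its own.
    A = np.array([[0.5, -2.0], [2.0, 0.5]])
    return A @ d

def lyapunov(x, target):
    # Assumed convex candidate V(x) = ||x - target||^2 (not the paper's network).
    d = x - target
    return float(d @ d)

def lyapunov_grad(x, target):
    return 2.0 * (x - target)

def projected_policy(x, target, alpha=1.0, eps=1e-12):
    # Subtract the component of the nominal velocity along grad V that
    # violates the decrease condition dV/dt <= -alpha * V.
    f = nominal_policy(x - target)
    g = lyapunov_grad(x, target)
    violation = max(0.0, float(g @ f) + alpha * lyapunov(x, target))
    return f - g * violation / (float(g @ g) + eps)

def rollout(x0, target, dt=0.01, steps=2000):
    # Forward Euler integration of the projected policy.
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        xs.append(xs[-1] + dt * projected_policy(xs[-1], target))
    return np.stack(xs)

if __name__ == "__main__":
    target = np.array([0.5, -0.5])
    traj = rollout([1.0, 1.0], target)
    print("final distance to target:", np.linalg.norm(traj[-1] - target))
```

In this toy setting the projection turns an outward spiral into an inward one, so the candidate $V$ shrinks at every Euler step for the chosen step size. A larger `dt` could break that discrete-time decrease even though the continuous-time condition holds.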

Abstract

Imitation learning presents an effective approach to alleviate the resource-intensive and time-consuming nature of policy learning from scratch in the solution space. Even though the resulting policy can mimic expert demonstrations reliably, it often lacks predictability in unexplored regions of the state space, giving rise to significant safety concerns in the face of perturbations. To address these challenges, we introduce the Stable Neural Dynamical System (SNDS), an imitation learning regime that produces a policy with formal stability guarantees. We deploy a neural policy architecture that facilitates the representation of stability based on Lyapunov's theorem, and jointly train the policy and its corresponding Lyapunov candidate to ensure global stability. We validate our approach by conducting extensive experiments in simulation and successfully deploying the trained policies on a real-world manipulator arm. The experimental results demonstrate that our method overcomes the instability, accuracy, and computational intensity problems associated with previous imitation learning methods, making it a promising solution for stable policy learning in complex planning scenarios.

Paper Structure (18 sections, 1 theorem, 13 equations, 9 figures)

This paper contains 18 sections, 1 theorem, 13 equations, 9 figures.

Introduction
Background
Preliminaries
Problem statement
Methodology
Dynamical system policy formulation
Global asymptotic stability guarantees
SRVF training loss
Experiments
Evaluation
Handwriting dataset policies
SE(3) policy training
Discussion
Conclusion
Reproducibility
...and 3 more sections

Key Result

Proposition 1

The dynamical system, $\pi_\theta(\mathbf{x})$, in eq:projection_formula is globally asymptotically stable with the Lyapunov function, $v(\mathbf{x})$, defined in eq:lpf_definition, and any two arbitrary networks, $\hat{\pi}(\mathbf{x})$ and $\hat{v}(\mathbf{x})$, with bounded real-valued we
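
For context, the standard Lyapunov conditions for global asymptotic stability, on which a proposition of this kind typically relies, can be written as below. The symbol $\mathbf{x}^*$ for the target state is an assumption made here, and the paper's own statement may differ in detail.

$$
v(\mathbf{x}^*) = 0, \qquad
v(\mathbf{x}) > 0 \;\; \forall\, \mathbf{x} \neq \mathbf{x}^*, \qquad
\dot{v}(\mathbf{x}) = \nabla v(\mathbf{x})^\top \pi_\theta(\mathbf{x}) < 0 \;\; \forall\, \mathbf{x} \neq \mathbf{x}^*, \qquad
v(\mathbf{x}) \to \infty \;\text{ as }\; \|\mathbf{x}\| \to \infty .
$$

The last condition (radial unboundedness) is what extends the guarantee from a neighborhood of $\mathbf{x}^*$ to the whole state space.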

Figures (9)

Figure 1: Overview of the proposed stable neural policy learning method. Policy learning (top) optimizes a Lyapunov-stable neural policy over the expert demonstration data. The optimized policy is then deployed (bottom) to plan globally stable trajectories resistant to unpredictable perturbations.
Figure 2: An example of an unstable (left) vs. a stable (right) policy optimized on expert data from the handwriting dataset handwriting_lasa. While both policy rollouts can reproduce the expert motion, an unstable policy cannot recover from perturbations that push the robot to unknown regions of the state space.
Figure 3: Joint training of globally stable neural policies (left) and the corresponding LPF (right) optimized on expert's data from the handwriting dataset handwriting_lasa. The contours for the LPF (green) illustrate both positive-definiteness and convexity of the trained function.
Figure 4: Comparing the reproduction accuracy (top) and computational cost (bottom) of SNDS against baseline methods using the MSE and DTW metrics introduced in \ref{eq:mse_metric} and \ref{eq:dtw_metric}, respectively. The reproduction error and the learning time of SNDS remain comparatively lower than those of other stable methods across most designated trajectories.
Figure 5: Policy rollouts for SNDS and other baselines. Notice the highlighted inaccuracies or instabilities (yellow) for other methods. The acquired policies are optimized using the N-shaped data of the handwriting dataset.
...and 4 more figures
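
The MSE and DTW metrics named in the Figure 4 caption can be sketched as follows. This is a generic textbook version under stated assumptions (Euclidean point-wise distance, unconstrained warping, no normalization), not the paper's exact definitions from eq:mse_metric and eq:dtw_metric. The function names `mse` and `dtw` are hypothetical.

```python
import numpy as np

def mse(a, b):
    # Mean squared Euclidean error between two equally sampled trajectories.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.mean(np.sum((a - b) ** 2, axis=1)))

def dtw(a, b):
    # Classic dynamic-programming DTW: minimal cumulative distance over all
    # monotone alignments of the two point sequences.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

if __name__ == "__main__":
    expert = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
    # Same path, sampled with repeated points (a different timing).
    slow = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [2.0, 0.0]]
    print("DTW between expert and time-stretched copy:", dtw(expert, slow))
```

DTW tolerates differences in timing between a rollout and a demonstration, whereas MSE compares samples index by index and therefore requires equal lengths.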

Theorems & Definitions (2)

Proposition 1
Remark 2
