Globally Stable Neural Imitation Policies
Amin Abyaneh, Mariana Sosa Guzmán, Hsiu-Chin Lin
TL;DR
This work tackles safety and reliability gaps in imitation learning by enforcing global stability through a Lyapunov-based framework. It introduces Stable Neural Dynamical Systems (SNDS), in which a neural policy and a convex Lyapunov candidate are trained jointly, with a projection that guarantees global asymptotic stability and convergence to a target state. A differentiable loss informed by the square-root velocity function (SRVF) guides trajectory-consistent imitation, enabling efficient gradient-based optimization even when rollouts are simulated via forward Euler steps. Empirical results on both 2D handwriting tasks and high-dimensional SE(3) control demonstrate robust stability, competitive imitation accuracy, and successful sim-to-real transfer on a robotic arm, highlighting SNDS as a scalable, stable alternative for complex planning problems. The approach delivers stable, predictable policies with reduced data requirements and computational overhead relative to prior safe imitation learning methods, and the work outlines avenues for extending stability guarantees to constrained and obstacle-rich environments.
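To make the projection idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a projection of the form commonly used for Lyapunov-stable dynamics models, f(x) = f_hat(x) - grad V(x) * max(0, grad V(x) . f_hat(x) + alpha * V(x)) / ||grad V(x)||^2, which this section does not spell out. The learned convex Lyapunov candidate is replaced by a fixed quadratic V(x) = ||x - x*||^2, the neural policy by a hypothetical unstable linear map, and the names `nominal_policy`, `stable_policy`, `rollout`, `TARGET`, and `ALPHA` are illustrative only.

```python
# Sketch: Lyapunov-based projection of an unstable nominal policy,
# rolled out with forward Euler. All names and constants are assumptions.

TARGET = (0.0, 0.0)  # assumed target state x*
ALPHA = 1.0          # assumed decay rate in the condition dV/dt <= -ALPHA * V


def lyapunov(x):
    """Quadratic stand-in for a learned convex Lyapunov candidate V(x)."""
    return sum((xi - ti) ** 2 for xi, ti in zip(x, TARGET))


def lyapunov_grad(x):
    """Gradient of the quadratic candidate: 2 * (x - x*)."""
    return [2.0 * (xi - ti) for xi, ti in zip(x, TARGET)]


def nominal_policy(x):
    """Hypothetical unconstrained policy: an outward (unstable) spiral."""
    return [0.5 * x[0] - 2.0 * x[1], 2.0 * x[0] + 0.5 * x[1]]


def stable_policy(x):
    """Project the nominal velocity so that grad V . f <= -ALPHA * V."""
    f = nominal_policy(x)
    g = lyapunov_grad(x)
    g_norm_sq = sum(gi * gi for gi in g)
    if g_norm_sq == 0.0:
        # The gradient vanishes only at the target; stay there.
        return [0.0 for _ in x]
    # Positive when the nominal velocity violates the decrease condition.
    violation = sum(gi * fi for gi, fi in zip(g, f)) + ALPHA * lyapunov(x)
    scale = max(0.0, violation) / g_norm_sq
    # Remove just enough of the component along grad V to satisfy it.
    return [fi - scale * gi for fi, gi in zip(f, g)]


def rollout(x0, dt=0.01, steps=1000):
    """Simulate the projected policy with forward Euler steps."""
    x = list(x0)
    trajectory = [x]
    for _ in range(steps):
        dx = stable_policy(x)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    traj = rollout([1.0, 1.0])
    print("V at start:", lyapunov(traj[0]))
    print("V at end:  ", lyapunov(traj[-1]))
```

In this toy setting the nominal policy alone diverges, while the projected rollout contracts toward the target. The decrease condition is a continuous-time property, so the forward Euler rollout preserves it here only because the step size is small relative to the dynamics.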
Abstract
Imitation learning presents an effective approach to alleviate the resource-intensive and time-consuming nature of policy learning from scratch in the solution space. Even though the resulting policy can mimic expert demonstrations reliably, it often lacks predictability in unexplored regions of the state space, giving rise to significant safety concerns in the face of perturbations. To address these challenges, we introduce the Stable Neural Dynamical System (SNDS), an imitation learning regime that produces a policy with formal stability guarantees. We deploy a neural policy architecture that facilitates the representation of stability based on Lyapunov's theorem, and jointly train the policy and its corresponding Lyapunov candidate to ensure global stability. We validate our approach by conducting extensive experiments in simulation and successfully deploying the trained policies on a real-world manipulator arm. The experimental results demonstrate that our method overcomes the instability, accuracy, and computational intensity problems associated with previous imitation learning methods, making it a promising solution for stable policy learning in complex planning scenarios.
