Safe and Optimal Variable Impedance Control via Certified Reinforcement Learning
Shreyas Kumar, Ravi Prakash
TL;DR
This work tackles unstable exploration in variable impedance control when using model-free reinforcement learning. It introduces Certified Gaussian Manifold Sampling (C-GMS), a trajectory-centric framework that restricts policy exploration to a Lyapunov-certified manifold of stable gain schedules, ensuring stability and actuator feasibility by construction. A convergence theorem guarantees uniformly ultimately bounded tracking error under bounded disturbances. Experiments in simulation and on a real 7-DoF robot, run with an actuator-limit governor, demonstrate safe, compliant handover trajectories. The approach offers a practical route to reliable autonomous interaction in dynamic, uncertain environments by combining model-free learning with formal stability guarantees.
Abstract
Reinforcement learning (RL) offers a powerful approach for robots to learn complex, collaborative skills by combining Dynamic Movement Primitives (DMPs) for motion and Variable Impedance Control (VIC) for compliant interaction. However, this model-free paradigm risks instability and unsafe exploration because the impedance gains vary over time. This work introduces Certified Gaussian Manifold Sampling (C-GMS), a novel trajectory-centric RL framework that learns combined DMP and VIC policies while guaranteeing Lyapunov stability and actuator feasibility by construction. Our approach reframes policy exploration as sampling from a mathematically defined manifold of stable gain schedules. Every policy rollout is therefore stable and physically realizable, which eliminates the need for reward penalties or post-hoc validation. Furthermore, we provide a theoretical guarantee that the tracking error remains bounded even in the presence of bounded model errors and deployment-time uncertainties. We demonstrate the effectiveness of C-GMS in simulation and verify it on a real robot, paving the way for reliable autonomous interaction in complex environments.
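To make the idea of sampling only stable gain schedules concrete, the following is a minimal sketch and not the C-GMS algorithm itself. It assumes a 1-DoF error dynamics m*e'' + d*e' + k(t)*e = 0 with constant mass m and damping d, and uses a standard sufficient Lyapunov condition for time-varying stiffness: pick alpha with 0 < alpha < d/m and require dk/dt <= 2*alpha*k(t). Gaussian samples around a mean schedule are projected onto that set, with a stiffness box [k_min, k_max] as a crude stand-in for actuator limits. All function names, parameters, and numerical values are hypothetical.

```python
import numpy as np

def certify_schedule(k_raw, dt, alpha, k_min, k_max):
    """Project a raw stiffness schedule onto the assumed certified set.

    Assumed sufficient condition (scalar case, constant damping):
    dk/dt <= 2 * alpha * k, with 0 < alpha < d / m.
    """
    k = np.clip(np.asarray(k_raw, dtype=float), k_min, k_max)
    for i in range(len(k) - 1):
        # Cap growth so that (k[i+1] - k[i]) / dt <= 2 * alpha * k[i].
        # Decreasing stiffness always satisfies the condition.
        k[i + 1] = min(k[i + 1], k[i] * (1.0 + 2.0 * alpha * dt))
    return k

def sample_certified(mean, sigma, n_samples, dt, alpha, k_min, k_max, rng):
    """Draw Gaussian perturbations of `mean` and certify each one."""
    raw = mean + sigma * rng.standard_normal((n_samples, len(mean)))
    return np.array(
        [certify_schedule(r, dt, alpha, k_min, k_max) for r in raw]
    )

def is_certified(k, dt, alpha, k_min, k_max, tol=1e-6):
    """Check the rate condition and the stiffness bounds on a schedule."""
    rate_ok = np.all(np.diff(k) / dt <= 2.0 * alpha * k[:-1] + tol)
    return bool(rate_ok and k.min() >= k_min - tol and k.max() <= k_max + tol)

if __name__ == "__main__":
    m, d = 1.0, 20.0           # hypothetical mass and damping
    alpha = 0.5 * d / m        # any value in (0, d / m) works
    dt = 0.01
    mean = np.linspace(100.0, 400.0, 50)   # hypothetical mean schedule
    rng = np.random.default_rng(0)
    samples = sample_certified(mean, 80.0, 20, dt, alpha, 50.0, 600.0, rng)
    print(all(is_certified(s, dt, alpha, 50.0, 600.0) for s in samples))
```

If the schedule is interpolated linearly between samples, the rate check at the grid points implies the continuous-time condition on each interval, because an increasing segment never drops below its left endpoint. Projection is only one way to stay inside the certified set; a parameterization that satisfies the condition by construction would avoid distorting the sampling distribution.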
