Safe and Optimal Variable Impedance Control via Certified Reinforcement Learning

Shreyas Kumar, Ravi Prakash

TL;DR

This work tackles unstable exploration in variable impedance control when using model-free reinforcement learning. It introduces Certified Gaussian Manifold Sampling (C-GMS), a trajectory-centric framework that restricts policy exploration to a Lyapunov-certified manifold of stable gain schedules, ensuring stability and actuator feasibility by construction. A convergence theorem guarantees uniformly ultimately bounded tracking error under bounded disturbances, and experiments in simulation and on a real 7-DoF robot demonstrate safe, compliant handover trajectories with an actuator-limit governor. By combining model-free learning with formal stability guarantees, the approach offers a practical route to reliable autonomous interaction in dynamic, uncertain environments.
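For intuition about the uniformly ultimately bounded (UUB) claim, the following Python sketch simulates a toy scalar, unit-mass error system $\ddot e + d(t)\dot e + k(t)e = w(t)$ with a bounded disturbance $w$. It is not the paper's model, theorem, or experiment: the stiffness schedule, the damping, the constant $\alpha = 1$, the disturbance, and all function names are hypothetical. The schedule is chosen so that a stability condition of the Kronander-Billard type ($d > \alpha$ and $\dot k + \alpha\dot d - 2\alpha k < 0$) holds at every sample, which is assumed here to be the kind of certificate C-GMS enforces.

```python
import math

ALPHA = 1.0   # hypothetical Lyapunov weighting constant (alpha > 0)
DT = 1e-3     # integration step [s]
T_END = 10.0  # simulated horizon [s]


def stiffness(t):
    """Hypothetical stiffness schedule k(t), varying between 50 and 150."""
    return 100.0 + 50.0 * math.sin(t)


def stiffness_rate(t):
    """Analytic time derivative of stiffness(t)."""
    return 50.0 * math.cos(t)


def damping(t):
    """Hypothetical constant damping, chosen above ALPHA (so d_dot = 0)."""
    return 20.0


def kb_condition_holds(t, alpha=ALPHA):
    """Scalar check of d > alpha and k_dot + alpha*d_dot - 2*alpha*k < 0."""
    d_rate = 0.0  # damping is constant in this sketch
    return (damping(t) > alpha
            and stiffness_rate(t) + alpha * d_rate - 2.0 * alpha * stiffness(t) < 0.0)


def simulate_error(w, e0=1.0, v0=0.0, dt=DT, t_end=T_END):
    """Semi-implicit Euler for e_ddot + d(t) e_dot + k(t) e = w(t) with unit mass."""
    e, v = e0, v0
    trace = [e]
    for i in range(int(round(t_end / dt))):
        t = i * dt
        acc = w(t) - damping(t) * v - stiffness(t) * e
        v += dt * acc  # velocity first, then position (symplectic Euler)
        e += dt * v
        trace.append(e)
    return trace


# The schedule satisfies the certificate on a 10 ms grid over the whole horizon.
certified = all(kb_condition_holds(0.01 * i) for i in range(1001))

nominal = simulate_error(lambda t: 0.0)
disturbed = simulate_error(lambda t: 5.0 * math.sin(7.0 * t))  # |w(t)| <= 5

start = int(round(5.0 / DT))  # index of t = 5 s, after the initial transient
tail_nominal = max(abs(e) for e in nominal[start:])
tail_disturbed = max(abs(e) for e in disturbed[start:])
print("certified:", certified)  # certified: True
print(f"max |e| for t >= 5 s: nominal {tail_nominal:.1e}, disturbed {tail_disturbed:.1e}")
```

Without the disturbance the error decays toward zero; with it, the error settles into a small band instead of converging, which is the qualitative behavior a UUB guarantee describes. In this toy setting the band width depends on the disturbance amplitude and the gains, not on the initial error.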

Abstract

Reinforcement learning (RL) offers a powerful approach for robots to learn complex, collaborative skills by combining Dynamic Movement Primitives (DMPs) for motion with Variable Impedance Control (VIC) for compliant interaction. However, this model-free paradigm often risks instability and unsafe exploration because the impedance gains vary over time. This work introduces Certified Gaussian Manifold Sampling (C-GMS), a novel trajectory-centric RL framework that learns combined DMP and VIC policies while guaranteeing Lyapunov stability and actuator feasibility by construction. Our approach reframes policy exploration as sampling from a mathematically defined manifold of stable gain schedules. As a result, every policy rollout is guaranteed to be stable and physically realizable, eliminating the need for reward penalties or post-hoc validation. Furthermore, we prove that the tracking error remains bounded even in the presence of bounded model errors and deployment-time uncertainties. We demonstrate C-GMS in simulation and validate it on a real robot, paving the way for reliable autonomous interaction in complex environments.
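Exploring only inside a certified set can be illustrated with a toy slack parameterization. This Python sketch is a hypothetical construction, not the one used by C-GMS: it assumes a scalar (per-axis) stiffness schedule on a uniform time grid, constant damping $d > \alpha$, and a Kronander-Billard-type condition ($d > \alpha$, $\dot k + \alpha\dot d - 2\alpha k < 0$) evaluated with forward differences. Unconstrained Gaussian slacks $s_t$ are squashed into a bounded rate $\sigma_t \in [\text{margin}, 4\alpha]$, and stiffness is propagated as $k_{t+1} = k_t\,(1 + \Delta t\,(2\alpha - \sigma_t))$, so every sample satisfies $\dot k_t - 2\alpha k_t = -\sigma_t k_t < 0$ by construction. The function names, the margin, and all numeric values are illustrative assumptions.

```python
import math
import random


def sigmoid(s):
    """Numerically safe logistic function mapping R -> (0, 1)."""
    if s >= 0.0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)
    return z / (1.0 + z)


def stiffness_from_slacks(slacks, k0, alpha, dt, margin=1e-3):
    """Hypothetical map from unconstrained slacks to a certified stiffness schedule.

    Each step uses k[t+1] = k[t] * (1 + dt * (2*alpha - sigma_t)) with
    sigma_t in [margin, 4*alpha], so (k[t+1] - k[t]) / dt - 2*alpha*k[t]
    equals -sigma_t * k[t] <= -margin * k[t] < 0 (up to rounding).
    """
    if k0 <= 0.0 or alpha <= 0.0 or not 0.0 < alpha * dt < 0.5:
        raise ValueError("need k0 > 0, alpha > 0 and 0 < alpha*dt < 0.5")
    sigma_max = 4.0 * alpha  # relative growth rate 2*alpha - sigma lies in [-2*alpha, 2*alpha - margin]
    if not 0.0 < margin < sigma_max:
        raise ValueError("margin must lie in (0, 4*alpha)")
    k = [k0]
    for s in slacks:
        sigma = margin + (sigma_max - margin) * sigmoid(s)
        # Growth factor >= 1 - 2*alpha*dt > 0, so stiffness stays positive.
        k.append(k[-1] * (1.0 + dt * (2.0 * alpha - sigma)))
    return k


def kb_eigenvalues(k, d, alpha, dt):
    """Per-step scalar eigenvalues of alpha - d and k_dot + alpha*d_dot - 2*alpha*k."""
    lam_damp, lam_stiff = [], []
    for t in range(len(k) - 1):
        k_dot = (k[t + 1] - k[t]) / dt
        d_dot = (d[t + 1] - d[t]) / dt
        lam_damp.append(alpha - d[t])
        lam_stiff.append(k_dot + alpha * d_dot - 2.0 * alpha * k[t])
    return lam_damp, lam_stiff


random.seed(0)
alpha, dt, horizon = 1.0, 0.01, 200
d = [20.0] * (horizon + 1)  # constant damping above alpha

# Certified exploration: the Gaussian noise lives in slack space.
slacks = [random.gauss(0.0, 1.0) for _ in range(horizon)]
k_cert = stiffness_from_slacks(slacks, k0=100.0, alpha=alpha, dt=dt)
lam_damp, lam_cert = kb_eigenvalues(k_cert, d, alpha, dt)

# Unconstrained exploration: Gaussian noise added directly to the gains.
k_raw = [100.0 + 30.0 * random.gauss(0.0, 1.0) for _ in range(horizon + 1)]
_, lam_raw = kb_eigenvalues(k_raw, d, alpha, dt)

print(max(lam_damp) < 0.0, max(lam_cert) < 0.0)  # True True
print("unconstrained max eigenvalue:", max(lam_raw))
```

Because the noise is injected in slack space, any update that shifts the slack distribution still yields only certified schedules, so the reward needs no stability penalty; perturbing the gains directly, as in the unconstrained schedule, typically produces positive eigenvalues. Extending the sketch to time-varying damping would also require bounding $\dot d$, which is omitted here.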

Paper Structure

This paper contains 19 sections, 35 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Comparison of policy execution under certified vs. unconstrained learning. Top: with the proposed C-GMS, policy sampling is restricted to a certified manifold, ensuring Lyapunov stability and safe execution through a via-point to the goal. Bottom: without C-GMS, unconstrained sampling may violate the stability conditions, leading to unsafe behaviors such as collisions with the environment or the human.
  • Figure 2: Overview of the C-GMS framework. Trajectories are parameterized using DMPs, and time-varying VIC gains are parameterized using slack variables. In a standard approach, gains are sampled directly from a Gaussian, which can violate the Lyapunov stability conditions and lead to unstable rollouts. In contrast, C-GMS enforces stability by sampling from a certified manifold on which the Lyapunov condition KBVIC holds, resulting in stable trajectories throughout learning.
  • Figure 3: Experimental setup for the human-robot collaborative task.
  • Figure 4: VIC gain schedules and corresponding end-effector trajectories of the robot initially, after $15$ updates, and after $50$ updates. The policy obtained after the $50^\mathrm{th}$ update was executed on hardware (cf. § \ref{subsec:conv}). Via-points are marked by circles.
  • Figure 5: Learning curve and eigenvalue evolution for Eq. \ref{eq:KB}. Under C-GMS-based sampling, the eigenvalues remain negative, so the impedance profile guarantees stable control. However, when sampling shifts to an unsafe region, the cost function may still converge (since the via-point is reached before C-GMS is disabled), but the eigenvalues become positive, potentially leading to severe instability in the end-effector trajectory.
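The Lyapunov condition referenced in Figures 2 and 5 (labels KBVIC and eq:KB) is not stated in this summary. Assuming it is the variable impedance stability condition of Kronander and Billard (an assumption based on the label, not confirmed here), the eigenvalues tracked in Figure 5 would be those of the two matrices in the last line below. For unit-mass error dynamics with symmetric, time-varying stiffness $K(t)$ and damping $D(t)$, and a constant $\alpha > 0$, the standard Lyapunov argument is:

```latex
\begin{aligned}
&\ddot{e} + D(t)\,\dot{e} + K(t)\,e = 0, \qquad K(t) = K(t)^\top \succ 0,\; D(t) = D(t)^\top \succ 0,\\
&V(e,\dot e) = \tfrac{1}{2}\dot{e}^\top \dot{e} + \alpha\, e^\top \dot{e} + \tfrac{1}{2}\, e^\top \bigl(K(t) + \alpha D(t)\bigr) e,\\
&\dot{V} = \dot{e}^\top \bigl(\alpha I - D(t)\bigr)\dot{e} + \tfrac{1}{2}\, e^\top \bigl(\dot{K}(t) + \alpha \dot{D}(t) - 2\alpha K(t)\bigr) e,\\
&\alpha I - D(t) \prec 0 \;\;\text{and}\;\; \dot{K}(t) + \alpha \dot{D}(t) - 2\alpha K(t) \prec 0
\;\Longrightarrow\; \dot{V} < 0 \;\text{ for } (e,\dot e) \neq 0.
\end{aligned}
```

Under these conditions $V$ is positive definite, because its Schur complement $K + \alpha D - \alpha^2 I$ is positive definite whenever $D \succ \alpha I$ and $K \succ 0$. A positive eigenvalue of either matrix means this particular certificate no longer applies, which matches the caption's wording that instability is possible rather than certain.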