
Contraction Metric Based Safe Reinforcement Learning Force Control for a Hydraulic Actuator with Real-World Training

Lucca Maitan, Lucas Toschi, Cícero Zanette, Elisa G. Vergamini, Leonardo F. Santos, Thiago Boaventura

TL;DR

This work tackles safe reinforcement learning for hydraulic force control by integrating contraction-metric certificates with a data-driven actuator model to enable real-world online training. A learned contraction metric drives a lightweight QP filter that minimally corrects policy outputs to enforce approximate exponential trajectory convergence, while a SAC policy tunes the PI gains of a baseline feedback-linearization controller. Hardware experiments show that real-world RL training yields better force-tracking performance than simulation-only training and a fixed-gain baseline, with the contraction filter reducing chattering and instabilities during learning. The approach demonstrates the practical viability of contraction-based safety for high-force hydraulic systems and suggests paths toward multi-DOF and legged-hydraulic applications, though robustness under extreme operating conditions remains a challenge.

Abstract

Force control in hydraulic actuators is notoriously difficult due to strong nonlinearities, uncertainties, and the high risks associated with unsafe exploration during learning. This paper investigates safe reinforcement learning (RL) for hydraulic force control with real-world training using contraction metric certificates. A data-driven model of a hydraulic actuator, identified from experimental data, is employed for simulation-based pretraining of a Soft Actor-Critic (SAC) policy that adapts the PI gains of a feedback-linearization (FL) controller. To reduce instability during online training, we propose a quadratic-programming (QP) contraction filter that leverages a learned contraction metric to enforce approximate exponential convergence of trajectories, applying minimal corrections to the policy output. The approach is validated on a hydraulic test bench, where the RL controller is trained directly on hardware and benchmarked against a simulation-trained agent and a fixed-gain baseline. Experimental results show that real-hardware training improves force-tracking performance compared to both alternatives, while the contraction filter mitigates chattering and instabilities. These findings suggest that contraction-based certificates can enable safe RL in high-force hydraulic systems, though robustness at extreme operating conditions remains a challenge.
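
The "minimal correction" idea behind the QP filter can be illustrated with a small sketch. It assumes, purely for illustration, that the contraction condition reduces to a single affine constraint `a · u <= b` on the policy output `u` (for example the pair of PI gains). The abstract does not give the paper's actual constraint, so the names `contraction_filter`, `a`, and `b` are hypothetical placeholders. Under that assumption, the QP `min ||u - u_rl||^2 s.t. a · u <= b` has a closed-form solution: a Euclidean projection onto a half-space.

```python
from typing import List, Sequence


def contraction_filter(u_rl: Sequence[float],
                       a: Sequence[float],
                       b: float) -> List[float]:
    """Return the point closest to u_rl that satisfies a . u <= b.

    Hypothetical stand-in for a one-constraint QP filter:
        min ||u - u_rl||^2   subject to   a . u <= b
    The vector a and scalar b are assumed to encode a contraction
    condition that is affine in the action; they are not taken from the paper.
    """
    # How much the raw policy output violates the constraint.
    violation = sum(ai * ui for ai, ui in zip(a, u_rl)) - b
    norm_sq = sum(ai * ai for ai in a)

    # Constraint already satisfied (or degenerate): pass the action through.
    if violation <= 0.0 or norm_sq == 0.0:
        return list(u_rl)

    # Otherwise move u_rl along -a just far enough to reach the boundary.
    scale = violation / norm_sq
    return [ui - scale * ai for ui, ai in zip(u_rl, a)]


if __name__ == "__main__":
    # Action inside the admissible set is left untouched.
    print(contraction_filter([0.5, 3.0], [1.0, 0.0], 1.0))  # [0.5, 3.0]
    # Action outside is moved to the constraint boundary.
    print(contraction_filter([2.0, 0.0], [1.0, 0.0], 1.0))  # [1.0, 0.0]
```

With several constraints or actuator bounds the closed form no longer applies and a numerical QP solver would be needed; the sketch only shows why the filter leaves admissible actions unchanged and alters inadmissible ones as little as possible.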

Paper Structure

This paper contains 16 sections, 32 equations, 5 figures, and 7 tables.

Figures (5)

  • Figure 1: Experimental setup used for controller testing and environment model learning. 1 - Servo valve. 2 - Cylinder. 3 - Incremental encoder. 4 - Load cell. 5 - Spring with stiffness 20000 N/m. 6 - Electronics.
  • Figure 2: Contraction metric loss evolution during training.
  • Figure 3: Force tracking with randomized PI gains using QP contraction filtering.
  • Figure 4: Force tracking for the 2 Hz experiments using Controller I.
  • Figure 5: $K_p$ and $K_i$ variation using Controller I for the 2 Hz experiment.