Contraction Metric Based Safe Reinforcement Learning Force Control for a Hydraulic Actuator with Real-World Training
Lucca Maitan, Lucas Toschi, Cícero Zanette, Elisa G. Vergamini, Leonardo F. Santos, Thiago Boaventura
TL;DR
This work tackles safe reinforcement learning for hydraulic force control by integrating contraction-metric certificates with a data-driven actuator model to enable real-world online training. A learned contraction metric drives a lightweight QP filter that minimally corrects policy outputs to enforce approximate exponential trajectory convergence, while a SAC policy tunes the PI gains of a baseline feedback-linearization controller. Hardware experiments show that real-world RL training yields better force-tracking performance than simulation-only training and a fixed-gain baseline, with the contraction filter reducing chattering and instabilities during learning. The approach demonstrates the practical viability of contraction-based safety for high-force hydraulic systems and points toward multi-DOF and legged-hydraulic applications, though robustness under extreme operating conditions remains a challenge.
Abstract
Force control in hydraulic actuators is notoriously difficult due to strong nonlinearities, uncertainties, and the high risks associated with unsafe exploration during learning. This paper investigates safe reinforcement learning (RL) for hydraulic force control with real-world training using contraction metric certificates. A data-driven model of a hydraulic actuator, identified from experimental data, is employed for simulation-based pretraining of a Soft Actor-Critic (SAC) policy that adapts the PI gains of a feedback-linearization (FL) controller. To reduce instability during online training, we propose a quadratic-programming (QP) contraction filter that leverages a learned contraction metric to enforce approximate exponential convergence of trajectories, applying minimal corrections to the policy output. The approach is validated on a hydraulic test bench, where the RL controller is trained directly on hardware and benchmarked against a simulation-trained agent and a fixed-gain baseline. Experimental results show that real-hardware training improves force-tracking performance compared to both alternatives, while the contraction filter mitigates chattering and instabilities. These findings suggest that contraction-based certificates can enable safe RL in high-force hydraulic systems, though robustness at extreme operating conditions remains a challenge.
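
To make the idea of a minimally corrective QP filter concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes control-affine error dynamics e_dot = f_e + G u, a constant symmetric positive-definite metric M (so the metric's time derivative is neglected), and a single decay constraint d/dt (e^T M e) <= -2 lam (e^T M e). Under those assumptions the QP min ||u - u_rl||^2 has a closed-form solution, namely the projection of u_rl onto a half-space. The function name `contraction_filter` and all of its arguments are hypothetical.

```python
import numpy as np

def contraction_filter(u_rl, e, M, f_e, G, lam):
    """Minimally correct u_rl so that d/dt (e^T M e) <= -2*lam*(e^T M e).

    Illustrative assumptions (not taken from the paper):
      - error dynamics are control-affine: e_dot = f_e + G @ u
      - M is a constant symmetric positive-definite matrix
    """
    u_rl = np.asarray(u_rl, dtype=float)
    V = e @ M @ e                            # squared error in the metric M
    a = 2.0 * G.T @ M @ e                    # constraint normal: a @ u <= b
    b = -2.0 * lam * V - 2.0 * e @ M @ f_e   # constraint offset
    violation = a @ u_rl - b
    denom = a @ a
    # Policy output already satisfies the constraint, or the input has no
    # influence on the constraint: pass u_rl through unchanged.
    if violation <= 0.0 or denom < 1e-12:
        return u_rl
    # Closed-form QP solution: Euclidean projection onto the half-space.
    return u_rl - (violation / denom) * a

# Scalar usage example with hypothetical values.
e = np.array([1.0])
M = np.array([[1.0]])
f_e = np.array([0.0])
G = np.array([[1.0]])
u_corrected = contraction_filter(np.array([0.0]), e, M, f_e, G, lam=1.0)
u_unchanged = contraction_filter(np.array([-3.0]), e, M, f_e, G, lam=1.0)
```

In this scalar example the first call shifts the input just far enough to meet the decay constraint with equality, while the second call leaves an already-admissible input untouched. A filter with a state-dependent learned metric, actuator limits, or slack variables would generally need a numerical QP solver rather than this closed form.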
