Optimal Gait Control for a Tendon-driven Soft Quadruped Robot by Model-based Reinforcement Learning
Xuezhi Niu, Kaige Tan, Lei Feng
TL;DR
This work tackles gait optimization for a tendon-driven soft quadruped robot (SoftQ) using model-based reinforcement learning (MBRL), in which a data-driven surrogate dynamics model replaces the computationally expensive high-fidelity simulator. By constraining the action space to diagonal leg pairs through a parametric trot gait and adding a post-training phase, the method learns gaits far more data-efficiently and robustly than prior model-free approaches. In simulation it reaches speeds of up to $0.36\ \mathrm{m/s}$ with high stability. The authors then validate the approach on hardware through a real-time control architecture, reaching a real-world speed of about $0.13$–$0.15\ \mathrm{m/s}$, and discuss the sim-to-real gap together with mitigation strategies such as Kalman filtering and binary contact modeling. Overall, the study shows that surrogate-based MBRL with post-training can yield efficient, adaptable gait policies for soft robots and paves the way for real-world deployment on deformable legged systems.
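The TL;DR states that the action space is constrained to diagonal leg pairs via a parametric trot gait. The Python sketch below shows one way such a parametrization could look, assuming a sinusoidal actuator command shared by each diagonal pair. The leg names (`FL`, `FR`, `HL`, `HR`), the sine waveform, and the parameters `amplitude`, `frequency`, and `offset` are hypothetical illustrations, not the authors' gait model.

```python
import math
from dataclasses import dataclass


@dataclass
class TrotParams:
    """Hypothetical low-dimensional gait parameters an RL policy could output."""
    amplitude: float      # peak deviation of the actuator (tendon) command
    frequency: float      # gait frequency in Hz
    offset: float = 0.0   # mean actuator command


# Diagonal leg pairs in a trot: front-left with hind-right, front-right with hind-left.
# Leg naming is an assumption made for this sketch.
DIAGONAL_PAIRS = (("FL", "HR"), ("FR", "HL"))


def trot_commands(params: TrotParams, t: float) -> dict[str, float]:
    """Return one actuator command per leg at time t (seconds).

    Legs in the same diagonal pair receive the same command; the two pairs are
    half a cycle (pi radians) out of phase, which is what makes the gait a trot.
    """
    commands = {}
    for pair_index, pair in enumerate(DIAGONAL_PAIRS):
        phase = 2.0 * math.pi * params.frequency * t + pair_index * math.pi
        value = params.offset + params.amplitude * math.sin(phase)
        for leg in pair:
            commands[leg] = value
    return commands


# Example: a quarter of the way through a 1 Hz cycle, one diagonal pair is at
# its peak command and the other at its trough.
cmd = trot_commands(TrotParams(amplitude=1.0, frequency=1.0), t=0.25)
```

Because the legs in each diagonal pair share one command and the two pairs run half a cycle apart, a policy in this sketch only chooses three numbers instead of four independent actuator trajectories, which is the kind of search-space reduction the restriction is meant to provide.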
Abstract
This study presents an approach to optimal gait control for a soft quadruped robot actuated by four Compressible Tendon-driven Soft Actuators (CTSAs). Building on our previous studies, which used model-free reinforcement learning for gait control, we employ model-based reinforcement learning (MBRL) to further improve the performance of the gait controller. Compared with rigid robots, the proposed soft quadruped robot is safer, lighter, and mechanically simpler to fabricate and control. The primary challenge, however, lies in developing control algorithms that achieve optimal gaits for fast and stable locomotion. The research follows a multi-stage methodology comprising state space restriction, data-driven model training, and reinforcement learning algorithm development. Compared with benchmark methods, the proposed MBRL algorithm combined with post-training significantly improves the efficiency and performance of the gait control policies. The developed policy is robust and adapts to the robot's deformable morphology. The study concludes by discussing the practical applicability of these findings in real-world scenarios.
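The multi-stage methodology in the abstract (collect data, train a data-driven model, then learn control against that model) follows the general MBRL pattern. The sketch below illustrates that generic loop on a toy one-dimensional system, under loudly stated assumptions: `expensive_simulator` is a hypothetical linear stand-in for the high-fidelity simulator, the surrogate is a two-parameter linear least-squares model, and action selection uses random-shooting model-predictive control, a common generic MBRL controller swapped in here for illustration. It is not the authors' policy-learning algorithm, and it omits post-training entirely.

```python
import random


def expensive_simulator(state: float, action: float) -> float:
    """Hypothetical stand-in for the high-fidelity simulator (unknown to the learner)."""
    return 0.9 * state + 0.5 * action


def fit_surrogate(data):
    """Least-squares fit of s' ~ a*s + b*u from (s, u, s') tuples.

    Solves the 2x2 normal equations [[Sss, Ssu], [Ssu, Suu]] [a, b] = [Ssy, Suy]
    with Cramer's rule.
    """
    s_ss = sum(s * s for s, _, _ in data)
    s_su = sum(s * u for s, u, _ in data)
    s_uu = sum(u * u for _, u, _ in data)
    s_sy = sum(s * y for s, _, y in data)
    s_uy = sum(u * y for _, u, y in data)
    det = s_ss * s_uu - s_su * s_su
    a = (s_sy * s_uu - s_su * s_uy) / det
    b = (s_ss * s_uy - s_su * s_sy) / det
    return a, b


def plan_action(model, state, target, rng, n_candidates=64):
    """Random shooting: pick the sampled action whose predicted next state is closest to target."""
    a, b = model
    candidates = [rng.uniform(-1.0, 1.0) for _ in range(n_candidates)]
    return min(candidates, key=lambda u: (a * state + b * u - target) ** 2)


def mbrl_loop(target=1.0, n_random=20, n_iters=5, steps_per_iter=20, seed=0):
    """Generic MBRL loop: random exploration, then alternate model fitting and model-based acting."""
    rng = random.Random(seed)
    data = []
    state = 0.0
    # Stage 1: seed the dataset with random-action transitions from the simulator.
    for _ in range(n_random):
        u = rng.uniform(-1.0, 1.0)
        nxt = expensive_simulator(state, u)
        data.append((state, u, nxt))
        state = nxt
    model = fit_surrogate(data)
    for _ in range(n_iters):
        # Stage 2: (re)train the surrogate on all transitions collected so far.
        model = fit_surrogate(data)
        state = 0.0
        # Stage 3: act by planning on the surrogate; the simulator is queried
        # only to execute the chosen action and to grow the dataset.
        for _ in range(steps_per_iter):
            u = plan_action(model, state, target, rng)
            nxt = expensive_simulator(state, u)
            data.append((state, u, nxt))
            state = nxt
    return model, state


model, final_state = mbrl_loop()
```

In this noise-free toy setting the surrogate recovers the hidden coefficients almost exactly, and the planner then drives the state close to the target. The point of the structure is that decisions are evaluated on the cheap surrogate, while the expensive simulator is used only to generate training data.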
