Optimal Gait Control for a Tendon-driven Soft Quadruped Robot by Model-based Reinforcement Learning

Xuezhi Niu; Kaige Tan; Lei Feng

TL;DR

This work tackles gait optimization for a tendon-driven soft quadruped (SoftQ) by adopting model-based reinforcement learning (MBRL) with a data-driven surrogate dynamics model to replace the expensive high-fidelity simulator. By constraining the action space to diagonal leg pairs via a parametric trot gait and incorporating a post-training phase, the method achieves far more data-efficient and robust gait learning than prior model-free approaches, with simulation results showing speeds up to $0.36\ \mathrm{m/s}$ and high stability. The authors validate the approach with a real-time control architecture, transitioning from simulation to hardware and achieving a real-world speed of about $0.13$–$0.15\ \mathrm{m/s}$, while discussing sim-to-real gaps and mitigation strategies (e.g., Kalman filtering, binary contact modeling). Overall, the study demonstrates that surrogate-based MBRL with post-training can deliver efficient, adaptable gait policies for soft robots and paves the way for real-world deployment on deformable legged systems.

Abstract

This study presents an approach to optimal gait control for a soft quadruped robot driven by four Compressible Tendon-driven Soft Actuators (CTSAs). Building on our previous studies, which used model-free reinforcement learning for gait control, we employ model-based reinforcement learning (MBRL) to further improve the performance of the gait controller. Compared to rigid robots, the proposed soft quadruped robot is safer and lighter, and its mechanism is simpler to fabricate and control. The primary challenge, however, lies in developing control algorithms that achieve fast and stable locomotion. The research follows a multi-stage methodology comprising state space restriction, data-driven model training, and reinforcement learning algorithm development. Compared to benchmark methods, the proposed MBRL algorithm, combined with post-training, significantly improves the efficiency and performance of gait control policies. The resulting policy is both robust and adaptable to the robot's deformable morphology. The study concludes by highlighting the practical applicability of these findings in real-world scenarios.
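To make the diagonal pairing of a trot gait concrete, the following is a minimal sketch, not the authors' gait parameterization. It assumes a plain sinusoidal command and hypothetical parameter names (`frequency_hz`, `amplitude`); only the leg grouping (FL with RR, FR with RL) is taken from the expert gait described in Figure 3.

```python
import math

# Diagonal pairing as in the expert gait of Figure 3: FL with RR, FR with RL.
DIAGONAL_PAIRS = (("FL", "RR"), ("FR", "RL"))


def trot_commands(t, frequency_hz, amplitude):
    """Return one normalized command per leg for a sinusoidal trot at time t.

    The sinusoid is an illustrative assumption, not the paper's gait
    parameterization. Legs in the same diagonal pair share one command,
    and the second pair runs half a period behind the first.
    """
    phase = 2.0 * math.pi * frequency_hz * t
    pair_signals = (
        amplitude * math.sin(phase),            # FL and RR
        amplitude * math.sin(phase + math.pi),  # FR and RL, half a period later
    )
    commands = {}
    for legs, signal in zip(DIAGONAL_PAIRS, pair_signals):
        for leg in legs:
            commands[leg] = signal
    return commands


if __name__ == "__main__":
    # Sample the gait at a few instants of a 1 Hz cycle.
    for step in range(4):
        t = step * 0.125
        print(t, trot_commands(t, frequency_hz=1.0, amplitude=1.0))
```

Under this assumption, a learning agent would choose a few gait parameters per step instead of four independent leg commands, which is one way a parametric trot can shrink the search space.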

Paper Structure (18 sections, 12 equations, 10 figures, 3 tables)

Introduction
Preliminaries
Soft Actor-Critic
Training Framework
Robot Specifications
Surrogate Model Development
Inverse Kinematics Analysis
Restriction on State Space
Surrogate Model Training
Model Based Reinforcement Learning
Agent Specifications and Reward
Post-training
Results and Validation
Training Performance
Benchmark Comparison
...and 3 more sections

Figures (10)

Figure 1: Gait control policy generation framework.
Figure 2: Overview of SoftQ and CTSA: (a) Rendered robot with key states. (b) CTSA bending angle $\alpha_b$. (c) CTSA rotational angle $\alpha_r$. (d) CTSA compression length $z_\textrm{l}$.
Figure 3: Expert gait design, solid lines for FL and RR pairs, dashed lines for FR and RL pairs.
Figure 4: Evaluation of the DNN surrogate model accuracy with varying training dataset sizes: (a) R and (b) NRMSE as a function of dataset size. Prediction performance of two architectures at the selected dataset size (250) in terms of (c) $\text{R}^{T}$ and (d) $\text{NRMSE}^{T}$.
Figure 5: Training results at a reference speed of $0.2\ \mathrm{m/s}$. (a) Cumulative reward over training episodes. Variations in (b) entropy and (c) temperature during training.
...and 5 more figures
