Mechanical Intelligence-Aware Curriculum Reinforcement Learning for Humanoids with Parallel Actuation
Yusuke Tanaka, Alvin Zhu, Quanyou Wang, Yeting Liu, Dennis Hong
TL;DR
The paper addresses a weakness of RL for humanoid locomotion: policies trained without modeling the mechanics of parallel actuation can be ineffective. It introduces GPU-accelerated MJX simulations that natively enforce closed-chain constraints for a differential pulley, a five-bar linkage, and a four-bar linkage, enabling end-to-end curriculum RL on the BRUCE humanoid with zero-shot sim-to-real transfer. Empirical results show that the learned policies outperform a model predictive controller on real hardware across multiple surfaces, with robust standstill stability, adaptive walking, and reasonable walking speeds. The work demonstrates the practical value of incorporating mechanical intelligence into learning-based control and provides a scalable approach for integrating complex parallel mechanisms into RL pipelines for legged robots.
Abstract
Reinforcement learning (RL) has enabled advances in humanoid robot locomotion, yet most learning frameworks do not account for the mechanical intelligence embedded in parallel actuation mechanisms, because simulator support for closed kinematic chains is limited. This omission can lead to inaccurate motion modeling and suboptimal policies, particularly for robots with high actuation complexity. This paper presents general formulations and simulation methods for three types of parallel mechanisms (a differential pulley, a five-bar linkage, and a four-bar linkage) and trains a parallel-mechanism-aware policy through an end-to-end curriculum RL framework for BRUCE, a kid-sized humanoid robot. Unlike prior approaches that rely on simplified serial approximations, we simulate all closed-chain constraints natively using GPU-accelerated MuJoCo (MJX), preserving the hardware's nonlinear mechanical properties during training. We benchmark our RL approach against a model predictive controller (MPC), demonstrating better surface generalization and performance in real-world zero-shot deployment. This work highlights the computational approaches and performance benefits of fully simulating parallel mechanisms in end-to-end learning pipelines for legged humanoids. Project code with parallel mechanisms: https://github.com/alvister88/og_bruce
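
To illustrate why a closed kinematic chain is not well captured by a serial approximation, the following minimal sketch solves the loop-closure condition of a generic planar four-bar linkage in closed form. It is not the paper's formulation and not BRUCE's geometry: the function names, the link lengths, and the choice of assembly branch are all assumptions made for illustration. The paper enforces such constraints inside the simulator, whereas this sketch only shows the underlying geometry.

```python
import math

def four_bar_output_angle(theta, a, b, c, d):
    """Rocker angle phi for crank angle theta (hypothetical helper).

    Assumed layout: ground pivots A=(0,0) and D=(d,0); crank AB has
    length a, coupler BC has length b, rocker DC has length c.
    Returns one of the two assembly branches.
    """
    # Crank tip B from the input angle.
    bx, by = a * math.cos(theta), a * math.sin(theta)
    # Vector from rocker pivot D to B, and its length.
    ex, ey = bx - d, by
    dist = math.hypot(ex, ey)
    # Triangle D-B-C has sides dist, b, c: law of cosines at D.
    cos_gamma = (dist**2 + c**2 - b**2) / (2.0 * dist * c)
    if abs(cos_gamma) > 1.0:
        raise ValueError("linkage cannot be assembled at this crank angle")
    gamma = math.acos(cos_gamma)
    # Rotate the D->B direction by -gamma to pick one branch.
    return math.atan2(ey, ex) - gamma

def loop_closure_residual(theta, phi, a, b, c, d):
    """Distance error |BC| - b; zero when the chain is closed."""
    bx, by = a * math.cos(theta), a * math.sin(theta)
    cx, cy = d + c * math.cos(phi), c * math.sin(phi)
    return math.hypot(cx - bx, cy - by) - b

if __name__ == "__main__":
    # Illustrative link lengths only, not taken from any robot.
    a, b, c, d = 1.0, 3.0, 2.0, 3.0
    for theta in (0.5, 1.0, 1.5):
        phi = four_bar_output_angle(theta, a, b, c, d)
        err = loop_closure_residual(theta, phi, a, b, c, d)
        print(f"theta={theta:.2f}  phi={phi:.4f}  residual={err:.2e}")
```

The output angle depends nonlinearly on the input angle, so the effective transmission ratio changes with configuration. A serial model with a fixed ratio cannot reproduce this, which is the kind of nonlinear mechanical property the abstract refers to preserving during training.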
