A Gait Driven Reinforcement Learning Framework for Humanoid Robots
Bolin Li, Yuzhi Jiang, Linwei Sun, Xuecong Huang, Lijun Zhu, Han Ding
TL;DR
This work targets stable, periodic bipedal gaits for humanoid robots in dynamic environments by combining a real-time, dynamics-aware gait planner with a structured RL reward design. The planner decouples the 3D robot into two planar subsystems (X-model and Y-model) and approximates each with a Hybrid LIP (H-LIP), which enables fast trajectory generation using Bézier parameterizations that satisfy balance and contact constraints. A reward composition of three terms (periodicity, phase correctness, and trajectory tracking) guides PPO-based learning toward efficient, robust gaits. The method is demonstrated with a design example, simulations, and real-robot experiments. The approach reportedly improves training efficiency and yields reliable, repeatable gaits, which suggests promise for practical deployment on humanoid robots in unstructured settings.
Abstract
This paper presents a real-time, gait-driven training framework for humanoid robots. First, we introduce a novel gait planner that incorporates dynamics to design the desired joint trajectories. In the gait design process, the 3D robot model is decoupled into two 2D models, each of which is approximated as a hybrid linear inverted pendulum (H-LIP) for trajectory planning. The gait planner runs in parallel with training, in real time, within the robot's learning environment. Second, based on this gait planner, we design three reward functions within a reinforcement learning framework and combine them into a reward composition that produces a periodic bipedal gait. This reward composition reduces the robot's learning time and improves locomotion performance. Finally, a gait design example, together with simulation and experimental comparisons, demonstrates the effectiveness of the proposed method.
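To make the planar pendulum approximation concrete, the following is a minimal sketch of one step of a planar hybrid pendulum model. It is not the authors' planner. It assumes the commonly used H-LIP formulation: linear inverted pendulum dynamics with constant center-of-mass height during single support, constant center-of-mass velocity during double support, and a shift of the origin to the new stance foot at the end of the step. The function names, the parameters `t_ss`, `t_ds`, `z0`, and `step_length`, and all numeric values are hypothetical and chosen only for illustration.

```python
import math

GRAVITY = 9.81  # m/s^2


def lip_single_support(x0, v0, t, z0, g=GRAVITY):
    """Closed-form LIP state after time t in single support.

    x is the horizontal CoM position relative to the stance foot and
    z0 is the (assumed constant) CoM height. Dynamics: x'' = (g / z0) * x.
    """
    lam = math.sqrt(g / z0)
    c, s = math.cosh(lam * t), math.sinh(lam * t)
    x = c * x0 + (s / lam) * v0
    v = lam * s * x0 + c * v0
    return x, v


def double_support(x0, v0, t):
    """Assumed double-support model: the CoM moves at constant velocity."""
    return x0 + v0 * t, v0


def hlip_step(x0, v0, step_length, t_ss, t_ds, z0):
    """Propagate one full step and re-express the state at the new stance foot."""
    x, v = lip_single_support(x0, v0, t_ss, z0)
    x, v = double_support(x, v, t_ds)
    return x - step_length, v


if __name__ == "__main__":
    # Hypothetical values: 0.8 m CoM height, 0.3 s single and 0.1 s double support.
    x, v = -0.10, 0.50
    for k in range(3):
        x, v = hlip_step(x, v, step_length=0.25, t_ss=0.3, t_ds=0.1, z0=0.8)
        print(f"step {k}: x = {x:.3f} m, v = {v:.3f} m/s")
```

Under the decoupling described in the abstract, one such planar model would be evaluated for the sagittal direction and another for the lateral direction. Because each step has a closed-form solution, evaluating the model inside a training loop is cheap.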
