Dynamic Bipedal Maneuvers through Sim-to-Real Reinforcement Learning
Fangzhou Yu, Ryan Batke, Jeremy Dao, Jonathan Hurst, Kevin Green, Alan Fern
TL;DR
The paper addresses highly dynamic, aperiodic bipedal maneuvers and their sim-to-real transfer by training recurrent turning policies on offline single rigid body model (SRBM) trajectory data. It introduces an epilogue reward mechanism to ensure a smooth transition back to nominal walking after a turn, and uses dynamics randomization to bridge the sim-to-real gap. By comparing four reward formulations, it analyzes how reference information shapes learning and turning performance, and it demonstrates successful sim-to-real transfer of four-step, 90-degree turns on Cassie. The study highlights both the promise of the approach and the remaining hardware challenges, and points to the need for scalable switching among many behavior policies to support more complex dynamic routines.
Abstract
For legged robots to match the athletic capabilities of humans and animals, they must not only produce robust periodic walking and running, but also seamlessly switch between nominal locomotion gaits and more specialized transient maneuvers. Despite recent advances in the control of bipedal robots, there has been little focus on producing highly dynamic behaviors. Recent work using reinforcement learning to produce control policies for legged robots has demonstrated success in producing robust walking behaviors. However, these learned policies have difficulty expressing a multitude of different behaviors with a single network. Inspired by conventional optimization-based control techniques for legged robots, this work trains a recurrent policy to execute four-step, 90-degree turns using reference data generated from optimized single rigid body model trajectories. We present a novel training framework using epilogue terminal rewards for learning specific behaviors from pre-computed trajectory data and demonstrate a successful transfer to hardware on the bipedal robot Cassie.
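
The abstract does not spell out how the epilogue terminal reward is computed. As a rough illustration of one plausible reading, the sketch below assumes that, once the turn ends, a nominal walking controller runs for a short "epilogue" and the discounted reward it collects is credited to the last step of the turning episode. The function names, the discount factor, and the way the epilogue rewards are obtained are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def discounted_sum(rewards: Sequence[float], gamma: float) -> float:
    """Return sum over k of gamma**k * rewards[k]."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def add_epilogue_reward(
    maneuver_rewards: Sequence[float],
    epilogue_rewards: Sequence[float],
    gamma: float = 0.99,  # assumed discount factor, not from the paper
) -> List[float]:
    """Fold the epilogue rewards into the final maneuver step.

    maneuver_rewards: per-step rewards collected by the turning policy.
    epilogue_rewards: per-step rewards collected after the maneuver ends,
        e.g. while a nominal walking controller is in charge (assumption).
    """
    if not maneuver_rewards:
        raise ValueError("maneuver episode must contain at least one step")
    shaped = list(maneuver_rewards)
    # The epilogue starts one step after the last maneuver step, so its
    # discounted sum is scaled by one extra factor of gamma.
    shaped[-1] += gamma * discounted_sum(epilogue_rewards, gamma)
    return shaped
```

With this bookkeeping, the discounted return of the shaped maneuver episode equals the return of the maneuver followed by the epilogue, so a turn that leaves the robot in a state from which walking fails is penalized even though the turning policy is no longer in control at that point.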
