End-to-End Reinforcement Learning for Torque Based Variable Height Hopping
Raghav Soni, Daniel Harnack, Hannah Isermann, Sotaro Fushimi, Shivesh Kumar, Frank Kirchner
TL;DR
This work tackles dynamic, height-adjustable hopping with a monoped by training an end-to-end torque controller that maps proprioceptive feedback directly to joint torques, avoiding both explicit jump-phase detection and PD-based control strategies. It uses an energy-shaping controller as a reference baseline and trains a Soft Actor-Critic (SAC) policy in MuJoCo simulation, with a reward that includes an energy term and penalties for height excess, jerky actions, and joint constraint violations. A two-stage, CMA-ES-based sim-to-real parameter optimization bridges the gap between simulation and the real robot, supplemented by robustness strategies for delays and sensor/actuator noise. The resulting controller interpolates across jump heights and transfers to hardware without parameter tuning. It is presented as a first end-to-end monoped hopping controller, with practical implications for adaptable, torque-based locomotion on unstructured terrains.
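To make the reward structure named above more concrete, the following is a minimal sketch of how an energy term and the three penalties could be combined into a single scalar. It is not the paper's formulation: the function name `hopping_reward`, the `RewardWeights` values, the point-mass energy model, and the specific form of each term are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence

G = 9.81  # gravitational acceleration in m/s^2


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical placeholder weights, not values from the paper.
    energy: float = 1.0
    height_excess: float = 1.0
    action_jerk: float = 0.01
    joint_limit: float = 1.0


def hopping_reward(
    height: float,
    velocity: float,
    target_height: float,
    mass: float,
    torques: Sequence[float],
    prev_torques: Sequence[float],
    joint_pos: Sequence[float],
    joint_lower: Sequence[float],
    joint_upper: Sequence[float],
    w: RewardWeights = RewardWeights(),
) -> float:
    """Illustrative reward: energy tracking minus three penalties.

    Assumes a point-mass model and target_height > 0.
    """
    # Energy term (assumed form): compare the current mechanical energy
    # with the potential energy needed to reach the target apex height.
    energy = mass * G * height + 0.5 * mass * velocity ** 2
    target_energy = mass * G * target_height
    energy_term = -abs(energy - target_energy) / target_energy

    # Height excess: penalize only overshoot above the target height.
    excess = max(0.0, height - target_height)

    # Jerky actions: squared change in torque between consecutive steps.
    jerk = sum((t - p) ** 2 for t, p in zip(torques, prev_torques))

    # Joint constraints: total distance by which joints leave their limits.
    violation = sum(
        max(0.0, lo - q) + max(0.0, q - hi)
        for q, lo, hi in zip(joint_pos, joint_lower, joint_upper)
    )

    return (
        w.energy * energy_term
        - w.height_excess * excess
        - w.action_jerk * jerk
        - w.joint_limit * violation
    )
```

With these assumed terms, the reward is zero when the hopper sits exactly at the target apex with unchanged torques and joints inside their limits, and it decreases for overshoot, abrupt torque changes, or limit violations. In practice the relative weights would need tuning against the task.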
Abstract
Legged locomotion is arguably the best suited and most versatile mode for dealing with natural or unstructured terrains. Intensive research into dynamic walking and running controllers has recently yielded great advances, both in the optimal control and reinforcement learning (RL) literature. Hopping is a challenging dynamic task involving a flight phase and has the potential to increase the traversability of legged robots. Model-based control for hopping typically relies on accurate detection of different jump phases, such as lift-off or touch-down, and on using a different controller for each phase. In this paper, we present an end-to-end RL-based torque controller that learns to implicitly detect the relevant jump phases, removing the need to provide manual heuristics for state detection. We also extend a method for simulation-to-reality transfer of the learned controller to contact-rich dynamic tasks, resulting in successful deployment on the robot after training without parameter tuning.
