RL + Model-based Control: Using On-demand Optimal Control to Learn Versatile Legged Locomotion
Dongho Kang, Jin Cheng, Miguel Zamora, Fatemeh Zargarbashi, Stelian Coros
TL;DR
This paper tackles the challenge of achieving versatile, robust legged locomotion across gaits, velocities, and terrains by combining model-based optimal control (MBOC) with reinforcement learning (RL). It introduces on-demand reference motions, generated by solving a finite-horizon optimal control problem (OCP) with a Variable Height Inverted Pendulum Model (VHIPM), to guide a deep RL policy that imitates both base and foot trajectories. The key contributions are on-demand reference motion generation for training, a single RL policy that produces diverse gait patterns without robot-specific reward tuning, and hardware validation on the Unitree Go1 and Aliengo quadrupeds demonstrating strong sim-to-real transfer. The approach offers a scalable path to robust legged control on multiple quadruped platforms, reducing hand-engineering while preserving dynamic capabilities.
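To make the reduced model concrete, the sketch below implements the standard VHIPM equations and rolls them forward. It assumes flat ground at z = 0 and a single point contact (center of pressure, CoP) at (px, py, 0), with the CoP and the vertical CoM acceleration as inputs. The names `VHIPMState`, `vhipm_accel`, and `rollout` are hypothetical, and this is not the paper's OCP formulation; how the paper handles contact timing and constraints may differ.

```python
from dataclasses import dataclass

GRAVITY = 9.81  # m/s^2


@dataclass
class VHIPMState:
    x: float   # CoM position along x (m)
    y: float   # CoM position along y (m)
    z: float   # CoM height above flat ground (m)
    vx: float  # CoM velocity (m/s)
    vy: float
    vz: float


def vhipm_accel(s: VHIPMState, px: float, py: float, zdd: float) -> tuple[float, float, float]:
    """CoM acceleration of a variable-height inverted pendulum.

    Assumes flat ground and a point contact (CoP) at (px, py, 0). The contact
    force points from the CoP to the CoM, so the horizontal accelerations are
    xdd = (x - px) * (zdd + g) / z, and likewise for y.
    """
    if s.z <= 0.0:
        raise ValueError("CoM height must be positive")
    omega_sq = (zdd + GRAVITY) / s.z
    return (s.x - px) * omega_sq, (s.y - py) * omega_sq, zdd


def rollout(s0: VHIPMState, inputs: list[tuple[float, float, float]], dt: float) -> list[VHIPMState]:
    """Integrate the VHIPM with semi-implicit Euler for a sequence of (px, py, zdd) inputs."""
    traj = [s0]
    s = s0
    for px, py, zdd in inputs:
        ax, ay, az = vhipm_accel(s, px, py, zdd)
        # Update velocity first, then position with the new velocity.
        vx, vy, vz = s.vx + ax * dt, s.vy + ay * dt, s.vz + az * dt
        s = VHIPMState(s.x + vx * dt, s.y + vy * dt, s.z + vz * dt, vx, vy, vz)
        traj.append(s)
    return traj
```

In a finite-horizon OCP of this kind, the solver would choose the CoP and height-acceleration inputs over the horizon subject to these dynamics. It would typically also constrain the CoP to lie within the support region given by the gait's contact schedule. The resulting base trajectory is the kind of reference the RL policy then imitates.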
Abstract
This paper presents a control framework that combines model-based optimal control and reinforcement learning (RL) to achieve versatile and robust legged locomotion. Our approach enhances the RL training process by incorporating on-demand reference motions generated through finite-horizon optimal control, covering a broad range of velocities and gaits. These reference motions serve as targets for the RL policy to imitate, yielding robust control policies that can be learned reliably. Furthermore, by training on realistic simulation data that captures whole-body dynamics, RL overcomes the limitations that modeling simplifications impose on the reference motions. We validate the robustness and controllability of the RL training process within our framework through a series of experiments. In these experiments, our method generalizes beyond the reference motions and, thanks to RL's flexibility, handles more complex locomotion tasks that may be challenging for the simplified model. Additionally, our framework readily supports training control policies for robots of different sizes, without robot-specific adjustments to the reward function or hyperparameters.
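The abstract states that the reference motions serve as imitation targets for the RL policy. A common way to turn such targets into a reward is an exponential-kernel tracking term on base and foot positions. The sketch below shows that pattern. The function names, weights, and scales are hypothetical and are not taken from the paper's reward.

```python
import math
from typing import Sequence


def tracking_term(actual: Sequence[float], target: Sequence[float], scale: float) -> float:
    """exp(-||actual - target||^2 / scale): 1.0 at perfect tracking, decaying toward 0."""
    if len(actual) != len(target):
        raise ValueError("actual and target must have the same length")
    err_sq = sum((a - t) ** 2 for a, t in zip(actual, target))
    return math.exp(-err_sq / scale)


def imitation_reward(
    base_pos: Sequence[float],
    base_pos_ref: Sequence[float],
    foot_pos: Sequence[Sequence[float]],
    foot_pos_ref: Sequence[Sequence[float]],
    w_base: float = 0.5,      # hypothetical weight on base tracking
    w_foot: float = 0.5,      # hypothetical weight on foot tracking
    scale_base: float = 0.05,  # hypothetical kernel width (m^2)
    scale_foot: float = 0.01,
) -> float:
    """Weighted sum of base- and foot-tracking terms against the reference motion."""
    r_base = tracking_term(base_pos, base_pos_ref, scale_base)
    # Flatten the per-foot 3D positions so all feet share one kernel.
    foot = [c for p in foot_pos for c in p]
    foot_ref = [c for p in foot_pos_ref for c in p]
    r_foot = tracking_term(foot, foot_ref, scale_foot)
    return w_base * r_base + w_foot * r_foot
```

Because the targets come from the reference motion rather than from many hand-shaped terms, this style of reward needs only a few weights. That is consistent with the abstract's claim of avoiding robot-specific reward adjustments, though whether these particular values would transfer across robots is untested.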
