A Learning Framework for Diverse Legged Robot Locomotion Using Barrier-Based Style Rewards
Gijeong Kim, Yong-Hoon Lee, Hae-Won Park
TL;DR
This paper tackles the challenge of teaching a single quadruped robot to operate in diverse locomotion modes (quadruped, tripod, and biped) while performing multiple tasks. It introduces a learning framework built around a relaxed logarithmic barrier reward that softly constrains motion style (gait, foot clearance, body height, joint posture) while still allowing the gait to adjust during training. The method combines gait encoding, mode-dependent barrier constraints, and a concurrent, multi-critic PPO training regime with sim-to-real transfer. It demonstrates fast learning and strong performance on the KAIST HOUND family, including quadrupedal obstacle clearance up to 58 cm (67 cm for HOUND2), bipedal running at 3.6 m/s, and bipedal carrying of a 7.5 kg object. The approach offers a scalable path toward versatile, natural locomotion in legged robots without heavy reliance on exteroceptive sensing or extensive reward engineering, with potential real-world impact in search-and-rescue, exploration, and human-robot interaction scenarios.
Abstract
This work introduces a model-free reinforcement learning framework that enables various modes of motion (quadruped, tripod, or biped) and diverse tasks for legged robot locomotion. We employ a motion-style reward based on a relaxed logarithmic barrier function as a soft constraint to bias the learning process toward the desired motion style, such as gait, foot clearance, joint position, or body height. The predefined gait cycle is encoded in a flexible manner, facilitating gait adjustments throughout the learning process. Extensive experiments demonstrate that KAIST HOUND, a 45 kg robotic system, can achieve biped, tripod, and quadruped locomotion using the proposed framework. Quadrupedal capabilities include traversing uneven terrain, galloping at 4.67 m/s, and overcoming obstacles up to 58 cm (67 cm for HOUND2). Bipedal capabilities include running at 3.6 m/s, carrying a 7.5 kg object, and ascending stairs, all performed without exteroceptive input.
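To make the idea of a barrier-based soft constraint concrete, the following is a minimal sketch of one common form of the relaxed logarithmic barrier: an ordinary log barrier when the constraint margin exceeds a threshold, and a quadratic extension below it, so the value stays finite when a bound is violated. The exact formulation, bounds, and weights used in the paper are not given in this summary; the function names, the parameter `delta`, and the numeric bounds below are illustrative assumptions.

```python
import math


def relaxed_log_barrier(z: float, delta: float) -> float:
    """Relaxed log barrier for the constraint z > 0 (assumed textbook form).

    For z > delta this is the usual -ln(z). For z <= delta it switches to a
    quadratic that matches the value and slope of -ln(z) at z = delta, so the
    penalty remains finite and differentiable when the constraint is violated.
    """
    if delta <= 0.0:
        raise ValueError("delta must be positive")
    if z > delta:
        return -math.log(z)
    return 0.5 * (((z - 2.0 * delta) / delta) ** 2 - 1.0) - math.log(delta)


def style_reward(value: float, lower: float, upper: float, delta: float) -> float:
    """Hypothetical style reward keeping `value` inside [lower, upper].

    The reward is the negated sum of the barriers on both margins: it is
    highest near the middle of the interval and decreases smoothly (without
    diverging) as the value approaches or crosses either bound.
    """
    return -(
        relaxed_log_barrier(value - lower, delta)
        + relaxed_log_barrier(upper - value, delta)
    )


if __name__ == "__main__":
    # Illustrative bounds only; these are not values from the paper.
    for v in (0.5, 0.95, 1.2):
        print(v, style_reward(v, lower=0.0, upper=1.0, delta=0.1))
```

Because the quadratic branch keeps the penalty bounded, a policy that temporarily leaves the desired range (for example, an off-nominal body height early in training) still receives a usable learning signal, which is what makes the constraint soft rather than hard.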
