A Learning Framework for Diverse Legged Robot Locomotion Using Barrier-Based Style Rewards

Gijeong Kim, Yong-Hoon Lee, Hae-Won Park

TL;DR

This paper tackles the challenge of teaching a single quadruped robot to operate in diverse locomotion modes (quadruped, tripod, and biped) while performing multiple tasks. It introduces a barrier-based learning framework that uses a relaxed logarithmic barrier reward to softly constrain motion style (gait, foot clearance, body height, joint posture), which allows the gait to adjust flexibly during training. The method combines gait encoding, mode-dependent barrier constraints, and a concurrent, multi-critic PPO training regime with sim-to-real transfer. It demonstrates fast learning and strong performance on the KAIST HOUND family, including clearing obstacles up to 58 cm (67 cm on HOUND2), galloping at 4.67 m/s, bipedal running at 3.6 m/s, and bipedal carrying of a 7.5 kg object. The approach offers a scalable path toward versatile, natural locomotion in legged robots without relying on exteroceptive sensing or extensive reward engineering, with potential applications in search-and-rescue, exploration, and human-robot interaction.
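The relaxed logarithmic barrier at the core of the style reward can be sketched as follows. This is a minimal illustration that assumes the standard relaxed log barrier: exact $-\ln z$ above the relaxation parameter $\delta$ and a quadratic (order-2) extension below it. The paper's exact formulation, constants, and reward scaling may differ. The helper names `relaxed_log_barrier` and `style_reward`, and the choice of using the negated barrier sum as the reward, are hypothetical.

```python
import math


def relaxed_log_barrier(z: float, delta: float) -> float:
    """Relaxed log barrier on the slack z of a constraint z >= 0.

    For z > delta this is the exact log barrier -ln(z). For z <= delta it
    switches to a quadratic that matches the value and slope at z = delta,
    so the penalty stays finite even when the constraint is violated
    (z <= 0), while its slope keeps growing in magnitude as z decreases.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    if z > delta:
        return -math.log(z)
    return 0.5 * (((z - 2.0 * delta) / delta) ** 2 - 1.0) - math.log(delta)


def style_reward(x: float, lower: float, upper: float, delta: float) -> float:
    """Hypothetical soft-constraint reward for lower <= x <= upper.

    Applies the relaxed barrier to both slacks (x - lower, upper - x) and
    negates the sum, so the reward peaks inside the band and falls off
    smoothly, without ever becoming infinite, outside it.
    """
    return -(relaxed_log_barrier(x - lower, delta)
             + relaxed_log_barrier(upper - x, delta))
```

For example, a body-height constraint with hypothetical bounds of 0.45 m and 0.60 m and $\delta = 0.02$ yields the highest reward near mid-range and a steadily decreasing, but always finite, reward as the height leaves the band. The finite quadratic branch is what makes this a soft constraint: a violation produces a steep penalty gradient instead of an infinite value.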

Abstract

This work introduces a model-free reinforcement learning framework that enables various modes of motion (quadruped, tripod, or biped) and diverse tasks for legged robot locomotion. We employ a motion-style reward based on a relaxed logarithmic barrier function as a soft constraint to bias the learning process toward the desired motion style, such as gait, foot clearance, joint position, or body height. The predefined gait cycle is encoded in a flexible manner, facilitating gait adjustments throughout the learning process. Extensive experiments demonstrate that KAIST HOUND, a 45 kg robotic system, can achieve biped, tripod, and quadruped locomotion using the proposed framework. Quadrupedal capabilities include traversing uneven terrain, galloping at 4.67 m/s, and overcoming obstacles up to 58 cm (67 cm for HOUND2); bipedal capabilities include running at 3.6 m/s, carrying a 7.5 kg object, and ascending stairs. All of these behaviors are performed without exteroceptive input.
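The flexible gait encoding, as described in the Figure 2 caption, can be sketched with a periodic gait cycle function $g_i(t)$ and a contact-dependent constraint value $f_i$ that must stay above a negative bound $d_{\text{gait}}^{\text{lower}}$. The sinusoidal form of $g_i(t)$, the sign convention (positive values request stance, negative values request swing), the definition $f_i = c_i\, g_i(t)$ with $c_i = +1$ in stance and $-1$ in swing, and the trot phase offsets are all assumptions made for illustration, not the paper's confirmed definitions.

```python
import math

# Hypothetical trot: diagonal leg pairs share a phase, the pairs are half a cycle apart.
TROT_PHASE_OFFSETS = {"FL": 0.0, "FR": 0.5, "RL": 0.5, "RR": 0.0}


def gait_cycle(t: float, period: float, phase_offset: float = 0.0) -> float:
    """Assumed gait cycle g_i(t) in [-1, 1]: > 0 requests stance, < 0 requests swing."""
    if period <= 0:
        raise ValueError("period must be positive")
    return math.sin(2.0 * math.pi * (t / period + phase_offset))


def gait_constraint_value(t: float, in_contact: bool, period: float,
                          phase_offset: float = 0.0) -> float:
    """Assumed f_i = c_i * g_i(t), with c_i = +1 in stance and -1 in swing."""
    c = 1.0 if in_contact else -1.0
    return c * gait_cycle(t, period, phase_offset)


def gait_satisfied(t: float, in_contact: bool, period: float, d_lower: float,
                   phase_offset: float = 0.0) -> bool:
    """Check the gait constraint f_i >= d_lower (d_lower is chosen below zero)."""
    return gait_constraint_value(t, in_contact, period, phase_offset) >= d_lower
```

Because $d_{\text{gait}}^{\text{lower}} < 0$, either contact state satisfies the constraint while $|g_i(t)| < |d_{\text{gait}}^{\text{lower}}|$, which leaves the policy free to shift touchdown and lift-off within that window. In a full reward, the slack $f_i - d_{\text{gait}}^{\text{lower}}$ would be the quantity passed to the relaxed barrier.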

Paper Structure (11 sections, 4 equations, 6 figures, 1 table)

Figures (6)

  • Figure 1: Overview of the proposed RL framework. The policy learns to follow the commanded velocity with task-specific preferred gaits and motion styles. Within a single quadruped robot system, three different modes (quadruped, tripod, biped) are implemented, accommodating a variety of tasks: quadruped mode for traversing rough terrain, overcoming high steps, and fast running with a galloping gait; tripod mode for walking; and biped mode for walking, climbing stairs, and carrying a box. Task-specific gaits and styles are enforced via soft constraints in the barrier reward, which imposes a steep gradient in the constraint-violation region, thereby facilitating the learning of the desirable motion style.
  • Figure 2: Graphical representation of the gait cycle and desired gait enforcement: (a) illustrates the gait cycle function $g_{i}(t)$ with no phase offset. The constraint boundary $d_{\text{gait}}^{\text{lower}}$ is set below zero, so stance or swing is enforced only when $|g_{i}(t)| \geq |d_{\text{gait}}^{\text{lower}}|$; (b) shows cases in which the gait constraint ($f_{i} \geq d_{\text{gait}}^{\text{lower}}$) imposes a severe penalty (left) and a milder penalty (right). Consequently, the starting points of the stance and swing phases are adjusted within the grey region (right).
  • Figure 3: Velocity tracking performance for different $\delta$ values ($1/2$, 1, and 2 times the default setting in Table 1) across constraints. Each configuration was evaluated in simulation under randomized conditions identical to the training setup for the Traversing Outdoor Terrain task in the experiments section, where environments ranged from flat ground to challenging steps and slopes. The graph presents the mean tracking error norms of $v_x$, $v_y$, and $\omega_z$ over a 5-second period across 2000 environments with randomized command velocities.
  • Figure 4: We develop a learning framework for challenging tasks for quadruped robots. The tasks include (A) locomotion over rough terrain, (B) overcoming a high step (58 cm with HOUND, 67 cm with HOUND2), (C) agile running using a gallop gait, (D) tripod locomotion, and (E) executing humanoid walking motions.
  • Figure 5: We compare the performance of our method against the baseline in (a) quadruped locomotion on flat terrain, (b) biped locomotion on flat terrain, and (c) quadruped locomotion for overcoming high steps. Ten policies trained with different random seeds were evaluated for both our method and the baseline, each across 100 random initial conditions. Each graph shows the mean and standard deviation. In (a) and (b), both our method and the baseline converge, with no further improvement after 2500 iterations.
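The velocity-tracking metric summarized in Figure 3 can be computed along the following lines. This is a sketch under one assumed reading of the caption: for each of $v_x$, $v_y$, and $\omega_z$, the absolute difference between the commanded and measured value is averaged over every environment and control step in the 5-second window. The function name `mean_tracking_errors` and the nested-list data layout are hypothetical.

```python
from statistics import fmean


def mean_tracking_errors(commands, measured):
    """Return mean |command - measured| for (v_x, v_y, omega_z).

    commands and measured are nested sequences indexed [env][step], where each
    entry is a (v_x, v_y, omega_z) tuple. The absolute error of each component
    is averaged over all environments and steps.
    """
    if len(commands) != len(measured):
        raise ValueError("commands and measured must cover the same environments")
    errors = ([], [], [])  # one list per component: v_x, v_y, omega_z
    for cmd_env, meas_env in zip(commands, measured):
        if len(cmd_env) != len(meas_env):
            raise ValueError("each environment needs the same number of steps")
        for cmd, meas in zip(cmd_env, meas_env):
            for k in range(3):
                errors[k].append(abs(cmd[k] - meas[k]))
    return tuple(fmean(e) for e in errors)
```

With 2000 environments the same reduction would normally run on batched simulator arrays; plain lists keep the sketch dependency-free.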