Learning a Contact-Adaptive Controller for Robust, Efficient Legged Locomotion

Xingye Da, Zhaoming Xie, David Hoeller, Byron Boots, Animashree Anandkumar, Yuke Zhu, Buck Babich, Animesh Garg

TL;DR

This work addresses the challenge of enabling robust and energy-efficient legged locomotion under varying and novel environments. It introduces a hierarchical framework that couples a high-level reinforcement learning policy, which selects from a set of 9 primitives, with a low-level model-based controller that executes these primitives via quadratic programming and swing-foot control. Key contributions include sample-efficient RL training, zero-shot adaptation to novel scenarios, and direct sim-to-real transfer on a Laikago quadruped without randomization. The results show substantial energy savings, adaptive contact sequencing, and robust performance in split-belt and perturbation scenarios, highlighting the practical impact of integrating model-based control with learning for real-time locomotion.

Abstract

We present a hierarchical framework that combines model-based control and reinforcement learning (RL) to synthesize robust controllers for a quadruped (the Unitree Laikago). The system consists of a high-level controller that learns to choose from a set of primitives in response to changes in the environment and a low-level controller that utilizes an established control method to robustly execute the primitives. Our framework learns a controller that can adapt to challenging environmental changes on the fly, including novel scenarios not seen during training. The learned controller is up to 85 percent more energy efficient and is more robust compared to baseline methods. We also deploy the controller on a physical robot without any randomization or adaptation scheme.
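
The two-level structure described in the abstract can be sketched as a short Python loop. This is an illustrative sketch only, not the authors' implementation. The leg labels, the primitive names, the five-primitive subset (the paper uses nine), the random stand-in for the learned policy, and the per-leg mode strings are all assumptions made for the example.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical leg labels: front-right, front-left, rear-right, rear-left.
LEGS = ("FR", "FL", "RR", "RL")


@dataclass(frozen=True)
class Primitive:
    """A primitive identified by its contact configuration."""
    name: str
    stance: Tuple[bool, bool, bool, bool]  # True = leg is a stance leg


# Assumed, illustrative subset; the paper's actual set has nine primitives.
PRIMITIVES = [
    Primitive("all_stance", (True, True, True, True)),
    Primitive("swing_FR", (False, True, True, True)),
    Primitive("swing_FL", (True, False, True, True)),
    Primitive("diagonal_a", (True, False, False, True)),
    Primitive("diagonal_b", (False, True, True, False)),
]


def high_level_policy(state: List[float], rng: random.Random) -> int:
    """Stand-in for the learned policy: returns a primitive index.

    A trained policy would map the robot state to an index; this
    placeholder ignores the state and samples uniformly.
    """
    return rng.randrange(len(PRIMITIVES))


def low_level_controller(state: List[float], primitive: Primitive) -> Dict[str, str]:
    """Stand-in for the model-based controller.

    Returns only the control mode per leg: stance legs would receive
    foot-force commands, swing legs would track a target foot position.
    """
    return {
        leg: "stance_force" if in_stance else "swing_position"
        for leg, in_stance in zip(LEGS, primitive.stance)
    }


def run_episode(steps: int, seed: int = 0) -> List[Tuple[str, Dict[str, str]]]:
    """Alternate between high-level selection and low-level execution."""
    rng = random.Random(seed)
    state = [0.0] * 4  # placeholder robot state
    log = []
    for _ in range(steps):
        primitive = PRIMITIVES[high_level_policy(state, rng)]
        log.append((primitive.name, low_level_controller(state, primitive)))
    return log


if __name__ == "__main__":
    for name, modes in run_episode(steps=4):
        print(name, modes)
```

The point of the split is that the learned component only has to output a discrete choice, while everything that touches motor commands stays inside the model-based layer.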

Paper Structure

This paper contains 37 sections, 14 equations, 6 figures, 1 algorithm.

Figures (6)

  • Figure 1: (a) Riding a skateboard requires a contact sequence that moves only the feet on the ground while keeping the feet on the board still. (b) "Banana peel" test: we place a frictionless mat under a foot to test robustness. (c) We train and test the robot on a split-belt treadmill where the speeds of the two belts are changed separately with the robot facing different directions.
  • Figure 2: Overview of our system. Left: Primitives $P_i$ are distinguished by the contact configuration. The stance legs in each primitive are colored orange. Center: Hierarchical structure of the controller. The high-level controller chooses from a set of primitives based on the robot state $s_t$, and the low-level controller computes the motor torques $\tau$ based on the robot state and the primitive chosen. Right: The low-level controller uses stance foot forces to control the base pose and moves the swing feet to their target positions.
  • Figure 3: Training and testing scenarios. Scenarios (a)-(c) are used during training, where we vary the treadmill speeds, the number of moving belts, and the orientation of the robot. Scenarios (d)-(e) are introduced only during testing. Scenario (d) introduces a fixed plywood bridge on top of the treadmill, and scenario (e) inserts a frictionless mat under the feet of the robot to test stability.
  • Figure 4: Comparison of the average energy used. (a) The standing, walking, and heuristic controllers fail at high speed, while the trotting and pacing controllers remain at a high energy level. The learned controller (rl) handles all speed variations and is more energy efficient. (b) The only baseline controllers that can handle the split-belt treadmill are trotting and pacing. The learned controller is 50 percent more energy efficient on average. Energy for the learned controller drops significantly at $yaw = 150~\text{deg}$ because only one foot moves at that orientation, while two feet move at nearby orientations.
  • Figure 5: Contact sequence of different high-level controllers under different scenarios. A filled green block indicates that the corresponding foot is in contact with the ground. The three baseline controllers (standing, walking, and trotting) each use a fixed contact sequence for all scenarios, while the learned controller adapts the contact sequence to the scenario.
  • ...and 1 more figures
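
A contact-sequence chart like the one described for Figure 5 can be reproduced in text form with a few lines of Python. This is a hedged sketch with made-up data: the foot labels, the `phase_len` parameter, and the idealized trot (diagonal pairs alternating with no overlap or flight phase) are assumptions for illustration, not measurements from the paper.

```python
from typing import Dict, List

# Hypothetical foot labels: front-right, front-left, rear-right, rear-left.
FEET = ["FR", "FL", "RR", "RL"]

ContactSequence = List[Dict[str, bool]]  # one dict per time step


def standing(steps: int) -> ContactSequence:
    """Fixed sequence: every foot stays in contact at every step."""
    return [{foot: True for foot in FEET} for _ in range(steps)]


def trotting(steps: int, phase_len: int = 2) -> ContactSequence:
    """Idealized trot: diagonal pairs alternate every `phase_len` steps."""
    sequence = []
    for t in range(steps):
        first_pair_down = (t // phase_len) % 2 == 0
        sequence.append({
            "FR": first_pair_down,
            "RL": first_pair_down,
            "FL": not first_pair_down,
            "RR": not first_pair_down,
        })
    return sequence


def render(sequence: ContactSequence) -> str:
    """One row per foot; '#' marks contact, '.' marks swing."""
    rows = []
    for foot in FEET:
        cells = "".join("#" if step[foot] else "." for step in sequence)
        rows.append(f"{foot} {cells}")
    return "\n".join(rows)


if __name__ == "__main__":
    print(render(standing(8)))
    print()
    print(render(trotting(8)))
```

A fixed-gait baseline produces the same pattern in every scenario, whereas an adaptive controller would show rows that change with the environment, which is the contrast the figure draws.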