Adaptive Energy Regularization for Autonomous Gait Transition and Energy-Efficient Quadruped Locomotion

Boyuan Liang, Lingfeng Sun, Xinghao Zhu, Bike Zhang, Ziyin Xiong, Yixiao Wang, Chenran Li, Koushil Sreenath, Masayoshi Tomizuka

TL;DR

This work tackles the challenge of learning energy-efficient quadruped locomotion without hand-crafted gait priors. It introduces a velocity-dependent energy reward and shapes the overall objective as $R=(R_{motion}+\alpha_{en}R_{en}(v_x,\omega_z))\exp(-R_{aux})$ with $R_{en}=\exp\left(-\frac{\sum_i |\tau_i||\dot{q}_i|}{\sigma_{en,x}|v_x|+\sigma_{en,z}|\omega_z|}\right)$, so that gait transitions emerge from a single policy across speeds (four-beat walking at low speed, trotting at moderate speed, and fly-trotting at high speed). With PPO training in IsaacGym and transfer to real-world Unitree Go1 hardware, the method improves cost of transport and maintains stable velocity tracking without gait priors, while also handling circle-tracking and terrain-clearance tasks robustly. The results demonstrate the practicality of a simple, energy-centric reward for robust, energy-efficient locomotion and suggest applicability to robotic tasks beyond locomotion. The study highlights the potential to reduce reward engineering complexity while achieving adaptive, energy-aware behavior in legged robots.
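The reward above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the placeholder values for $\sigma_{en,x}$ and $\sigma_{en,z}$, and the small `eps` that guards against division by zero at standstill are all hypothetical and do not come from the paper.

```python
import numpy as np

# Hypothetical placeholder scales; the values used in the paper are not given here.
SIGMA_EN_X = 1.0
SIGMA_EN_Z = 1.0


def energy_reward(tau, qdot, v_x, omega_z,
                  sigma_en_x=SIGMA_EN_X, sigma_en_z=SIGMA_EN_Z, eps=1e-6):
    """R_en = exp(-sum_i |tau_i||qdot_i| / (sigma_x |v_x| + sigma_z |omega_z|)).

    tau, qdot: per-joint torques and joint velocities (same length).
    eps is an added assumption to avoid dividing by zero when both
    velocity terms are zero; it is not part of the stated formula.
    """
    tau = np.asarray(tau, dtype=float)
    qdot = np.asarray(qdot, dtype=float)
    power = np.sum(np.abs(tau) * np.abs(qdot))
    scale = sigma_en_x * abs(v_x) + sigma_en_z * abs(omega_z)
    return float(np.exp(-power / (scale + eps)))


def total_reward(r_motion, r_en, r_aux, alpha_en=1.0):
    """R = (R_motion + alpha_en * R_en) * exp(-R_aux)."""
    return float((r_motion + alpha_en * r_en) * np.exp(-r_aux))
```

Because the velocity terms sit in the denominator of the exponent, the same joint power is penalized less at higher speed: for example, `energy_reward(tau, qdot, 2.0, 0.0)` is larger than `energy_reward(tau, qdot, 0.5, 0.0)` whenever the joint power is nonzero.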

Abstract

In reinforcement learning for legged robot locomotion, crafting effective reward strategies is crucial. Pre-defined gait patterns and complex reward systems are widely used to stabilize policy training. Drawing from the natural locomotion behaviors of humans and animals, which adapt their gaits to minimize energy consumption, we propose a simplified, energy-centric reward strategy to foster the development of energy-efficient locomotion across various speeds in quadruped robots. By implementing an adaptive energy reward function and adjusting the weights based on velocity, we demonstrate that our approach enables ANYmal-C and Unitree Go1 robots to autonomously select appropriate gaits, such as four-beat walking at lower speeds and trotting at higher speeds, resulting in improved energy efficiency and stable velocity tracking compared to previous methods using complex reward designs and prior gait knowledge. The effectiveness of our policy is validated through simulations in the IsaacGym simulation environment and on real robots, demonstrating its potential to facilitate stable and adaptive locomotion.

Paper Structure (17 sections, 4 equations, 7 figures)

Figures (7)

  • Figure 1: Compared with the baseline without energy regularization, our single policy (from a single RL training run) autonomously adopts different energy-efficient gaits (walking, trotting, and fly trotting). It achieves lower energy consumption (adjusted leg-swing) at varying speeds.
  • Figure 2: Gait switching under different command velocities. The policy is obtained with $\alpha_{en}=1.0$. As the command velocity increases, the policy transitions between gaits automatically. We also show snapshots of four-beat walking at 0.5 m/s, trotting at 1.4 m/s, and fly trotting at 2.3 m/s.
  • Figure 3: Ablation study of energy consumption in Unitree Go1 simulation. For straight-line walking, reference linear velocities range from $0.1$ to $2.5$ m/s in steps of $0.1$ m/s. The cost of transport (CoT) is measured in J/m. For angular spinning, reference angular velocities range from $-2.5$ to $2.5$ rad/s in steps of $0.2$ rad/s. In both (a) and (b), CoT decreases considerably when $\alpha_{en}$ reaches $1.0$. The CoT for $\alpha_{en}=1.5$ at reference velocities above $1.9$ m/s is not plotted because the output velocity drops to zero in this range, which indicates that velocity tracking accuracy is sacrificed when the energy regularization weight $\alpha_{en}$ is too large. For terrain walking, the robot is commanded to walk in a straight line on rough sloped terrain with reference linear velocities from $0.3$ to $1.5$ m/s in steps of $0.1$ m/s, because walking either too slowly or too quickly is difficult on such terrain. (c) shows that a similar CoT reduction appears on terrain. (d) shows the effect of energy regularization on the ANYmal-C platform with the same parameters $\sigma_{en,x}$, $\sigma_{en,z}$, and $\alpha_{en}$.
  • Figure 4: Gait under different command velocities when $\alpha_{en}=0.0$. The policy shows a bouncing gait across all command velocities, which is not energy-efficient. We also show snapshots of the bouncing gait, in which all four legs touch and leave the ground almost simultaneously.
  • Figure 5: Gait comparison for ANYmal-C between the energy-regularized policy and the original legged gym policy (Rudin et al., 2021). A similar gait transition from walking to trotting also appears when energy regularization is applied. In the original legged gym policy, the rear right leg lifts very little, so it makes several unintended light contacts with the ground. Videos of the ANYmal-C simulation can be found on our project website: https://sites.google.com/berkeley.edu/efficient-locomotion.
  • ...and 2 more figures
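
The cost of transport reported in J/m for straight-line walking in Figure 3 can be estimated from logged joint data. The sketch below is one plausible way to do so, not the paper's evaluation code: it assumes that energy is approximated by the same mechanical term $\sum_i |\tau_i||\dot{q}_i|$ used in the energy reward, integrated over time and divided by the distance travelled. The function name, argument layout, and fixed time step `dt` are hypothetical.

```python
import numpy as np


def cost_of_transport(tau, qdot, base_speed, dt):
    """Estimate CoT in J/m from a logged rollout.

    tau, qdot:   arrays of shape (T, n_joints), torques [N m] and
                 joint velocities [rad/s] at each logged step.
    base_speed:  array of shape (T,), forward base speed [m/s].
    dt:          logging time step [s] (assumed constant).
    """
    tau = np.asarray(tau, dtype=float)
    qdot = np.asarray(qdot, dtype=float)
    base_speed = np.asarray(base_speed, dtype=float)

    # Mechanical power per step, using absolute values as in the energy reward.
    power = np.sum(np.abs(tau) * np.abs(qdot), axis=1)  # [W]
    energy = np.sum(power) * dt                          # [J]
    distance = np.sum(np.abs(base_speed)) * dt           # [m]

    if distance <= 0.0:
        # CoT is undefined when the robot does not move.
        raise ValueError("distance travelled is zero; CoT is undefined")
    return float(energy / distance)
```

The explicit error for zero distance mirrors the caption's note that CoT is not plotted where the output velocity drops to zero.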