PALo: Learning Posture-Aware Locomotion for Quadruped Robots

Xiangyu Miao, Jun Sun, Hang Lai, Xinpeng Di, Jiahang Cao, Yong Yu, Weinan Zhang

TL;DR

PALo tackles posture-aware locomotion for quadruped robots by learning an end-to-end policy that jointly tracks 6D velocity/posture commands. It employs a partially observable MDP with an asymmetric actor-critic, augmented by Adversarial Motion Priors and a layered training curriculum, plus domain randomization to bridge the sim-to-real gap. Key contributions include integrating posture control into 6D command tracking, using AMP to simplify rewards, and demonstrating successful sim-to-real transfer on a real robot without fine-tuning, along with comprehensive ablations. The results show robust performance across diverse terrains and highlight the importance of AMP, curricula, and encoder design for stable, real-time posture-aware locomotion, establishing PALo as a foundation for higher-level embodied intelligence modules.

Abstract

With the rapid development of embodied intelligence, locomotion control of quadruped robots on complex terrains has become a research hotspot. Unlike traditional locomotion control approaches that focus solely on velocity tracking, we aim to balance the agility and robustness of quadruped robots on diverse and complex terrains. To this end, we propose PALo, an end-to-end deep reinforcement learning framework for posture-aware locomotion that simultaneously tracks linear and angular velocity and adjusts body height, pitch, and roll angles in real time. In PALo, the locomotion control problem is formulated as a partially observable Markov decision process, and an asymmetric actor-critic architecture is adopted to overcome the sim-to-real challenge. Further, by incorporating customized training curricula, PALo achieves agile posture-aware locomotion control in simulated environments and transfers to real-world settings without fine-tuning, allowing real-time control of the quadruped robot's locomotion and body posture across challenging terrains. Through in-depth experimental analysis, we identify the key components of PALo that contribute to its performance, further validating the effectiveness of the proposed method. The results of this study open new possibilities for low-level locomotion control of quadruped robots in higher-dimensional command spaces and lay the foundation for future research on upper-level modules for embodied intelligence.
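To make the 6D command space concrete, here is a minimal sketch of how such a command could be represented. The class name `Command6D`, the field names, and the split of the velocity part into forward velocity, lateral velocity, and yaw rate are illustrative assumptions; the abstract only states that linear and angular velocity, body height, pitch, and roll are tracked.

```python
from dataclasses import dataclass, astuple
from typing import List


@dataclass(frozen=True)
class Command6D:
    """Hypothetical container for a 6D posture-aware locomotion command.

    Field names and ordering are assumptions made for illustration,
    not the paper's definition.
    """
    lin_vel_x: float    # forward linear velocity (m/s), assumed component
    lin_vel_y: float    # lateral linear velocity (m/s), assumed component
    yaw_rate: float     # angular velocity about the vertical axis (rad/s), assumed
    body_height: float  # commanded body height (m)
    pitch: float        # commanded pitch angle (rad)
    roll: float         # commanded roll angle (rad)

    def as_vector(self) -> List[float]:
        """Flatten the command into a 6-element list for a policy input."""
        return list(astuple(self))


# Example: walk forward slowly while lowering the body and pitching slightly.
cmd = Command6D(lin_vel_x=0.5, lin_vel_y=0.0, yaw_rate=0.0,
                body_height=0.25, pitch=0.1, roll=0.0)
vec = cmd.as_vector()
```

The numeric values in the example are placeholders and are not taken from the paper.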

Paper Structure

This paper contains 27 sections, 11 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: Unitree A1 quadruped robot demonstrating posture-aware locomotion control (from left to right: height, pitch, and roll control)
  • Figure 2: Schematic diagram of the robot's body frame rendered in simulation, showing the coordinate axes (red: x-axis, green: y-axis, blue: z-axis) and local height map sampling points (yellow). The 6D commands correspond to the robot's motion tracking in linear velocity, angular velocity, height, pitch, and roll.
  • Figure 3: Overview of our framework PALo. The orange dashed line represents gradient backpropagation during training, while the gray dashed line indicates observations obtained from the simulator or real-world environment and fed into our framework. Solid arrows denote the data flow within our model. The actor is a shallow MLP that outputs 12D target DOF positions, which are converted to torques via a PD controller. The actor’s input combines a 6D command with a history latent vector from a history encoder. The critic is a shallow MLP, and the AMP discriminator is an MLP.
  • Figure 4: Dynamic 6D command tracking performance on simulated flat terrain.
  • Figure 5: Real-world deployment of the trained policy on various structured and unstructured terrains. The 3×3 composite image illustrates the robot navigating different terrain types, demonstrating its adaptability.
  • ...and 4 more figures
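
The Figure 3 caption says the actor outputs 12D target DOF positions that a PD controller converts to torques. A minimal sketch of that conversion follows, assuming the standard joint-space PD law with a zero target joint velocity. The function name `pd_torques` and the gains `kp` and `kd` are hypothetical placeholders, not values from the paper.

```python
from typing import List, Sequence

NUM_JOINTS = 12  # matches the 12D target DOF positions in the Figure 3 caption


def pd_torques(q_target: Sequence[float],
               q: Sequence[float],
               qd: Sequence[float],
               kp: float = 20.0,   # hypothetical proportional gain
               kd: float = 0.5) -> List[float]:  # hypothetical derivative gain
    """Convert target joint positions to torques with a PD law.

    tau_i = kp * (q_target_i - q_i) - kd * qd_i
    The target joint velocity is assumed to be zero.
    """
    if not (len(q_target) == len(q) == len(qd) == NUM_JOINTS):
        raise ValueError(f"expected {NUM_JOINTS} values per argument")
    return [kp * (qt - qi) - kd * qdi
            for qt, qi, qdi in zip(q_target, q, qd)]


# Example: every joint is 0.1 rad short of its target and at rest.
tau = pd_torques([0.1] * NUM_JOINTS, [0.0] * NUM_JOINTS, [0.0] * NUM_JOINTS)
```

On a real robot the torques would typically also be clipped to actuator limits; that step is omitted here because the source gives no limits.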