PA-LOCO: Learning Perturbation-Adaptive Locomotion for Quadruped Robots

Zhiyuan Xiao, Xinyu Zhang, Xiang Zhou, Qingrui Zhang

TL;DR

This work addresses robust blind quadruped locomotion under external perturbations without force sensors. It proposes PA-LOCO, a privileged-learning framework that adds multiple encoders ($E^F$, $E^T$, $E^S$) and a residual policy to enable perturbation-adaptive behavior through the latent features $l^F_t$, $l^T_t$, $l^S_t$. Training proceeds in three phases: the teacher is trained with privileged information, the student imitates the teacher using only proprioceptive observations, and the residual network is then optimized to improve performance under disturbances. Experiments on a Unitree GO1 show improved robustness, stability, and faster recovery across diverse terrains, with ablations supporting the benefits of the multi-encoder latent decoupling and the residual module. The findings suggest that decoupling privileged information into dedicated latent spaces and augmenting the student with a residual path can significantly enhance real-world quadruped locomotion under perturbations.

Abstract

Numerous locomotion controllers based on Reinforcement Learning (RL) have been designed to enable blind quadrupedal locomotion over challenging terrains. Nevertheless, locomotion control remains challenging for quadruped robots traversing diverse terrains amidst unforeseen disturbances. Recently, privileged learning has been employed to learn reliable and robust quadrupedal locomotion over various terrains based on a teacher-student architecture. However, its one-encoder structure is inadequate for handling external force perturbations. The student policy inevitably suffers performance degradation due to the feature embedding discrepancy between the feature encoder of the teacher policy and that of the student policy. Hence, this paper presents a privileged learning framework with multiple feature encoders and a residual policy network for robust and reliable quadruped locomotion subject to various external perturbations. The multi-encoder structure can decouple latent features from different privileged information, ultimately leading to enhanced performance of the learned policy in terms of robustness, stability, and reliability. The efficiency of the proposed feature encoding module is analyzed in depth using extensive simulation data. The residual policy network helps mitigate the performance degradation experienced by the student policy as it attempts to clone the behaviors of the teacher policy. The proposed framework is evaluated on a Unitree GO1 robot, showing improved performance over the state-of-the-art privileged learning algorithm in extensive experiments on diverse terrains. Ablation studies are conducted to illustrate the efficiency of the residual policy network.
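The structure described in the abstract (one encoder per group of privileged information, plus a residual policy on top of the student) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation. The layer sizes, the use of an observation history as student-encoder input, the separate student policy weights, and the additive form of the residual correction are all assumptions made for illustration. Only the symbols $F_t$, $T_t$, $S_t$ and the existence of three encoders and a residual network come from the text.

```python
import math
import random

random.seed(0)

def init_mlp(sizes):
    """Small random MLP; each layer is (weights, biases). Sizes are illustrative."""
    return [([[random.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def mlp(layers, x):
    """Forward pass: tanh on hidden layers, linear output layer."""
    for i, (w, b) in enumerate(layers):
        x = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
        if i < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x

# Hypothetical dimensions (not taken from the paper).
OBS_DIM, HIST_LEN, LATENT_DIM, ACT_DIM = 6, 4, 3, 12
PRIV_DIMS = {"F": 3, "T": 5, "S": 4}   # sizes of F_t, T_t, S_t (assumed)
KEYS = ("F", "T", "S")
POLICY_IN = OBS_DIM + len(KEYS) * LATENT_DIM

# Teacher encoders see privileged inputs; student encoders see only
# a flattened proprioceptive history (an assumption of this sketch).
teacher_enc = {k: init_mlp([PRIV_DIMS[k], 8, LATENT_DIM]) for k in KEYS}
student_enc = {k: init_mlp([OBS_DIM * HIST_LEN, 8, LATENT_DIM]) for k in KEYS}
teacher_policy = init_mlp([POLICY_IN, 16, ACT_DIM])
student_policy = init_mlp([POLICY_IN, 16, ACT_DIM])
residual_policy = init_mlp([POLICY_IN, 16, ACT_DIM])

def policy_input(o_t, latents):
    """Concatenate the current observation with the three latent features."""
    return list(o_t) + [v for k in KEYS for v in latents[k]]

def teacher_forward(o_t, priv):
    """Phase 1 structure: one encoder per privileged input, then the policy."""
    latents = {k: mlp(teacher_enc[k], priv[k]) for k in KEYS}
    return mlp(teacher_policy, policy_input(o_t, latents)), latents

def student_forward(o_t, o_hist, use_residual=True):
    """Phases 2 and 3 structure: proprioception-only encoders, optional residual."""
    flat = [v for obs in o_hist for v in obs]
    latents = {k: mlp(student_enc[k], flat) for k in KEYS}
    x = policy_input(o_t, latents)
    action = mlp(student_policy, x)
    if use_residual:
        # Assumed additive correction on top of the student action.
        action = [a + r for a, r in zip(action, mlp(residual_policy, x))]
    return action, latents
```

Keeping the three latents separate until the policy input is what allows each one to be supervised and inspected independently, which is the decoupling the abstract refers to.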

Paper Structure (13 sections, 1 equation, 7 figures, 5 tables)

Figures (7)

  • Figure 1: A Unitree Go1 quadruped robot is kicked while standing on a grass field.
  • Figure 2: The proposed PA-LOCO integrates a teacher-student framework with a residual network and multiple feature encoders. Training involves three phases. In the first phase, the teacher policy is trained with proprioceptive observations $o_t$ and privileged information $F_t, T_t, S_t$, which is unavailable at deployment. In the second phase, the student policy is trained using only observations from the proprioceptive sensors; it learns to clone the teacher's actions and latent features through supervised learning. In the third phase, the residual policy network is trained to further improve the student policy's performance under perturbations.
  • Figure 3: The locomotion behavior when subjected to external force impulses from the front. The trunk velocity and height responses are provided in the second image from the top. The feet's contact patterns with the ground (F/R denotes Front/Rear and R/L denotes Right/Left) are given in the third image from the top. The image at the bottom shows the plots of the force latent variables given by the force encoder.
  • Figure 4: Indoor experiment setup.
  • Figure 5: The t-SNE visualization of the learned latent representation. In different trials, the robot is subjected to a constant backward force of different magnitudes while it moves forward. The visualization indicates that the student policy is perturbation-aware due to the force encoder.
  • ...and 2 more figures
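
The Figure 2 caption states that, in the second phase, the student clones both the teacher's actions and its latent features by supervised learning. A minimal sketch of such an objective follows. The mean-squared-error form, the `latent_weight` parameter, and the function names are assumptions for illustration; the source does not specify the loss.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def imitation_loss(student_action, teacher_action,
                   student_latents, teacher_latents, latent_weight=1.0):
    """Action-cloning term plus one latent-matching term per encoder.

    `student_latents` and `teacher_latents` map an encoder name
    (for example "F", "T", "S") to that encoder's latent vector.
    The MSE form and the weighting are assumptions of this sketch.
    """
    action_term = mse(student_action, teacher_action)
    latent_term = sum(mse(student_latents[k], teacher_latents[k])
                      for k in teacher_latents)
    return action_term + latent_weight * latent_term
```

Matching each latent separately, rather than one concatenated embedding, keeps the supervision aligned with the per-encoder decoupling that the framework relies on.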