Coordinated Humanoid Robot Locomotion with Symmetry Equivariant Reinforcement Learning Policy

Buqing Nie, Yang Zhang, Rongjun Jin, Zhanxiang Cao, Huangxuan Lin, Xiaokang Yang, Yue Gao

TL;DR

This work tackles the challenge of symmetry underutilization in humanoid DRL by introducing SE-Policy, which enforces strict symmetry equivariance in the actor and symmetry invariance in the critic using ESCNN-based networks. By combining a history-based encoder with an autoencoder training objective and a symmetry-aware PPO optimization, SE-Policy achieves more temporally and spatially coordinated locomotion on a Unitree G1, with demonstrated sim-to-real transfer aided by curriculum learning and domain randomization. Key findings include superior velocity-tracking accuracy, zero spatial symmetry error, and robust real-world performance across varied terrains, highlighting the practical impact of enforcing morphological symmetry in policy design. The approach offers broad applicability to humanoid robotics, potentially improving user experience and task performance in real-world deployments.

Abstract

The human nervous system exhibits bilateral symmetry, enabling coordinated and balanced movements. However, existing Deep Reinforcement Learning (DRL) methods for humanoid robots neglect morphological symmetry of the robot, leading to uncoordinated and suboptimal behaviors. Inspired by human motor control, we propose Symmetry Equivariant Policy (SE-Policy), a new DRL framework that embeds strict symmetry equivariance in the actor and symmetry invariance in the critic without additional hyperparameters. SE-Policy enforces consistent behaviors across symmetric observations, producing temporally and spatially coordinated motions with higher task performance. Extensive experiments on velocity tracking tasks, conducted in both simulation and real-world deployment with the Unitree G1 humanoid robot, demonstrate that SE-Policy improves tracking accuracy by up to 40% compared to state-of-the-art baselines, while achieving superior spatial-temporal coordination. These results demonstrate the effectiveness of SE-Policy and its broad applicability to humanoid robots.
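The two properties named in the abstract can be written as $\pi(g \cdot o) = g \cdot \pi(o)$ for the actor (equivariance) and $V(g \cdot o) = V(o)$ for the critic (invariance), where $g$ is the left-right reflection. The sketch below is only a toy illustration of these definitions. It does not reproduce the paper's ESCNN-based networks; it swaps in plain group averaging over the two-element group {identity, reflection}. The matrices `P_obs` and `P_act`, the dimensions, and the function names are all hypothetical.

```python
import numpy as np

# Hypothetical toy setup: 4 observation entries, 2 action entries.
# P_obs and P_act are assumed reflection matrices that swap left/right
# entries; on a real robot some joints would also change sign.
P_obs = np.array([[0, 1, 0, 0],
                  [1, 0, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
P_act = np.array([[0, 1],
                  [1, 0]], dtype=float)

rng = np.random.default_rng(0)
W_actor = rng.normal(size=(2, 4))   # weights of an unconstrained toy actor
w_critic = rng.normal(size=4)       # weights of an unconstrained toy critic


def raw_actor(obs):
    """Unconstrained actor: no symmetry guarantee."""
    return np.tanh(W_actor @ obs)


def raw_critic(obs):
    """Unconstrained critic: no symmetry guarantee."""
    return float(np.tanh(w_critic @ obs))


def equivariant_actor(obs):
    """Average over {identity, reflection}.

    Because both reflections are involutions (P @ P = I), the result
    satisfies pi(P_obs @ o) == P_act @ pi(o) exactly.
    """
    return 0.5 * (raw_actor(obs) + P_act @ raw_actor(P_obs @ obs))


def invariant_critic(obs):
    """Average over {identity, reflection}: V(P_obs @ o) == V(o)."""
    return 0.5 * (raw_critic(obs) + raw_critic(P_obs @ obs))


obs = rng.normal(size=4)
mirrored = P_obs @ obs
actor_gap = np.abs(equivariant_actor(mirrored) - P_act @ equivariant_actor(obs)).max()
critic_gap = abs(invariant_critic(mirrored) - invariant_critic(obs))
print("actor equivariance gap:", actor_gap)
print("critic invariance gap:", critic_gap)
```

Group averaging costs two forward passes per query. An architecture that builds the constraint into every layer, as the abstract describes, avoids that overhead and, like the averaging above, needs no extra loss weight.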

Paper Structure

This paper contains 34 sections, 8 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: The overall architecture of SE-Policy. (a) Left: the architecture of the actor and critic models. (b) Upper right: visualization of $\mathcal{F}_z$, i.e., the symmetric transformation of $z$, together with humanoid robot motions and their corresponding symmetric motions. (c) Bottom right: the equivariant MLP, which is widely used in this work.
  • Figure 2: The tracking errors in terms of position (TE-P) and orientation (TE-O) over locomotion time. Lines and shaded areas denote mean values and standard errors, respectively. SE-Policy (red) achieves lower TE-P and TE-O over time than the other methods, validating the effectiveness of our method.
  • Figure 3: Visualization of each method's locomotion trajectories. The robot is commanded to move from the center along eight velocity directions, where dotted lines and red lines are the ideal and actual trajectories, respectively. Our method (the SE-Policy panel) outperforms the baseline methods in tracking accuracy and trajectory symmetry.
  • Figure 4: Visualization of foot movement during locomotion on the plane, where the $x$-axis denotes the distance traveled by the robot torso and the $y$-axis denotes the height of the two feet. The foot motions generated by SE-Policy are consistent across the two feet, with identical amplitudes, step sizes, and temporal coordination.
  • Figure 5: Real-world experiments validating the effectiveness of SE-Policy. The robot tracks the given velocity (warning line) without stepping outside the boundary (red line). Please refer to the attached video for more details.
  • ...and 2 more figures