Learning Bipedal Walking on a Quadruped Robot via Adversarial Motion Priors

Tianhu Peng, Lingfan Bao, Joseph Humphreys, Andromachi Maria Delfaki, Dimitrios Kanoulas, Chengxu Zhou

TL;DR

This work enables a quadruped robot to walk bipedally on its rear legs only, using a teacher-student framework augmented with Adversarial Motion Priors (AMP) and reference motions generated by trajectory optimization. The teacher policy, trained with Proximal Policy Optimization, uses privileged terrain and state information to learn a robust action strategy, while the student imitates the teacher through supervised learning with a memory-augmented latent reconstruction objective. AMP encourages imitation of the reference gait style through a discriminator, and the policy is trained on a composite reward $r_t = r_t^g + r_t^s + r_t^l$ (task, style, and regulation terms), with actions $a_t \in \mathbb{R}^{12}$ controlling joint positions. References are produced by trajectory optimization using TOWR and refined with inverse kinematics. In Isaac Gym simulations, the policy demonstrates robust bipedal locomotion across flat and complex terrains. Domain randomization is used to aid sim-to-real transfer, and the experiments reveal both strengths and limitations in speed- and terrain-dependent tracking and in disturbance rejection.
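As a rough illustration of the composite reward $r_t = r_t^g + r_t^s + r_t^l$, the sketch below sums a task term, an AMP style term, and a regulation term. The mapping from discriminator score to style reward uses the least-squares form common in the AMP literature, $\max[0,\, 1 - 0.25(d - 1)^2]$; this summary does not state which form the authors use, so treat that mapping, along with the function and argument names, as assumptions.

```python
def amp_style_reward(d_score: float) -> float:
    """Map a discriminator score to a style reward in [0, 1].

    Assumption: least-squares AMP form, where a score of 1 means
    "looks like the reference motion" and -1 means "looks like the policy".
    """
    return max(0.0, 1.0 - 0.25 * (d_score - 1.0) ** 2)


def total_reward(task_reward: float, d_score: float, regulation_reward: float) -> float:
    """Composite reward r_t = r_t^g + r_t^s + r_t^l (unweighted sum, as in the TL;DR)."""
    style_reward = amp_style_reward(d_score)
    return task_reward + style_reward + regulation_reward


if __name__ == "__main__":
    # Hypothetical values: good velocity tracking, reference-like motion,
    # and a small penalty from the regulation term.
    print(total_reward(task_reward=0.5, d_score=1.0, regulation_reward=-0.1))
```

In practice each term is usually scaled by a tunable coefficient; the unweighted sum here simply mirrors the expression given above.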

Abstract

Previous studies have successfully demonstrated agile and robust locomotion in challenging terrains for quadrupedal robots. However, the bipedal locomotion mode for quadruped robots remains unverified. This paper explores the adaptation of a learning framework originally designed for quadrupedal robots to perform blind locomotion in biped mode. We leverage a framework that incorporates Adversarial Motion Priors with a teacher-student policy to enable imitation of a reference trajectory and navigation on challenging terrain. Our work involves transferring and evaluating a similar learning framework on a quadruped robot in biped mode, aiming to achieve stable walking on both flat and complicated terrains. Our simulation results demonstrate that the trained policy enables the quadruped robot to navigate both flat and challenging terrains, including stairs and uneven surfaces.

Paper Structure (15 sections, 5 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Overview of the teacher-student learning framework. (a) The teacher policy, which leverages privileged data $S^p_t$, terrain information $o^e_t$ and prospective data $o^p_t$ through RL, aims to maximize a total reward $r_t$ comprising command task reward $r_t^g$, style reward $r_t^s$ based on AMP, and regulation reward $r_t^l$ for ensuring safety and smooth motion. (b) The student policy, trained via supervised learning, seeks to imitate the teacher's actions $a^{teacher}_t$ and reconstruct the teacher's latent states $l^{teacher}_t$ from historical and prospective observations $H_t: [O_{t-N}, O_{t-N+1}, \ldots, O_{t-1}]$.
  • Figure 2: Terrains in Isaac Gym Simulations.
  • Figure 3: Base linear velocity in the x direction and base angular velocity in yaw for the robot on both uniform and discrete obstacle terrains.
  • Figure 4: Robot's feet contact forces on both uniform and discrete obstacle terrains.
  • Figure 5: Snapshots illustrating the response of the robot to a 100 N push force (indicated by a red arrow) applied along the x-axis for 0.1 seconds. The sequence shows the robot's movement and stabilization process.
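
The student objective described in the Figure 1 caption (imitate the teacher's actions and reconstruct the teacher's latent state) can be sketched as a sum of two regression losses. The caption only says the student is trained via supervised learning; the use of mean squared error, the `latent_weight` coefficient, and all function names below are assumptions made for illustration.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def student_loss(
    student_action: Sequence[float],
    teacher_action: Sequence[float],
    student_latent: Sequence[float],
    teacher_latent: Sequence[float],
    latent_weight: float = 1.0,  # hypothetical weighting, not from the source
) -> float:
    """Action imitation loss plus weighted latent reconstruction loss."""
    imitation = mse(student_action, teacher_action)
    reconstruction = mse(student_latent, teacher_latent)
    return imitation + latent_weight * reconstruction


if __name__ == "__main__":
    # 12-dimensional actions (a_t in R^12); the latent size of 2 is arbitrary.
    teacher_a = [0.0] * 12
    student_a = [0.0] * 12
    print(student_loss(student_a, teacher_a, [1.0, 1.0], [0.0, 0.0]))
```

In a full pipeline the student's action and latent would come from a network that encodes the observation history $H_t$, and the loss would be averaged over a batch of teacher rollouts.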