Learning Bipedal Walking on a Quadruped Robot via Adversarial Motion Priors

Tianhu Peng; Lingfan Bao; Joseph Humphreys; Andromachi Maria Delfaki; Dimitrios Kanoulas; Chengxu Zhou

TL;DR

This work enables a quadruped robot to walk bipedally on its rear legs alone, using a teacher-student framework augmented with Adversarial Motion Priors (AMP) and reference motions generated by trajectory optimization. The teacher policy, trained with Proximal Policy Optimization, uses privileged terrain and state information to learn a robust action strategy. The student imitates the teacher through supervised learning with a memory-augmented latent reconstruction objective. AMP encourages gait-style imitation via a discriminator, and the policy maximizes a composite reward $r_t = r_t^g + r_t^s + r_t^l$, with actions $a_t \in \mathbb{R}^{12}$ controlling joint positions. Reference motions are produced by trajectory optimization using TOWR and refined with inverse kinematics. In Isaac Gym simulations, the policy demonstrates robust bipedal locomotion across flat and complex terrains. Domain randomization is applied to aid sim-to-real transfer, and the experiments reveal both strengths and limitations in speed- and terrain-dependent tracking and in disturbance rejection.
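As a rough illustration of how the composite reward $r_t = r_t^g + r_t^s + r_t^l$ could be assembled, the sketch below sums a task term, a style term, and a regularization term. Only the unweighted sum comes from the text above. The style term uses the least-squares form from the original AMP work; the velocity-tracking kernel, the action-rate penalty, and all names and constants (`sigma`, `scale`, the function names) are assumptions made for illustration, not the authors' reward definitions.

```python
import math


def style_reward(d_score: float) -> float:
    """Style reward r^s from a discriminator score, in the least-squares AMP form.

    A score of 1 means "looks like the reference motion" and yields reward 1.
    """
    return max(0.0, 1.0 - 0.25 * (d_score - 1.0) ** 2)


def task_reward(v_cmd, v_base, sigma: float = 0.25) -> float:
    """Task reward r^g: exponential kernel on velocity-tracking error (assumed form)."""
    err = sum((c - b) ** 2 for c, b in zip(v_cmd, v_base))
    return math.exp(-err / sigma)


def regularization_reward(action, prev_action, scale: float = 0.01) -> float:
    """Regularization reward r^l: penalty on action rate for smooth motion (assumed form)."""
    return -scale * sum((a - p) ** 2 for a, p in zip(action, prev_action))


def total_reward(d_score, v_cmd, v_base, action, prev_action) -> float:
    """Composite reward r_t = r^g + r^s + r^l, summed without weights as in the text."""
    r_g = task_reward(v_cmd, v_base)
    r_s = style_reward(d_score)
    r_l = regularization_reward(action, prev_action)
    return r_g + r_s + r_l


if __name__ == "__main__":
    # a_t has 12 entries, one target position per joint.
    prev_action = [0.0] * 12
    action = [0.05] * 12
    print(total_reward(0.8, [0.5, 0.0], [0.4, 0.0], action, prev_action))
```

In practice each term would typically carry its own weight; the sketch keeps the plain sum so that it matches the formula as stated.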

Abstract

Previous studies have successfully demonstrated agile and robust locomotion in challenging terrains for quadrupedal robots. However, the bipedal locomotion mode for quadruped robots remains unverified. This paper explores the adaptation of a learning framework originally designed for quadrupedal robots to perform blind locomotion in biped mode. We leverage a framework that incorporates Adversarial Motion Priors with a teacher-student policy to enable imitation of a reference trajectory and navigation on rough terrain. We transfer this framework to a quadruped robot in biped mode and evaluate it, aiming to achieve stable walking on both flat and complex terrains. Our simulation results demonstrate that the trained policy enables the quadruped robot to navigate both flat and challenging terrains, including stairs and uneven surfaces.

Paper Structure (15 sections, 5 equations, 5 figures, 2 tables)

Introduction
Methodology
Reinforcement Learning on Legged Robots
Adversarial Motion Priors and Rewards Design
Reference Generation
Framework and Training
Learning Framework
Teacher Policy Architecture
Student Policy Architecture
Training and Implementation Details
Termination
Domain Randomization
Simulation Setup
Results
Conclusion and Future Work

Figures (5)

Figure 1: Overview of the teacher-student learning framework. (a) The teacher policy, trained with RL on privileged data $S^p_t$, terrain information $o^e_t$, and prospective data $o^p_t$, aims to maximize a total reward $r_t$ comprising the command task reward $r_t^g$, the AMP-based style reward $r_t^s$, and the regulation reward $r_t^l$, which ensures safe and smooth motion. (b) The student policy, trained via supervised learning, imitates the teacher's actions $a^{teacher}_t$ and reconstructs the teacher's latent states $l^{teacher}_t$ from historical and prospective observations $H_t = [O_{t-N}, O_{t-N+1}, \ldots, O_{t-1}]$.
Figure 2: Terrains in Isaac Gym Simulations.
Figure 3: Base linear velocity in the x direction and base angular velocity in yaw for the robot on both uniform and discrete obstacle terrains.
Figure 4: Robot's feet contact forces on both uniform and discrete obstacle terrains.
Figure 5: Snapshots illustrating the response of a robot subjected to a 100N push force (indicated by a red arrow) applied along the x-axis over a duration of 0.1 seconds. The sequence shows the robot's movement and stabilization process.
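
The student objective described in Figure 1(b) can be sketched as a sum of an action-imitation term and a latent-reconstruction term, computed from a sliding window of the last N observations. This is a minimal sketch under stated assumptions: the mean-squared-error form, the `latent_weight` parameter, and the `ObservationHistory` helper are hypothetical and are not taken from the paper.

```python
from collections import deque


def mse(x, y) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def student_loss(a_student, a_teacher, l_student, l_teacher,
                 latent_weight: float = 1.0) -> float:
    """Imitation loss on actions plus reconstruction loss on latents (assumed MSE form)."""
    return mse(a_student, a_teacher) + latent_weight * mse(l_student, l_teacher)


class ObservationHistory:
    """Fixed-length buffer holding the last n observations (hypothetical helper)."""

    def __init__(self, n: int):
        self.buf = deque(maxlen=n)  # the oldest entry is dropped automatically

    def push(self, obs) -> None:
        self.buf.append(list(obs))

    def window(self):
        """Return the stored observations, oldest first."""
        return list(self.buf)


if __name__ == "__main__":
    history = ObservationHistory(n=3)
    for step in range(5):
        history.push([float(step)])
    print(history.window())  # only the three most recent observations remain

    loss = student_loss([0.0] * 12, [0.1] * 12, [0.0, 0.0], [0.2, 0.2])
    print(loss)
```

A bounded `deque` keeps the window at a constant size without manual index bookkeeping, which is convenient when many simulated robots are stepped in parallel.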
