
Learning Agile Robotic Locomotion Skills by Imitating Animals

Xue Bin Peng, Erwin Coumans, Tingnan Zhang, Tsang-Wei Lee, Jie Tan, Sergey Levine

TL;DR

This work introduces an imitation-learning framework that enables legged robots to acquire agile locomotion skills by imitating real animal motions. It combines three components. The first is motion retargeting, which maps animal motion-capture (mocap) data onto the robot's morphology. The second is motion imitation in simulation, using a reward that encourages the robot to track the retargeted reference poses. The third is sample-efficient domain adaptation, which encodes dynamics parameters into a latent representation regularized by an information bottleneck to bridge the sim-to-real gap. The approach yields a diverse set of skills on an 18-DoF Laikago quadruped and transfers to the real robot with a limited number of real-world trials. The results favor latent-space adaptation over domain randomization alone. The findings suggest that leveraging animal motion data and latent dynamics encodings can substantially reduce reward-design effort while enabling a broader range of agile behaviors in legged robots.
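The motion-imitation stage can be illustrated with a tracking reward. The following is a minimal sketch, assuming a DeepMimic-style formulation in which each term is an exponentiated squared error between reference and simulated quantities. The term names, weights, and scales are illustrative assumptions, not values given in this summary.

```python
import math

# Hypothetical term weights and error scales (illustrative assumptions only).
# The weights sum to 1, so a perfectly tracked frame scores 1.0.
WEIGHTS = {"pose": 0.5, "vel": 0.05, "ee": 0.2, "root_pose": 0.15, "root_vel": 0.1}
SCALES = {"pose": 5.0, "vel": 0.1, "ee": 40.0, "root_pose": 20.0, "root_vel": 2.0}


def sq_dist(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def imitation_reward(ref, sim):
    """Weighted sum of exp(-scale * squared tracking error) over all terms.

    `ref` and `sim` map each term name to a flat list of floats, for example
    joint angles, joint velocities, end-effector positions, root pose, and
    root velocity. Each term lies in (0, 1], so the total lies in
    (0, sum of weights].
    """
    total = 0.0
    for name, weight in WEIGHTS.items():
        err = sq_dist(ref[name], sim[name])
        total += weight * math.exp(-SCALES[name] * err)
    return total
```

Any tracking error lowers the reward smoothly rather than abruptly. The same formula therefore applies to every reference clip, and only the motion data changes between skills.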

Abstract

Reproducing the diverse and agile locomotion skills of animals has been a longstanding challenge in robotics. While manually designed controllers have been able to emulate many complex behaviors, building such controllers involves a time-consuming and difficult development process, often requiring substantial expertise in the nuances of each skill. Reinforcement learning provides an appealing alternative for automating the manual effort involved in the development of controllers. However, designing learning objectives that elicit the desired behaviors from an agent can also require a great deal of skill-specific expertise. In this work, we present an imitation learning system that enables legged robots to learn agile locomotion skills by imitating real-world animals. We show that by leveraging reference motion data, a single learning-based approach is able to automatically synthesize controllers for a diverse repertoire of behaviors for legged robots. By incorporating sample-efficient domain adaptation techniques into the training process, our system is able to learn adaptive policies in simulation that can then be quickly adapted for real-world deployment. To demonstrate the effectiveness of our system, we train an 18-DoF quadruped robot to perform a variety of agile behaviors ranging from different locomotion gaits to dynamic hops and turns.
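The domain-adaptation idea can be made concrete in two pieces. The first is an information-bottleneck penalty on a latent encoding z of the dynamics. The second is a search over z on the real robot while the policy stays fixed.

The sketch below is a minimal illustration under stated assumptions. The encoder is assumed to output a diagonal Gaussian over z. The coefficient `beta` and the rollout function `evaluate` are hypothetical. The latent search is a simplified advantage-weighted Gaussian refit chosen for illustration, not the authors' implementation.

```python
import math
import random


def kl_to_standard_normal(mean, log_std):
    """KL( N(mean, diag(exp(log_std)**2)) || N(0, I) ) for a diagonal Gaussian.

    This is the information-bottleneck penalty on the latent encoding z.
    """
    return sum(
        0.5 * (math.exp(2.0 * ls) + m * m - 1.0 - 2.0 * ls)
        for m, ls in zip(mean, log_std)
    )


def bottleneck_objective(episode_return, mean, log_std, beta):
    """Return minus beta times the KL penalty (beta is a hypothetical coefficient)."""
    return episode_return - beta * kl_to_standard_normal(mean, log_std)


def adapt_latent(evaluate, dim, iters=20, samples=16, temperature=1.0, seed=0):
    """Search the latent space by repeatedly refitting a Gaussian over z.

    `evaluate(z)` is a hypothetical stand-in for running the fixed policy on the
    real robot with latent z and returning the episode return. Only z is
    adjusted; the policy weights and dynamics parameters are never touched.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim  # start the search at the N(0, I) prior
    std = [1.0] * dim
    for _ in range(iters):
        zs = [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(samples)]
        returns = [evaluate(z) for z in zs]
        baseline = sum(returns) / len(returns)
        # Exponentiated advantages; subtracting the max keeps exp() from overflowing.
        adv = [(r - baseline) / temperature for r in returns]
        top = max(adv)
        weights = [math.exp(a - top) for a in adv]
        total = sum(weights)
        # Weighted refit of the mean, then of the per-dimension std around it.
        mean = [sum(w * z[i] for w, z in zip(weights, zs)) / total for i in range(dim)]
        std = [
            max(1e-3, math.sqrt(
                sum(w * (z[i] - mean[i]) ** 2 for w, z in zip(weights, zs)) / total))
            for i in range(dim)
        ]
    return mean
```

The intuition behind the bottleneck is that keeping the encoding close to the prior limits how much dynamics information the policy can rely on. A low-dimensional search over z may then need only a modest number of real-world rollouts.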

Paper Structure

This paper contains 17 sections, 14 equations, 11 figures, 5 tables, 1 algorithm.

Figures (11)

  • Figure 1: Laikago robot performing locomotion skills learned by imitating motion data recorded from a real dog. Top: Motion capture data recorded from a dog. Middle: Simulated Laikago robot imitating reference motions. Bottom: Real Laikago robot imitating reference motions.
  • Figure 2: The framework consists of three stages: motion retargeting, motion imitation, and domain adaptation. It receives as input motion data recorded from an animal, and outputs a control policy that enables a real robot to reproduce the motion.
  • Figure 3: Inverse-kinematics (IK) is used to retarget mocap clips recorded from a real dog (left) to the Laikago robot (right). Corresponding pairs of keypoints (red) are specified on the dog and robot's bodies, and then IK is used to compute a pose for the robot that tracks the keypoints.
  • Figure 4: Laikago robot performing skills learned by imitating reference motions. Top: Reference motion. Middle: Simulated robot. Bottom: Real robot.
  • Figure 5: Performance statistics of imitating various skills in the real world. Performance is recorded as the average normalized return in [0, 1]. Three policies initialized with different random seeds are trained for each combination of skill and method. Each policy is evaluated over 5 episodes, for a total of 15 trials per skill and method. The adaptive policies outperform the non-adaptive policies on most skills.
  • ...and 6 more figures
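Figure 3's keypoint-based retargeting can be illustrated in miniature. The sketch below is a simplification under stated assumptions: it reduces the problem to a single planar two-link leg and one foot keypoint. The dog's foot position is first rescaled by a leg-length ratio, a hypothetical step added here to account for size differences. Closed-form inverse kinematics then solves for the hip and knee angles. The actual system computes a robot pose that tracks multiple corresponding keypoint pairs on the 18-DoF robot. The function names and link lengths below are hypothetical.

```python
import math


def scale_keypoint(foot, hip, animal_leg_len, robot_leg_len):
    """Rescale a foot keypoint about the hip by the robot/animal leg-length ratio."""
    s = robot_leg_len / animal_leg_len
    return (hip[0] + s * (foot[0] - hip[0]), hip[1] + s * (foot[1] - hip[1]))


def two_link_ik(x, y, l1, l2):
    """Closed-form IK for a planar two-link leg rooted at the origin.

    Returns (hip_angle, knee_angle) in radians that place the foot at (x, y).
    For unreachable targets, the cosine is clamped so the leg fully extends
    or folds toward the target.
    """
    d2 = x * x + y * y
    # Law of cosines for the knee angle.
    c = (d2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    c = max(-1.0, min(1.0, c))
    knee = math.acos(c)  # picks one of the two knee-bend branches
    hip = math.atan2(y, x) - math.atan2(l2 * math.sin(knee), l1 + l2 * math.cos(knee))
    return hip, knee


def forward(hip, knee, l1, l2):
    """Forward kinematics of the same leg, used to verify the IK solution."""
    return (
        l1 * math.cos(hip) + l2 * math.cos(hip + knee),
        l1 * math.sin(hip) + l2 * math.sin(hip + knee),
    )
```

A closed-form solution exists only for a chain this small. With many keypoints and joints, retargeting is typically posed as a least-squares optimization over the full pose instead.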