Learning Agile Robotic Locomotion Skills by Imitating Animals
Xue Bin Peng, Erwin Coumans, Tingnan Zhang, Tsang-Wei Lee, Jie Tan, Sergey Levine
TL;DR
This work introduces an imitation-learning framework that enables legged robots to acquire agile locomotion skills by imitating real animal motions. It combines (i) retargeting of animal motion-capture data to the robot's morphology, (ii) motion imitation in simulation with a pose-tracking reward, and (iii) sample-efficient domain adaptation that uses a latent encoding of the dynamics and an information bottleneck to bridge the sim-to-real gap. The approach yields a diverse set of skills on an 18-DoF Laikago quadruped and transfers to the real robot with a small number of real-world trials, highlighting the benefit of adapting in a latent space over domain randomization alone. The findings suggest that leveraging animal motion data and latent dynamics encodings can significantly reduce reward-design effort while enabling broader, more agile behaviors in legged robots.
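To make stage (ii) more concrete, the following is a minimal sketch of a pose-tracking reward of the kind commonly used in motion imitation: the reward decays exponentially with the squared error between the robot's joint angles and the reference pose. The function name `pose_tracking_reward` and the `scale` constant are illustrative assumptions, not values taken from this summary, and a full system would likely combine several such terms (for example velocities and root motion).

```python
import math


def pose_tracking_reward(robot_pose, reference_pose, scale=5.0):
    """Return a reward in (0, 1] that equals 1 when the poses match exactly.

    robot_pose, reference_pose: equal-length sequences of joint angles (radians).
    scale: hypothetical sharpness constant (an assumption for illustration).
    """
    if len(robot_pose) != len(reference_pose):
        raise ValueError("pose vectors must have the same length")
    # Sum of squared per-joint differences between robot and reference.
    squared_error = sum(
        (q - q_ref) ** 2 for q, q_ref in zip(robot_pose, reference_pose)
    )
    # Exponential shaping: small errors give rewards near 1, large errors near 0.
    return math.exp(-scale * squared_error)


if __name__ == "__main__":
    reference = [0.0, 0.6, -1.2]   # made-up joint angles for one leg
    robot = [0.05, 0.55, -1.1]     # slightly off the reference
    print(pose_tracking_reward(robot, reference))
```

Because the reward is bounded and smooth, it gives a useful learning signal even when the robot is far from the reference, which is one reason this shaping is a common choice.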
Abstract
Reproducing the diverse and agile locomotion skills of animals has been a longstanding challenge in robotics. While manually designed controllers have been able to emulate many complex behaviors, building such controllers involves a time-consuming and difficult development process, often requiring substantial expertise in the nuances of each skill. Reinforcement learning provides an appealing alternative for automating the manual effort involved in the development of controllers. However, designing learning objectives that elicit the desired behaviors from an agent can also require a great deal of skill-specific expertise. In this work, we present an imitation learning system that enables legged robots to learn agile locomotion skills by imitating real-world animals. We show that by leveraging reference motion data, a single learning-based approach is able to automatically synthesize controllers for a diverse repertoire of behaviors for legged robots. By incorporating sample-efficient domain adaptation techniques into the training process, our system is able to learn adaptive policies in simulation that can then be quickly adapted for real-world deployment. To demonstrate the effectiveness of our system, we train an 18-DoF quadruped robot to perform a variety of agile behaviors ranging from different locomotion gaits to dynamic hops and turns.
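As a rough illustration of what "quickly adapted for real-world deployment" could look like, the sketch below searches over a low-dimensional latent vector that conditions a fixed policy, keeping whichever latent produced the best measured return within a small trial budget. This is a simple sample-and-refine search swapped in for illustration, not the optimizer used in the paper; `adapt_latent`, `evaluate_return`, and all numeric settings are hypothetical.

```python
import random


def adapt_latent(evaluate_return, dim, num_trials=30, batch=5, sigma=0.5, seed=0):
    """Search for a latent vector that maximizes a measured return.

    evaluate_return: callable taking a latent (list of floats) and returning a
        scalar return; on hardware this would be one real-world rollout.
    num_trials: hypothetical budget on the total number of rollouts.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    best_z, best_ret = list(mean), evaluate_return(mean)
    trials_used = 1
    while trials_used + batch <= num_trials:
        # Sample candidate latents around the current best guess.
        candidates = [
            [m + rng.gauss(0.0, sigma) for m in mean] for _ in range(batch)
        ]
        returns = [evaluate_return(z) for z in candidates]
        trials_used += batch
        top = max(range(batch), key=lambda k: returns[k])
        if returns[top] > best_ret:
            best_z, best_ret = candidates[top], returns[top]
            mean = list(best_z)  # recentre the search on the improvement
        sigma *= 0.9  # narrow the search as the budget is spent
    return best_z, best_ret


if __name__ == "__main__":
    # Toy stand-in for a rollout: return peaks at an unknown latent (0.3, 0.3).
    def toy_return(z):
        return -sum((v - 0.3) ** 2 for v in z)

    print(adapt_latent(toy_return, dim=2))
```

Searching over a compact latent rather than over the policy's weights keeps the number of quantities to tune small, which is what makes adaptation with only a handful of trials plausible.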
