MetaLoco: Universal Quadrupedal Locomotion with Meta-Reinforcement Learning and Motion Imitation

Fatemeh Zargarbashi; Fabrizio Di Giuro; Jin Cheng; Dongho Kang; Bhavya Sukhija; Stelian Coros

Abstract

This work presents a meta-reinforcement learning approach to develop a universal locomotion control policy capable of zero-shot generalization across diverse quadrupedal platforms. The proposed method trains an RL agent equipped with a memory unit to imitate reference motions using a small set of procedurally generated quadruped robots. Through comprehensive simulation and real-world hardware experiments, we demonstrate the efficacy of our approach in achieving locomotion across various robots without requiring robot-specific fine-tuning. Furthermore, we highlight the critical role of the memory unit in enabling generalization, facilitating rapid adaptation to changes in the robot properties, and improving sample efficiency.

Paper Structure (17 sections, 6 equations, 10 figures, 5 tables)

Introduction
Related Work
Reinforcement Learning for Legged Locomotion
Universal Locomotion Control
Preliminaries
Method
Overview
Meta-RL Setup
Implementation Details
Morphology Generation
Simulation results
Generalization over diverse embodiments
Comparison of Architectures
Number of Training Robots
Meta-episode Length K
...and 2 more sections

Figures (10)

Figure 1: Morphological template of most commercial quadrupedal robots. From left to right: Boston Dynamics Spot, Unitree Aliengo, Unitree Go2, Unitree Go1, Unitree B2.
Figure 2: Overview of our framework. The objective is to learn a universal policy that, given joystick commands, maps the robot’s state ($o_t$) to target joint positions ($a_t = q_t^*$) for various quadruped designs. The policy is trained to maximise a reward that encourages tracking a reference motion produced by a kinematic reference generator.
Figure 3: Three simulated quadrupeds successfully trotting with our universal policy (left) and failing with a policy specifically trained for Go1 (right).
Figure 4: Visualization of the morphology structure of Unitree Go1 (left), and one generated with our method (right). In the ovals, we highlight the parameters of the thigh link of the hind-left leg.
Figure 5: Example of quadrupeds procedurally generated by randomizing the kinematic and dynamic parameters of Unitree Go1 and Unitree Aliengo.
...and 5 more figures
