MetaLoco: Universal Quadrupedal Locomotion with Meta-Reinforcement Learning and Motion Imitation

Fatemeh Zargarbashi, Fabrizio Di Giuro, Jin Cheng, Dongho Kang, Bhavya Sukhija, Stelian Coros

TL;DR

This work presents a meta-reinforcement learning approach to develop a universal locomotion control policy capable of zero-shot generalization across diverse quadrupedal platforms and highlights the critical role of the memory unit in enabling generalization.

Abstract

This work presents a meta-reinforcement learning approach to develop a universal locomotion control policy capable of zero-shot generalization across diverse quadrupedal platforms. The proposed method trains an RL agent equipped with a memory unit to imitate reference motions using a small set of procedurally generated quadruped robots. Through comprehensive simulation and real-world hardware experiments, we demonstrate the efficacy of our approach in achieving locomotion across various robots without requiring robot-specific fine-tuning. Furthermore, we highlight the critical role of the memory unit in enabling generalization, facilitating rapid adaptation to changes in the robot properties, and improving sample efficiency.

Paper Structure

This paper contains 17 sections, 6 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Morphological template of most commercial quadrupedal robots. From left to right: Boston Dynamics Spot, Unitree Aliengo, Unitree Go2, Unitree Go1, Unitree B2.
  • Figure 2: Overview of our framework. The objective is to learn a universal policy that, given joystick commands, maps the robot’s state ($o_t$) to target joint positions ($a_t = q_t^*$) for various quadruped designs. The policy is trained to maximise a reward that encourages tracking a reference motion produced by a kinematic reference generator.
  • Figure 3: Three simulated quadrupeds successfully trotting with our universal policy (left) and failing with a policy specifically trained for Go1 (right).
  • Figure 4: Visualization of the morphology structure of Unitree Go1 (left), and one generated with our method (right). In the ovals, we highlight the parameters of the thigh link of the hind-left leg.
  • Figure 5: Examples of quadrupeds procedurally generated by randomizing the kinematic and dynamic parameters of Unitree Go1 and Unitree Aliengo.
  • ...and 5 more figures