Meta-Evolve: Continuous Robot Evolution for One-to-many Policy Transfer

Xingyu Liu; Deepak Pathak; Ding Zhao

TL;DR

This paper tackles scalable policy transfer from a single source robot to multiple target robots by proposing Meta-Evolve, which uses continuous robot evolution organized as an evolution tree whose internal nodes are "meta robots". After matching the robots' morphologies and interpolating their parameters, it builds evolution paths that are shared among targets and uses a $p$-Steiner-tree formulation to minimize the total transfer cost in the evolution space. On Hand Manipulation Suite and agile locomotion tasks, Meta-Evolve reduces simulation cost by up to 3.2$\times$ for one-to-three manipulation transfer and by 2.4$\times$ for one-to-six locomotion transfer, compared with launching independent one-to-one transfers. The approach offers a principled, geometry-inspired method for cross-robot policy transfer, with practical implications for deploying learned policies on a family of related robots.
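The parameter interpolation mentioned above can be sketched as follows. This is a minimal illustration, assuming that after morphology matching each robot is described by a fixed-length vector of physical parameters and that intermediate robots are obtained by linear interpolation; the names `interpolate_robot` and `evolution_path` are hypothetical and are not taken from the paper's code.

```python
from typing import List


def interpolate_robot(source: List[float], target: List[float], alpha: float) -> List[float]:
    """Return an intermediate robot between source (alpha=0) and target (alpha=1).

    Assumption: both robots share the same parameterization after morphology
    matching, so their parameter vectors have equal length.
    """
    if len(source) != len(target):
        raise ValueError("robots must share one parameterization")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * s + alpha * t for s, t in zip(source, target)]


def evolution_path(source: List[float], target: List[float], steps: int) -> List[List[float]]:
    """Discretize the continuous evolution into steps + 1 robots, endpoints included."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [interpolate_robot(source, target, k / steps) for k in range(steps + 1)]


# Hypothetical two-parameter robots, e.g. (finger length, joint range).
path = evolution_path([0.0, 1.0], [1.0, 3.0], steps=4)
```

Here `path[0]` is the source robot, `path[-1]` is the target robot, and the entries in between are intermediate robots on which a policy could be fine-tuned in sequence.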

Abstract

We investigate the problem of transferring an expert policy from a source robot to multiple different robots. To solve this problem, we propose a method named Meta-Evolve that uses continuous robot evolution to efficiently transfer the policy to each target robot through a set of tree-structured evolutionary robot sequences. The robot evolution tree allows the robot evolution paths to be shared, so our approach can significantly outperform naive one-to-one policy transfer. We present a heuristic approach to determine an optimized robot evolution tree. Experiments have shown that our method is able to improve the efficiency of one-to-three transfer of manipulation policy by up to 3.2$\times$ and one-to-six transfer of agile locomotion policy by 2.4$\times$ in terms of simulation cost over the baseline of launching multiple independent one-to-one policy transfers.
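To see why sharing evolution paths can help, the sketch below compares the total path length of independent one-to-one transfers with that of a tree that routes every target through a single meta robot. It rests on several assumptions made only for illustration: robots are points in a parameter space, $L^1$ path length stands in for transfer cost, and the meta robot is placed at the coordinate-wise median of all terminals, which minimizes the summed $L^1$ distance to them. It is not the paper's heuristic for determining the evolution tree, and all function names are hypothetical.

```python
from statistics import median
from typing import List, Sequence

Point = Sequence[float]


def l1(a: Point, b: Point) -> float:
    """L1 distance between two robots in the evolution parameter space."""
    return sum(abs(x - y) for x, y in zip(a, b))


def independent_cost(source: Point, targets: List[Point]) -> float:
    """Total path length when each target gets its own source-to-target run."""
    return sum(l1(source, t) for t in targets)


def median_meta_robot(source: Point, targets: List[Point]) -> List[float]:
    """Coordinate-wise median of all terminals (source and targets)."""
    terminals = [source, *targets]
    return [median(coords) for coords in zip(*terminals)]


def shared_tree_cost(source: Point, targets: List[Point]) -> float:
    """Total length of the tree source -> meta robot -> each target."""
    meta = median_meta_robot(source, targets)
    return l1(source, meta) + sum(l1(meta, t) for t in targets)


# Hypothetical example: three similar targets that are all far from the source.
source = [0.0, 0.0]
targets = [[1.0, 0.8], [1.0, 1.0], [0.8, 1.0]]
meta = median_meta_robot(source, targets)
separate = independent_cost(source, targets)
shared = shared_tree_cost(source, targets)
```

In this example the shared segment from the source to the meta robot is traversed once instead of three times, so `shared` is well below `separate`. Because placing the meta robot at the source reproduces the independent paths, the median-based tree is never longer than the independent baseline under this cost model.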

Paper Structure (22 sections, 9 equations, 9 figures, 4 tables, 2 algorithms)

Introduction
Preliminary
One-to-Many Robot-to-Robot Policy Transfer
Problem Statement
Multi-robot Morphology Matching and Intermediate Robot Generation
One-to-many Robot Evolution for Policy Transfer
Evolution Tree Determination
Discussions
Related Work
Experiments
One-to-three Manipulation Policy Transfer
One-to-six Agile Locomotion Policy Transfer
Conclusion
Additional Experiments on Real Commercial Robots
Additional Discussions
...and 7 more sections

Figures (9)

Figure 1: (a) REvolveR and HERD [herd, revolver] transfer a policy between a pair of robots using continuous robot evolution. To transfer a policy from the source robot to multiple target robots, they must therefore launch a separate, independent run for each target robot. (b) Our Meta-Evolve uses continuous robot evolution to transfer an expert policy from the source robot to each target robot through an evolution tree defined by the connections of multiple "meta robots", i.e. tree-structured evolutionary robot sequences.
Figure 2: (a) Morphology matching of multiple robots. Colored circles denote corresponding robot bodies and straight lines denote robot joints. (b) An example of the robot evolution parameter space after morphology matching of multiple robots. The four highlighted robots are the source robot and the three target robots used in the experiments of the One-to-three Manipulation Policy Transfer section. The other, semi-transparent robots are generated intermediate robots.
Figure 3: Hand Manipulation Suite (HMS) tasks [dapg] used in our experiments: (a) Hammer, (b) Door, and (c) Relocate. Robot evolution paths when (d) using multiple independent HERD runs; (e) using the geometric median as the only meta robot; and (f) using our Meta-Evolve.
Figure 4: Agile locomotion task in a maze. (a) Environment and task setup; (b) Evolution paths when launching independent HERD runs; (c) Evolution paths when using an $L^1$ Steiner tree as the evolution tree.
Figure 5: The evolution tree from the source robot, i.e. (a) ADROIT five-finger hand, to three target real commercial robots: (b) Jaco robot with three-finger Jaco gripper, (c) Kinova3 robot with two-finger Robotiq-85 gripper, and (d) IIWA robot with two-finger Robotiq-140 gripper.
...and 4 more figures
