Robot Fleet Learning via Policy Merging
Lirui Wang, Kaiqing Zhang, Allan Zhou, Max Simchowitz, Russ Tedrake
TL;DR
The paper addresses fleet-level policy learning under limited bandwidth by proposing policy merging (PoMe) and introducing FLEET-MERGE, an algorithm that aligns multiple recurrent policies to a common reference using soft permutation projections. It demonstrates that merging $N$ locally trained policies with non-iid data can yield a single, effective policy $\theta_{\mathrm{mrg}}$ without sharing training data, outperforming naive averaging and matching centralized training in many cases. The method is validated on 50 Meta-World tasks and a new Drake-based FLEET-TOOLS benchmark, showing strong test-time performance, flat mode connectivity, and robustness to decentralization. This work advances scalable, data-efficient fleet learning for robotics by enabling diverse skill consolidation with minimal communication and without centralized data collection.
Abstract
Fleets of robots ingest massive amounts of heterogeneous streaming data silos generated by interacting with their environments, far more than what can be stored or transmitted with ease. At the same time, teams of robots should co-acquire diverse skills through their heterogeneous experiences in varied settings. How can we enable such fleet-level learning without having to transmit or centralize fleet-scale data? In this paper, we investigate policy merging (PoMe) from such distributed heterogeneous datasets as a potential solution. To efficiently merge policies in the fleet setting, we propose FLEET-MERGE, an instantiation of distributed learning that accounts for the permutation invariance that arises when parameterizing the control policies with recurrent neural networks. We show that FLEET-MERGE consolidates the behavior of policies trained on 50 tasks in the Meta-World environment, with good performance on nearly all training tasks at test time. Moreover, we introduce a novel robotic tool-use benchmark, FLEET-TOOLS, for fleet policy learning in compositional and contact-rich robot manipulation tasks, to validate the efficacy of FLEET-MERGE on the benchmark.
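The permutation invariance mentioned above can be illustrated with a toy example. The sketch below is not the authors' FLEET-MERGE implementation: it assumes a vanilla tanh RNN, uses an exact permuted copy of one policy as the second policy, and replaces the soft permutation projection with a hard nearest-row matching on the input weights. All function and variable names (`rnn_rollout`, `permute_hidden`, `match_to_reference`, `theta_a`, and so on) are hypothetical. It shows that relabeling hidden units leaves the policy's outputs unchanged, that naive parameter averaging of two such equivalent policies changes the outputs, and that aligning to a reference before averaging restores them.

```python
import numpy as np

def rnn_rollout(params, xs):
    """Run a vanilla tanh RNN over a sequence and return the outputs."""
    W_xh, W_hh, b, W_hy = params
    h = np.zeros(W_hh.shape[0])
    ys = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b)
        ys.append(W_hy @ h)
    return np.array(ys)

def permute_hidden(params, perm):
    """Relabel hidden units so that new unit i is old unit perm[i]."""
    W_xh, W_hh, b, W_hy = params
    return (W_xh[perm], W_hh[perm][:, perm], b[perm], W_hy[:, perm])

def match_to_reference(ref, other):
    """Hard matching (an assumption, not the paper's soft projection):
    pair each reference unit with the closest row of other's input weights."""
    diff = ref[0][:, None, :] - other[0][None, :, :]
    cost = (diff ** 2).sum(axis=-1)      # cost[i, j] = ||ref_i - other_j||^2
    perm = cost.argmin(axis=1)
    if len(set(perm.tolist())) != len(perm):
        raise ValueError("nearest-row matching did not yield a permutation")
    return perm

def average(p, q):
    """Elementwise average of two parameter tuples."""
    return tuple((a + b) / 2 for a, b in zip(p, q))

rng = np.random.default_rng(0)
d_in, d_h, d_out = 3, 8, 2
theta_a = (
    rng.normal(size=(d_h, d_in)),        # input-to-hidden weights
    0.3 * rng.normal(size=(d_h, d_h)),   # hidden-to-hidden weights
    rng.normal(size=d_h),                # hidden bias
    rng.normal(size=(d_out, d_h)),       # hidden-to-output weights
)
# Second "policy": the same function with its hidden units cyclically relabeled.
theta_b = permute_hidden(theta_a, np.roll(np.arange(d_h), 1))
xs = rng.normal(size=(5, d_in))

y_a = rnn_rollout(theta_a, xs)
y_b = rnn_rollout(theta_b, xs)
y_naive = rnn_rollout(average(theta_a, theta_b), xs)

perm = match_to_reference(theta_a, theta_b)
theta_b_aligned = permute_hidden(theta_b, perm)
y_merged = rnn_rollout(average(theta_a, theta_b_aligned), xs)

print("same function after relabeling:", np.allclose(y_a, y_b))
print("naive average preserves outputs:", np.allclose(y_naive, y_a))
print("aligned average preserves outputs:", np.allclose(y_merged, y_a))
```

In this toy setting the two policies are identical up to relabeling, so a hard matching recovers the alignment exactly. Policies trained on different data would not match exactly, so nearest-row matching may fail to produce a valid permutation; a proper assignment solver or a soft relaxation, as the abstract's setting suggests, would be needed there.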
