Multi-Agent Behavior Retrieval: Retrieval-Augmented Policy Training for Cooperative Push Manipulation by Mobile Robots
So Kuroki, Mai Nishimura, Tadashi Kozuno
TL;DR
This work tackles data inefficiency in multi-agent coordination for cooperative push manipulation by introducing the Multi-Agent Coordination Skill Database (MACS-DB) and a Transformer-based skill encoder that captures spatio-temporal interactions. The retrieval-augmented policy training framework retrieves relevant past coordination skills from a large, task-agnostic prior dataset using a DTW-based similarity measure, and augments the target training set with the retrieved demonstrations. Empirical results in simulation and on real wheeled robots show improved success rates over baselines such as few-shot imitation learning and agent-wise trajectory matching, and ablations highlight both the benefits and the limitations of the retrieval approach. The proposed method enables data-efficient learning of coordinated multi-agent policies and demonstrates practical applicability to real-world robotic teams. The authors also outline avenues for improving collision handling and for generalizing to additional multi-agent tasks.
Abstract
Due to the complex interactions between agents, learning a multi-agent control policy often requires a prohibitive amount of data. This paper aims to enable multi-agent systems to effectively utilize past memories to adapt to novel collaborative tasks in a data-efficient fashion. We propose the Multi-Agent Coordination Skill Database, a repository that stores a collection of coordinated behaviors, each associated with a key vector distinctive to it. Our Transformer-based skill encoder effectively captures the spatio-temporal interactions that contribute to coordination and provides a unique skill representation for each coordinated behavior. Starting from only a small number of demonstrations of the target task, the database enables us to train the policy on a dataset augmented with the retrieved demonstrations. Experimental evaluations demonstrate that our method achieves a significantly higher success rate in push manipulation tasks than baseline methods such as few-shot imitation learning. Furthermore, we validate the effectiveness of our retrieve-and-learn framework in a real environment using a team of wheeled robots.
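
The retrieve-and-augment step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each demonstration is already represented by a sequence of key vectors (in the paper these would come from the Transformer-based skill encoder), and it uses a textbook dynamic time warping (DTW) cost as the similarity measure named in the TL;DR. The function names `dtw_distance`, `retrieve`, and `augment`, and the rule of scoring a prior demonstration by its closest target demonstration, are hypothetical choices made for illustration.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW cost between two sequences of equal-dimension vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = step + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def retrieve(target_keys, prior_keys, k):
    """Indices of the k prior demos closest to any target demo (assumed rule)."""
    scores = [
        min(dtw_distance(q, p) for q in target_keys) for p in prior_keys
    ]
    return sorted(range(len(prior_keys)), key=scores.__getitem__)[:k]


def augment(target_demos, prior_demos, indices):
    """Training set = few target demos + retrieved prior demos."""
    return list(target_demos) + [prior_demos[i] for i in indices]


# Toy usage with 2-D key vectors (hypothetical data).
target = [[(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]]
prior = [
    [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],  # same path, slower start
    [(5.0, 5.0), (6.0, 6.0)],                          # unrelated behavior
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.1)],              # near-duplicate
]
selected = retrieve(target, prior, k=2)
train_set = augment(target, prior, selected)
```

DTW is used in this sketch because it tolerates demonstrations that follow the same path at different speeds, so the first prior sequence above still matches the target exactly even though it has an extra repeated frame.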
