Multi-Agent Behavior Retrieval: Retrieval-Augmented Policy Training for Cooperative Push Manipulation by Mobile Robots

So Kuroki, Mai Nishimura, Tadashi Kozuno

TL;DR

This work tackles data inefficiency in multi-agent coordination for cooperative push manipulation by introducing the Multi-Agent Coordination Skill Database (MACS-DB) and a Transformer-based skill encoder to capture spatio-temporal interactions. The retrieval-augmented policy training framework retrieves relevant past coordination skills from a large, task-agnostic prior dataset using a DTW-based similarity measure, and augments the target training set with these retrieved demonstrations. Empirical results in simulation and on real wheeled robots show improved success rates over baselines such as few-shot imitation learning and agent-wise trajectory matching, with ablations highlighting the benefits and limitations of the retrieval approach. The proposed method enables data-efficient learning of coordinated multi-agent policies and demonstrates practical applicability to real-world robotic teams, while also outlining avenues for improving collision handling and generalization to additional multi-agent tasks.

Abstract

Due to the complex interactions between agents, learning a multi-agent control policy often requires a prohibitive amount of data. This paper aims to enable multi-agent systems to effectively utilize past memories to adapt to novel collaborative tasks in a data-efficient fashion. We propose the Multi-Agent Coordination Skill Database, a repository for storing a collection of coordinated behaviors associated with key vectors distinctive to them. Our Transformer-based skill encoder effectively captures spatio-temporal interactions that contribute to coordination and provides a unique skill representation for each coordinated behavior. By leveraging only a small number of demonstrations of the target task, the database enables us to train the policy using a dataset augmented with the retrieved demonstrations. Experimental evaluations demonstrate that our method achieves a significantly higher success rate in push manipulation tasks compared with baseline methods like few-shot imitation learning. Furthermore, we validate the effectiveness of our retrieve-and-learn framework in a real environment using a team of wheeled robots.
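The retrieval step mentioned in the TL;DR (a DTW-based similarity measure over skill representations) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `dtw_distance` and `retrieve`, the Euclidean local cost, and the "closest to any query" ranking rule are all assumptions made for illustration. Each demonstration is assumed to be already encoded as a sequence of representation vectors.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Skill = Sequence[Vector]  # one demonstration: a sequence of representation vectors


def euclidean(a: Vector, b: Vector) -> float:
    """Local cost between two representation vectors (assumed Euclidean)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a: Skill, seq_b: Skill) -> float:
    """Classic dynamic time warping distance between two vector sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def retrieve(queries: List[Skill], database: List[Skill], k: int) -> List[int]:
    """Return indices of the k database entries nearest to any query (hypothetical rule)."""
    scored = []
    for idx, entry in enumerate(database):
        best = min(dtw_distance(q, entry) for q in queries)
        scored.append((best, idx))
    scored.sort()
    return [idx for _, idx in scored[:k]]
```

Under these assumptions, the retrieved indices would select demonstrations from the prior dataset to append to the few target demonstrations before policy training.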

Paper Structure (36 sections, 6 equations, 7 figures, 2 tables)

This paper contains 36 sections, 6 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Given a collection of unlabeled demonstrations from multiple tasks, our aim is to construct a coordination skill database that enables us to retrieve past experiences for different scenarios or domains (e.g., real-world robots).
  • Figure 2: Our retrieve-and-learn framework consists of three primary components: (a) constructing the Multi-Agent Coordination Skill Database on the basis of the prior experiences, (b) retrieving demonstrations using a few target demonstrations as queries, and (c) learning the multi-agent control policy using the retrieved data and target data.
  • Figure 3: In the training phase, our model takes past trajectory sequences as input tokens and outputs the future trajectories, i.e., the actions, for each input token. In the embedding phase, the model pools the multi-agent feature embeddings into a single representative vector.
  • Figure 4: (a) Visualization of five different trajectories (N=3): A, B, C, and E represent tasks of pushing a stick at a hard level, while D represents the task of pushing a block at a hard level. (b) Visualization of the Multi-Agent Coordination representation. The left image highlights the representations of A, B, and C, while the right image highlights those of D and E. Each trajectory and its corresponding representation sequence are matched by both a unique color and a letter ID.
  • Figure 5: The retrieved data is visualized with the two queries sampled from $\mathcal{D}_{\text{target}}$. For each demonstration, we display the manipulated object, the task difficulty, and the policy that generated the trajectory.
  • ...and 2 more figures
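
As a rough illustration of the embedding phase described in the Figure 3 caption, the sketch below pools per-agent feature vectors into one vector per time step. The caption does not say which pooling operator is used; mean pooling over agents, the function name `pool_agents`, and the nested-list layout are assumptions chosen only to make the idea concrete.

```python
from typing import List


def pool_agents(embeddings: List[List[List[float]]]) -> List[List[float]]:
    """Mean-pool agent features at each time step (assumed operator).

    embeddings[t][i] is the feature vector of agent i at time step t.
    Returns one pooled vector per time step.
    """
    pooled = []
    for step in embeddings:
        n_agents = len(step)
        dim = len(step[0])
        # Average each feature dimension across all agents at this step.
        pooled.append([sum(agent[d] for agent in step) / n_agents for d in range(dim)])
    return pooled
```

The resulting per-step vectors form a representation sequence of the kind that a sequence similarity measure could then compare across demonstrations.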