Enabling Multi-Robot Collaboration from Single-Human Guidance

Zhengran Ji; Lingyu Zhang; Paul Sajda; Boyuan Chen

TL;DR

This paper tackles the problem of enabling collaboration among multiple robots when guidance comes from only a single human. It introduces a framework in which the human dynamically switches control among the seekers, while policy embeddings inspired by theory of mind (ToM) ground teammate behavior, enabling efficient learning from limited demonstrations. The approach combines imitation learning on data collected from a heuristic policy with targeted fine-tuning on a small amount of human intervention data. It also explores several policy-embedding variants and finds that IL-Long paired with PE-T (whole-team action prediction) yields the strongest collaboration. Empirically, the method improves the seeker success rate by up to 58% over baselines in simulation and transfers effectively to real-world multi-robot systems, demonstrating practical impact for dynamic, partially observable multi-agent tasks.

Abstract

Learning collaborative behaviors is essential for multi-agent systems. Traditionally, multi-agent reinforcement learning solves this implicitly through a joint reward and centralized observations, assuming collaborative behavior will emerge. Other studies propose to learn from demonstrations of a group of collaborative experts. Instead, we propose an efficient and explicit way of learning collaborative behaviors in multi-agent systems by leveraging expertise from only a single human. Our insight is that humans can naturally take on various roles in a team. We show that agents can effectively learn to collaborate when a human operator is allowed to dynamically switch control among agents for short periods and when a human-like theory-of-mind model of teammates is incorporated. Our experiments show that our method improves the success rate of a challenging collaborative hide-and-seek task by up to 58% with only 40 minutes of human guidance. We further demonstrate that our findings transfer to the real world through multi-robot experiments.
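
The abstract's key mechanism is that a single operator can take over any one agent for a while and then switch to another. The sketch below shows one way such intervention data could be gathered. It assumes that agents are indexed 0..n-1, that a hypothetical `human_choice(t)` returns the index of the agent the human is driving at step `t` (or `None`), and that only that agent's (observation, action) pair is stored. Every function name here is illustrative; none of it is the authors' interface.

```python
from typing import Callable, List, Optional, Tuple

Obs = Tuple[float, ...]
Action = int
Sample = Tuple[int, Obs, Action]  # (agent index, observation, human action)


def collect_single_human_guidance(
    initial_obs: List[Obs],
    env_step: Callable[[List[Obs], List[Action]], List[Obs]],
    policy: Callable[[Obs], Action],
    human_choice: Callable[[int], Optional[int]],
    human_action: Callable[[int, Obs], Action],
    horizon: int,
) -> Tuple[List[List[Action]], List[Sample]]:
    """Roll out a team in which one human may take over any single agent.

    Hypothetical helper: at each step, human_choice(t) names the agent the
    human controls (or None). Only that agent's step is stored in d_human.
    """
    obs = initial_obs
    executed: List[List[Action]] = []
    d_human: List[Sample] = []
    for t in range(horizon):
        controlled = human_choice(t)
        actions: List[Action] = []
        for i, o in enumerate(obs):
            if i == controlled:
                a = human_action(i, o)
                d_human.append((i, o, a))  # human label for this agent only
            else:
                a = policy(o)  # teammates keep following the learned policy
            actions.append(a)
        executed.append(actions)
        obs = env_step(obs, actions)
    return executed, d_human


# Toy usage: 3 agents on a line; actions are integer moves.
def toy_env_step(obs: List[Obs], actions: List[Action]) -> List[Obs]:
    return [(o[0] + a,) for o, a in zip(obs, actions)]


schedule = {0: 0, 1: 0, 2: 2}  # human drives agent 0, then switches to agent 2
executed, d_human = collect_single_human_guidance(
    initial_obs=[(0.0,), (5.0,), (10.0,)],
    env_step=toy_env_step,
    policy=lambda o: 0,
    human_choice=lambda t: schedule.get(t),
    human_action=lambda i, o: 1,
    horizon=4,
)
print(d_human)  # [(0, (0.0,), 1), (0, (1.0,), 1), (2, (10.0,), 1)]
```

Because the human labels only one agent at a time, the resulting dataset stays small. This is consistent with the short guidance budget reported above, though the real collection interface may differ.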

Paper Structure

This paper contains 15 sections, 1 equation, 6 figures, and 3 tables.

Introduction
Related Work
Multi-Agent Reinforcement Learning (MARL)
Multi-Agent Imitation Learning (MAIL)
Theory of Mind Inspired Machine Learning
Method
Task Settings
Single-Human Guidance
Data Collection
Grounding Human Guidance to Enable Collaborations
Experiments
Simulation Experiment
Real-World Experiment
Analysis
Conclusion, Limitation, and Future Work

Figures (6)

Figure 1: Our framework enables multi-robot collaboration in dynamic multi-agent hide-and-seek tasks from single-human guidance. The best policy achieves an average seeker success rate of 84.2% in simulation and 80% in real-world experiments in a challenging 3 seekers vs. 3 hiders setting with random map layouts. In comparison, the baseline policy achieves only 36.4% in simulation and 55% in the real world. Interesting collaborative behaviors among seekers are observed during deployment, such as strategically navigating to anticipate and intercept hiders or effectively blocking key paths as a team.
Figure 2: Our framework uses a single human to guide multiple agents in learning collaborative behaviors. (A) Data Collection: We use a predefined heuristic policy to collect a dataset $D_\text{heuristic}$. (B) Imitation Learning (IL): We pre-train the encoder to predict the whole team's actions from a single agent's observations, and only the MLP is updated during IL. The IL policy exhibits no collaboration. (C) Fine-tuning: We collect single-human intervention data $D_\text{human}$ and fine-tune the policy with a frozen encoder. The fine-tuned policy demonstrates effective collaboration during testing.
Figure 3: Heuristic policy. (A-C) Seeker chases hider, avoids obstacles, and avoids walls. (D-H) Hider runs away from multiple seekers.
Figure 4: (A) IL behavior visualization. (B) IL-Long behavior visualization.
Figure 5: Policy Embedding. (A) PE-N explicitly uses the predicted teammate actions as inputs for policy training. (B) PE-H combines the representation of teammate action prediction and self-action prediction for policy training. (C) PE-T learns effective representations by training one network to predict the actions of the whole team.
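
Figure 2 describes a three-stage schedule in which the encoder learns only during pre-training and stays frozen afterwards. The sketch below illustrates that freezing pattern under stated assumptions. Bias-free linear layers stand in for the encoder and the policy MLP, the loss is squared error on continuous actions, and the toy datasets are hypothetical substitutes for $D_\text{heuristic}$ and $D_\text{human}$. The class and function names (`TeamAwarePolicy`, `train`) are illustrative and are not taken from the paper.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Dataset = List[Tuple[Vector, Vector]]  # (single-agent observation, target actions)


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def init_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TeamAwarePolicy:
    """Hypothetical model: linear stand-ins for the encoder, team head, and policy MLP."""

    def __init__(self, obs_dim: int, hid_dim: int, act_dim: int, n_agents: int, seed: int = 0):
        rng = random.Random(seed)
        self.encoder = init_matrix(hid_dim, obs_dim, rng)
        self.team_head = init_matrix(act_dim * n_agents, hid_dim, rng)  # all agents' actions
        self.policy_head = init_matrix(act_dim, hid_dim, rng)  # this agent's action

    def embed(self, obs: Vector) -> Vector:
        return matvec(self.encoder, obs)

    def act(self, obs: Vector) -> Vector:
        return matvec(self.policy_head, self.embed(obs))


def train(policy: TeamAwarePolicy, data: Dataset, head_name: str,
          update_encoder: bool, epochs: int = 50, lr: float = 0.05) -> None:
    """Per-sample SGD on 0.5 * squared error; the encoder changes only if update_encoder."""
    head: Matrix = getattr(policy, head_name)
    for _ in range(epochs):
        for obs, target in data:
            h = policy.embed(obs)
            err = [p - y for p, y in zip(matvec(head, h), target)]
            # Gradient w.r.t. the embedding, computed before the head is modified.
            grad_h = [sum(err[k] * head[k][j] for k in range(len(err))) for j in range(len(h))]
            for k, e in enumerate(err):
                for j, hj in enumerate(h):
                    head[k][j] -= lr * e * hj
            if update_encoder:
                for j, g in enumerate(grad_h):
                    for i, x in enumerate(obs):
                        policy.encoder[j][i] -= lr * g * x


# Toy data standing in for D_heuristic (team and self targets) and D_human.
rng = random.Random(1)
observations = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(20)]
d_team = [(o, [o[0], -o[1], o[0] + o[1]]) for o in observations]
d_self = [(o, [o[0]]) for o in observations]
d_human = [(o, [o[0] - 0.5 * o[1]]) for o in observations[:5]]

policy = TeamAwarePolicy(obs_dim=2, hid_dim=4, act_dim=1, n_agents=3)
initial_encoder = [row[:] for row in policy.encoder]
train(policy, d_team, "team_head", update_encoder=True)      # (B) pre-train encoder on team actions
pretrained_encoder = [row[:] for row in policy.encoder]
train(policy, d_self, "policy_head", update_encoder=False)   # (B) IL: only the head learns
train(policy, d_human, "policy_head", update_encoder=False)  # (C) fine-tune with frozen encoder
```

Freezing the encoder means the small human dataset only reshapes the final mapping. The team-aware representation learned from heuristic data is therefore preserved. In a deep-learning framework, the same effect is usually achieved by excluding the encoder's parameters from the optimizer.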
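
Figure 3 lists the behaviors of the heuristic policy that produces $D_\text{heuristic}$: seekers chase a hider while avoiding obstacles and walls, and hiders flee from multiple seekers. The sketch below encodes those behaviors as a simple potential field. This is a swapped-in technique, not the paper's stated rule set. The square arena, the `margin` repulsion radius, the choice of the nearest hider as target, and the inverse-distance weighting for hiders are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def unit(dx: float, dy: float) -> Point:
    n = math.hypot(dx, dy)
    return (0.0, 0.0) if n == 0 else (dx / n, dy / n)


def wall_push(p: Point, arena: float, margin: float) -> Point:
    """Assumed square arena [0, arena] x [0, arena]; push inward near each wall."""
    push = [0.0, 0.0]
    for axis in (0, 1):
        if p[axis] < margin:
            push[axis] += (margin - p[axis]) / margin
        elif p[axis] > arena - margin:
            push[axis] -= (p[axis] - (arena - margin)) / margin
    return (push[0], push[1])


def seeker_heading(seeker: Point, hiders: List[Point], obstacles: List[Point],
                   arena: float = 10.0, margin: float = 1.0) -> Point:
    # (A) Chase: unit vector toward the nearest hider (assumed target rule).
    target = min(hiders, key=lambda h: math.dist(seeker, h))
    ax, ay = unit(target[0] - seeker[0], target[1] - seeker[1])
    # (B) Avoid obstacles: linear repulsion from any obstacle closer than margin.
    for ob in obstacles:
        d = math.dist(seeker, ob)
        if 0 < d < margin:
            ux, uy = unit(seeker[0] - ob[0], seeker[1] - ob[1])
            ax += (margin - d) / margin * ux
            ay += (margin - d) / margin * uy
    # (C) Avoid walls.
    wx, wy = wall_push(seeker, arena, margin)
    return unit(ax + wx, ay + wy)


def hider_heading(hider: Point, seekers: List[Point],
                  arena: float = 10.0, margin: float = 1.0) -> Point:
    # (D-H) Flee all seekers at once; closer seekers get a larger 1/distance weight.
    fx = fy = 0.0
    for s in seekers:
        d = math.dist(hider, s)
        if d > 0:
            ux, uy = unit(hider[0] - s[0], hider[1] - s[1])
            fx += ux / d
            fy += uy / d
    wx, wy = wall_push(hider, arena, margin)
    return unit(fx + wx, fy + wy)


print(seeker_heading((5.0, 5.0), [(8.0, 5.0), (1.0, 1.0)], []))  # (1.0, 0.0)
```

Each seeker in this sketch reacts only to its own surroundings, so seekers never coordinate. That matches the caption of Figure 2, which notes that the policy imitating this data "exhibits no collaboration" until it is fine-tuned.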
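
Figure 5 distinguishes three ways of feeding a theory-of-mind signal into the policy. The sketch below shows only how each variant could assemble the policy's input vector. It assumes that PE-N concatenates the raw observation with predicted teammate actions, that PE-H concatenates the hidden features of a teammate-action predictor and a self-action predictor, and that PE-T uses the features of a single encoder trained on whole-team actions. The `linear` layers with hand-picked weights are hypothetical placeholders for trained networks.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def linear(weights: List[List[float]]) -> Layer:
    """Hypothetical bias-free linear layer standing in for a trained network."""
    return lambda x: [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def pe_n_input(obs: Vector, teammate_encoder: Layer, teammate_head: Layer) -> Vector:
    # PE-N: append the explicitly predicted teammate actions to the observation.
    return obs + teammate_head(teammate_encoder(obs))


def pe_h_input(obs: Vector, teammate_encoder: Layer, self_encoder: Layer) -> Vector:
    # PE-H: concatenate teammate-prediction and self-prediction representations.
    return teammate_encoder(obs) + self_encoder(obs)


def pe_t_input(obs: Vector, team_encoder: Layer) -> Vector:
    # PE-T: features from one encoder trained to predict the whole team's actions.
    return team_encoder(obs)


# Toy weights: 2-D observation, two teammates with 2-D actions each.
teammate_encoder = linear([[1, 0], [0, 1], [1, 1]])
teammate_head = linear([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]])
self_encoder = linear([[2, 0], [0, 2], [1, -1]])
team_encoder = linear([[1, 0], [0, 1], [1, 1], [1, -1]])

obs = [1.0, 2.0]
print(pe_n_input(obs, teammate_encoder, teammate_head))  # [1.0, 2.0, 1.0, 2.0, 3.0, 6.0]
```

The design difference is where teammate reasoning lives. PE-N commits to explicit action predictions. PE-H and PE-T pass learned representations instead, and PE-T needs only one network for the whole team. PE-T is the variant that the TL;DR reports as strongest when paired with IL-Long.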
