CoLF: Learning Consistent Leader-Follower Policies for Vision-Language-Guided Multi-Robot Cooperative Transport

Joachim Yann Despature; Kazuki Shibata; Takamitsu Matsubara

TL;DR

CoLF tackles vision-language-guided decentralized multi-robot transport by enforcing consistent leader-follower roles through an asymmetric policy design and a mutual-information-based objective. The follower is trained to predict the leader's action from its own local observations, which promotes coordination under perceptual misalignment and language ambiguity. Training follows the centralized training and decentralized execution (CTDE) paradigm of multi-agent reinforcement learning (MARL). Experiments in simulation and on real quadruped robots show improved training performance, robustness to ambiguity in the language-specified goal, and effective sim-to-real transfer across varied landmarks. The approach enables decentralized cooperation grounded by a vision-language model (VLM) without relying on a global view or heavy inter-robot communication, with promising implications for scalable language-guided multi-robot tasks.
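A minimal sketch of why CTDE makes the follower's prediction target available, under stated assumptions: during centralized training the leader's actual action can be logged and used as a supervised label, while at execution time the follower sees only its own observation. The functions `follower_prediction_loss` and `follower_total_loss`, the weight `lam`, the discrete action space, and the additive combination with a generic reinforcement-learning loss `rl_loss` are hypothetical illustrations, not the authors' implementation.

```python
import math


def follower_prediction_loss(pred_probs, leader_actions):
    """Mean negative log-likelihood -log q(a_L | o_F) over a batch.

    Assumption (hypothetical): actions are discrete. pred_probs[t] is the
    follower predictor's distribution over the leader's actions at step t,
    computed from the follower's local observation only. leader_actions[t]
    is the leader's actual action, usable as a label because training is
    centralized.
    """
    if len(pred_probs) != len(leader_actions) or not pred_probs:
        raise ValueError("need a non-empty batch with one label per prediction")
    nll = [-math.log(probs[a]) for probs, a in zip(pred_probs, leader_actions)]
    return sum(nll) / len(nll)


def follower_total_loss(rl_loss, pred_probs, leader_actions, lam=0.1):
    """Hypothetical combination: follower RL loss plus a weighted prediction term.

    rl_loss stands in for whatever policy loss the MARL algorithm produces;
    lam is an illustrative trade-off weight, not a value from the paper.
    """
    return rl_loss + lam * follower_prediction_loss(pred_probs, leader_actions)
```

For example, with predicted distributions `[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]` and leader actions `[0, 1]`, the prediction loss is the average of `-log 0.7` and `-log 0.8`. For continuous actions, `probs[a]` would be replaced by a density (for instance a Gaussian) evaluated at the leader's action; the loss remains its negative log.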

Abstract

In this study, we address vision-language-guided multi-robot cooperative transport, where each robot grounds natural-language instructions in its own onboard camera observations. A key challenge in this decentralized setting is perceptual misalignment across robots: viewpoint differences and language ambiguity can lead robots to interpret the same instruction inconsistently, degrading cooperative transport. To mitigate this problem, we adopt a dependent leader-follower design, where one robot serves as the leader and the other as the follower. Although such a leader-follower structure appears straightforward, without explicit inductive biases, independently learning symmetric agents often converge to symmetric or unstable behaviors rather than distinct roles. To address this challenge, we propose Consistent Leader-Follower (CoLF), a multi-agent reinforcement learning (MARL) framework for stable leader-follower role differentiation. CoLF consists of two key components: (1) an asymmetric policy design that induces leader-follower role differentiation, and (2) a mutual-information-based training objective that maximizes a variational lower bound, encouraging the follower to predict the leader's action from its local observation. The leader and follower policies are jointly optimized under the centralized training and decentralized execution (CTDE) framework to balance task execution against consistent cooperative behavior. We validate CoLF in both simulation and real-robot experiments using two quadruped robots. The demonstration video is available at https://sites.google.com/view/colf/.
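The connection between predicting the leader's action and mutual information can be illustrated with the standard variational (Barber-Agakov) bound. Assuming, for illustration only, that the quantity of interest is I(A^L; O^F) between the leader's action A^L and the follower's observation O^F (the paper's exact formulation is not reproduced here), any predictor q_φ(a^L | o^F) satisfies

I(A^L; O^F) = H(A^L) − H(A^L | O^F) ≥ H(A^L) + E_{p(a^L, o^F)}[log q_φ(a^L | o^F)],

with a gap equal to E_{o^F}[KL(p(a^L | o^F) ‖ q_φ(a^L | o^F))]. Because H(A^L) does not depend on q_φ, maximizing the bound over q_φ amounts to minimizing the follower's cross-entropy on the leader's actions. The toy check below uses a hypothetical two-observation, two-action joint table (not data from the paper) to confirm that the bound is tight for the exact posterior and looser for less informative predictors; all function names are illustrative.

```python
import math


def entropy(p):
    """Shannon entropy in nats of a discrete distribution given as a list."""
    return -sum(x * math.log(x) for x in p if x > 0)


def marginals(joint):
    """Return (p_o, p_a) for joint[o][a]: rows are follower observations, cols leader actions."""
    p_o = [sum(row) for row in joint]
    p_a = [sum(col) for col in zip(*joint)]
    return p_o, p_a


def mutual_information(joint):
    """Exact I(A; O) in nats, computed directly from the joint table."""
    p_o, p_a = marginals(joint)
    return sum(
        p * math.log(p / (p_o[i] * p_a[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )


def posterior(joint):
    """Exact p(a | o), the predictor that makes the bound tight."""
    p_o, _ = marginals(joint)
    return [[p / p_o[i] for p in row] for i, row in enumerate(joint)]


def variational_lower_bound(joint, q):
    """Barber-Agakov bound H(A) + E_{p(o, a)}[log q(a | o)] for a predictor table q[o][a]."""
    _, p_a = marginals(joint)
    expected_log_q = sum(
        p * math.log(q[i][j])
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )
    return entropy(p_a) + expected_log_q


# Hypothetical joint distribution: the follower's observation is informative
# about the leader's action, but not perfectly so.
joint = [[0.4, 0.1],
         [0.1, 0.4]]

mi = mutual_information(joint)
tight = variational_lower_bound(joint, posterior(joint))          # equals mi
partial = variational_lower_bound(joint, [[0.6, 0.4], [0.4, 0.6]])  # weaker predictor
loose = variational_lower_bound(joint, [[0.5, 0.5], [0.5, 0.5]])    # uninformative: bound is 0
```

In a learned setting, q_φ would be a network evaluated on the follower's observation, and the expectation would be estimated from sampled rollouts rather than computed from a known table.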

Paper Structure (36 sections, 5 equations, 5 figures, 3 tables)

Introduction
Related work
Model-based Multi-robot Cooperative Transport
MARL-based Multi-robot Cooperative Transport
VLM-based Multi-robot Control
Preliminary
Vision-language-guided Multi-robot Cooperative Transport
Decentralized Partially Observable Markov Decision Process (Dec-POMDP)
Multi-Agent Proximal Policy Optimization (MAPPO)
Method
Overview of the Proposed Framework
Policy Model
Training Objectives
Mutual-Information-Based Objective
Policy Optimization
...and 21 more sections

Figures (5)

Figure 1: Overview of vision-language-guided multi-robot cooperative transport. Two robots cooperatively transport the target object toward a language-specified goal landmark using onboard RGB-D observations.
Figure 2: Overview of the CoLF framework for vision-language-guided multi-robot cooperative transport.
Figure 3: Comparison of the object-goal proximity reward.
Figure 4: Comparisons of trajectories in two multi-robot cooperative transport scenarios. For CoLF w/o CE and CoLF, Robot 1 and Robot 2 correspond to the Leader and the Follower agents, respectively.
Figure 5: Overview of the five real-world scenarios.