CoLF: Learning Consistent Leader-Follower Policies for Vision-Language-Guided Multi-Robot Cooperative Transport
Joachim Yann Despature, Kazuki Shibata, Takamitsu Matsubara
TL;DR
CoLF tackles vision-language-guided decentralized multi-robot transport by enforcing a consistent leader-follower role assignment through an asymmetric policy design and a mutual-information-based objective. The follower is trained to predict the leader's action from its local observations, which promotes coordination under perceptual misalignment and language ambiguity within a centralized training and decentralized execution (CTDE) multi-agent reinforcement learning (MARL) framework. Results in simulation and on real quadruped robots show improved training performance, robustness to goal-language ambiguity, and effective sim-to-real transfer across varied landmarks. The approach enables decentralized, VLM-grounded cooperation without relying on global views or heavy inter-robot communication.
Abstract
In this study, we address vision-language-guided multi-robot cooperative transport, where each robot grounds natural-language instructions in its onboard camera observations. A key challenge in this decentralized setting is perceptual misalignment across robots: viewpoint differences and language ambiguity can yield inconsistent interpretations and degrade cooperative transport. To mitigate this problem, we adopt a dependent leader-follower design, where one robot serves as the leader and the other as the follower. Although such a leader-follower structure appears straightforward, learning with independent and symmetric agents often yields symmetric or unstable behaviors unless explicit inductive biases are introduced. To address this challenge, we propose Consistent Leader-Follower (CoLF), a multi-agent reinforcement learning (MARL) framework for stable leader-follower role differentiation. CoLF consists of two key components: (1) an asymmetric policy design that induces leader-follower role differentiation, and (2) a mutual-information-based training objective that maximizes a variational lower bound, encouraging the follower to predict the leader's action from its local observation. The leader and follower policies are jointly optimized under the centralized training and decentralized execution (CTDE) framework to balance task execution with consistent cooperative behavior. We validate CoLF in both simulation and real-robot experiments using two quadruped robots. The demonstration video is available at https://sites.google.com/view/colf/.
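To make the second component more concrete, the following is a minimal sketch of a variational lower bound on mutual information, assuming the standard Barber-Agakov form I(A; O) >= H(A) + E[log q(A | O)], where A is the leader's action, O is the follower's local observation, and q is the follower's predictor. The abstract does not state the exact objective used in CoLF, so this form, the discrete toy distribution, and all names (`mutual_information`, `variational_lower_bound`, `joint`) are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def mutual_information(joint):
    """Exact I(A; O) in nats for a joint table p(a, o) of shape (num_actions, num_obs)."""
    p_a = joint.sum(axis=1, keepdims=True)  # marginal over leader actions, shape (A, 1)
    p_o = joint.sum(axis=0, keepdims=True)  # marginal over follower observations, shape (1, O)
    mask = joint > 0                        # skip zero-probability cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log(joint[mask] / (p_a @ p_o)[mask])))


def variational_lower_bound(joint, q_a_given_o):
    """Assumed Barber-Agakov bound: H(A) + E_{p(a, o)}[log q(a | o)].

    q_a_given_o[a, o] is the follower's predicted probability of leader
    action a given its own observation o (each column sums to 1).
    """
    p_a = joint.sum(axis=1)
    entropy_a = -np.sum(p_a[p_a > 0] * np.log(p_a[p_a > 0]))
    mask = joint > 0
    expected_log_q = np.sum(joint[mask] * np.log(q_a_given_o[mask]))
    return float(entropy_a + expected_log_q)


# Hypothetical toy problem: 2 leader actions, 2 follower observations.
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])

# A perfect predictor equals the true posterior p(a | o); a uniform one ignores o.
true_posterior = joint / joint.sum(axis=0, keepdims=True)
uniform_q = np.full_like(joint, 0.5)

mi = mutual_information(joint)
bound_exact = variational_lower_bound(joint, true_posterior)
bound_uniform = variational_lower_bound(joint, uniform_q)

print(f"I(A; O)              = {mi:.4f} nats")
print(f"bound, q = posterior = {bound_exact:.4f} nats")
print(f"bound, q = uniform   = {bound_uniform:.4f} nats")
```

In this toy case the bound is tight when the predictor matches the true posterior and drops to zero for a predictor that ignores the observation. Under this assumed form, improving the follower's prediction of the leader's action raises the bound, which is the sense in which maximizing it encourages the follower to predict the leader's action.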
