CooT: Learning to Coordinate In-Context with Coordination Transformers
Huai-Chih Wang, Hsiang-Chun Chuang, Hsi-Chun Cheng, Dai-Jie Wu, Shao-Hua Sun
TL;DR
CooT addresses the challenge of coordinating with unseen partners in dynamic multi-agent environments by leveraging in-context learning through Coordination Transformers. It frames coordination as a Hidden-Utility Markov Game (HU-MG) and trains a transformer to predict partner-aligned best-response actions from past interaction histories, so that online deployment requires no gradient updates. The approach is validated in the Overcooked domain, where CooT outperforms population-based, gradient-based, and meta-RL baselines, and is consistently ranked highly in human evaluations. Key contributions include the HU-MG formulation, a diverse dataset generation pipeline, and extensive analyses of adaptation dynamics and robustness to non-stationary partners, highlighting practical potential for rapid, human-friendly collaboration in real-world settings.
Abstract
Effective coordination among artificial agents in dynamic and uncertain environments remains a significant challenge in multi-agent systems. Existing approaches, such as self-play and population-based methods, either generalize poorly to unseen partners or require impractically extensive fine-tuning. To overcome these limitations, we propose Coordination Transformers (CooT), a novel in-context coordination framework that uses recent interaction histories to rapidly adapt to unseen partners. Unlike prior approaches that primarily aim to diversify training partners, CooT explicitly focuses on adapting to new partner behaviors by predicting actions aligned with observed interactions. Trained on trajectories collected from diverse pairs of agents with complementary preferences, CooT quickly learns effective coordination strategies without explicit supervision or parameter updates. Across diverse coordination tasks in Overcooked, CooT consistently outperforms baselines including population-based approaches, gradient-based fine-tuning, and a Meta-RL-inspired contextual adaptation method. Notably, fine-tuning proves unstable and ineffective, while Meta-RL struggles to achieve reliable coordination. By contrast, CooT achieves stable, rapid in-context adaptation and is consistently ranked the most effective collaborator in human evaluations.
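To make the idea of adapting through interaction history rather than parameter updates concrete, the following is a minimal sketch and not the authors' implementation. The names `InContextAgent`, `toy_policy`, and `max_context` are hypothetical, observations and actions are reduced to integers, and `toy_policy` is a hand-written stand-in for the trained transformer: it simply picks whichever of two tasks the partner has taken less often in the recent context.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

# One context entry: (observation, ego action, partner action).
# This tuple layout is an assumption made for illustration.
Step = Tuple[int, int, int]


class InContextAgent:
    """Wraps a frozen policy and a rolling window of past interactions."""

    def __init__(self, policy: Callable[[List[Step], int], int], max_context: int = 8):
        self.policy = policy  # never updated at deployment time
        self.context: Deque[Step] = deque(maxlen=max_context)

    def act(self, obs: int) -> int:
        # The action is conditioned on the recent history; no gradients are taken.
        return self.policy(list(self.context), obs)

    def observe(self, obs: int, ego_action: int, partner_action: int) -> None:
        # Append the latest interaction; the oldest entry drops out when full.
        self.context.append((obs, ego_action, partner_action))


def toy_policy(context: List[Step], obs: int) -> int:
    """Hypothetical stand-in for the trained model (ignores obs).

    Picks the task (0 or 1) the partner has chosen less often in the context.
    """
    counts = [0, 0]
    for _, _, partner_action in context:
        counts[partner_action] += 1
    return 0 if counts[0] <= counts[1] else 1


if __name__ == "__main__":
    agent = InContextAgent(toy_policy, max_context=4)
    # The partner prefers task 0 for three steps, then switches to task 1.
    for partner_action in [0, 0, 0, 1, 1, 1, 1]:
        ego_action = agent.act(obs=0)
        agent.observe(0, ego_action, partner_action)
        print(f"partner={partner_action} ego={ego_action}")
```

Because the history window is bounded, older partner behavior eventually falls out of the context, which is one simple way such an agent could track a partner whose preferences change mid-episode.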
