CooT: Learning to Coordinate In-Context with Coordination Transformers

Huai-Chih Wang, Hsiang-Chun Chuang, Hsi-Chun Cheng, Dai-Jie Wu, Shao-Hua Sun

TL;DR

CooT addresses the challenge of coordinating with unseen partners in dynamic multi-agent environments by leveraging in-context learning through Coordination Transformers. It frames coordination as a Hidden-Utility Markov Game and trains a transformer to predict partner-aligned best-response actions using past interaction histories, with online deployment achieved without gradient updates. The approach is validated in the Overcooked domain, where CooT outperforms population-based, gradient-based, and meta-RL baselines, and is consistently ranked highly in human evaluations. Key contributions include the HU-MG formulation, a diverse dataset generation pipeline, and extensive analyses of adaptation dynamics and robustness to non-stationary partners, highlighting practical potential for rapid, human-friendly collaboration in real-world settings.
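The training setup summarized above (predict a best-response action from past interaction histories) can be illustrated with a minimal sketch of how one supervised training instance might be assembled. This is not the authors' pipeline: the integer encoding of states and actions, the `sample_instance` helper, and the choice to draw the context from the episodes immediately preceding the query episode are all assumptions made for illustration.

```python
import random
from typing import List, NamedTuple, Tuple

# Toy encoding (assumption): one step is (state, best_response_action).
Step = Tuple[int, int]


class Instance(NamedTuple):
    context: List[Step]   # past interactions with the same partner
    query_state: int      # state at which an action must be predicted
    target_action: int    # best-response action used as the label


def sample_instance(
    episodes: List[List[Step]],
    rng: random.Random,
    context_episodes: int = 2,
) -> Instance:
    """Build one (context, query state, label) triple.

    `episodes` holds consecutive episodes played with a single partner.
    Hypothetical helper: the context window size and the sampling
    scheme are illustrative assumptions, not taken from the paper.
    """
    if len(episodes) < 2:
        raise ValueError("need at least two episodes with the same partner")
    # Pick a query episode that has at least one earlier episode.
    q = rng.randrange(1, len(episodes))
    start = max(0, q - context_episodes)
    # Flatten the preceding episodes into a single context sequence.
    context = [step for ep in episodes[start:q] for step in ep]
    # Pick one step of the query episode as the prediction target.
    query_state, target_action = rng.choice(episodes[q])
    return Instance(context, query_state, target_action)
```

A sequence model would then be trained to map `(context, query_state)` to `target_action` with an ordinary classification loss; that training loop is omitted here.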

Abstract

Effective coordination among artificial agents in dynamic and uncertain environments remains a significant challenge in multi-agent systems. Existing approaches, such as self-play and population-based methods, either generalize poorly to unseen partners or require impractically extensive fine-tuning. To overcome these limitations, we propose Coordination Transformers (CooT), a novel in-context coordination framework that uses recent interaction histories to rapidly adapt to unseen partners. Unlike prior approaches that primarily aim to diversify training partners, CooT explicitly focuses on adapting to new partner behaviors by predicting actions aligned with observed interactions. Trained on trajectories collected from diverse pairs of agents with complementary preferences, CooT quickly learns effective coordination strategies without explicit supervision or parameter updates. Across diverse coordination tasks in Overcooked, CooT consistently outperforms baselines including population-based approaches, gradient-based fine-tuning, and a Meta-RL-inspired contextual adaptation method. Notably, fine-tuning proves unstable and ineffective, while Meta-RL struggles to achieve reliable coordination. By contrast, CooT achieves stable, rapid in-context adaptation and is consistently ranked the most effective collaborator in human evaluations.

Paper Structure

This paper contains 40 sections, 9 figures, 11 tables, 2 algorithms.

Figures (9)

  • Figure 1: CooT. (a) Training. We generate a dataset $\mathcal{D}$ of trajectories between behavior-preferring agents and their best-response (BR) policies. For each training instance, CooT receives query states $\mathbf{s}_h$ and a context $\mathbf{C}$ of past interactions, and learns to predict an action $\mathbf{a}$ that mimics the BR action $\hat{\mathbf{a}}$. (b) Evaluation. At test time, CooT coordinates with unseen partners by continually updating its context from recent episodes. This adapts it to the partner online without gradient updates, enabling few-shot generalization through context updates alone.
  • Figure 2: Human study: agent ranking distribution. Number of participants who chose each agent at different rankings. CooT received the highest number of first-place rankings, indicating it is the most preferred collaborator.
  • Figure 3: In-context performance improvement of CooT over episodes. As more partner trajectories are observed, CooT steadily improves its coordination strategy, highlighting the advantage of context-based adaptation.
  • Figure 4: Layouts used in Overcooked.
  • Figure 5: CooT performance across layouts. Learning curves in six panels: five evaluation layouts (Coord. Ring, Coord. Ring Multi-recipe, Counter Circ., Bothway Coord., and Asymm Adv.) and the aggregate result across all layouts (Overall).
  • ...and 4 more figures
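
The evaluation procedure described in the Figure 1 caption (a context refreshed from recent episodes, with no gradient updates) can be sketched as a toy loop. Everything below is an assumption for illustration: `frozen_policy` is a hand-written stand-in for the trained transformer, the "environment" is a trivial action-matching game rather than Overcooked, and `run_online` and its parameters are hypothetical names.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple

# Toy encoding (assumption): (state, agent_action, partner_action).
Step = Tuple[int, int, int]


def frozen_policy(context: List[Step], state: int) -> int:
    """Stand-in for the trained model; it has no trainable state.

    It plays the partner's most frequent action seen in the context and
    falls back to action 0 when the context is empty.
    """
    if not context:
        return 0
    counts = Counter(partner_action for _, _, partner_action in context)
    return counts.most_common(1)[0][0]


def run_online(
    partner_pref: int,
    n_episodes: int = 4,
    horizon: int = 5,
    max_context_episodes: int = 2,
) -> List[int]:
    """Play several episodes with one partner and return per-episode reward.

    Adaptation happens only through the rolling buffer of recent
    episodes; the policy itself is never updated.
    """
    recent: Deque[List[Step]] = deque(maxlen=max_context_episodes)
    returns: List[int] = []
    for _ in range(n_episodes):
        # Context is rebuilt from the most recent episodes before each episode.
        context = [step for ep in recent for step in ep]
        episode: List[Step] = []
        total = 0
        for state in range(horizon):
            agent_action = frozen_policy(context, state)
            partner_action = partner_pref  # hidden preference, fixed per partner
            total += int(agent_action == partner_action)  # reward 1 on a match
            episode.append((state, agent_action, partner_action))
        recent.append(episode)  # oldest episode is dropped automatically
        returns.append(total)
    return returns


print(run_online(partner_pref=2))  # [0, 5, 5, 5]
```

In this toy game the first episode scores 0 because the context is empty, and later episodes score the full horizon once the partner's preference appears in the context. The bounded `deque` is one simple way to keep only recent episodes, which also lets the context track a partner whose behavior changes over time.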