A Minimax Approach to Ad Hoc Teamwork
Victor Villin, Thomas Kleine Buening, Christos Dimitrakakis
TL;DR
This work tackles Ad Hoc Teamwork (AHT) under partner uncertainty by reframing it as a Minimax-Bayes Reinforcement Learning problem over a finite background population of partner policies. The focal policy is optimized against the worst-case prior over training scenarios, which yields worst-case performance guarantees and improves out-of-distribution robustness on simple and deep RL coordination tasks, including Collaborative Cooking and the Iterated Prisoner's Dilemma. The authors compare utility-based and regret-based objectives, introduce a Gradient Descent-Ascent training algorithm for softmax policies, and show that training on the minimax distribution can accelerate learning while improving robustness to unseen teammates. The findings highlight how strongly the training-partner distribution affects robustness in AHT, with practical implications for curriculum-like scenario generation and coordination in multi-agent systems.
Abstract
We propose a minimax-Bayes approach to Ad Hoc Teamwork (AHT) that optimizes policies against an adversarial prior over partners, explicitly accounting for uncertainty about partners at time of deployment. Unlike existing methods that assume a specific distribution over partners, our approach improves worst-case performance guarantees. Extensive experiments, including evaluations on coordinated cooking tasks from the Melting Pot suite, show our method's superior robustness compared to self-play, fictitious play, and best response learning. Our work highlights the importance of selecting an appropriate training distribution over teammates to achieve robustness in AHT.
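To make the minimax idea concrete, the following is a minimal sketch of Gradient Descent-Ascent on a toy problem, not the authors' implementation. It assumes a one-shot setting in which the focal agent picks one of a few actions and the partner is drawn from a finite set of scenarios. The payoff table `UTILITY`, the function names, the step size, and the softmax parametrization of the prior are all hypothetical choices made for illustration. The focal policy ascends the expected utility while the prior over scenarios descends it, approximating max over policies of the min over priors.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_utility(pi, beta, utility):
    """Expected payoff when the action ~ pi and the partner scenario ~ beta."""
    return sum(
        pi[a] * beta[k] * utility[a][k]
        for a in range(len(pi))
        for k in range(len(beta))
    )


def gda_minimax(utility, steps=5000, lr=0.5):
    """Simultaneous gradient descent-ascent on max_pi min_beta E[U].

    utility[a][k] is the (hypothetical) payoff of focal action a when
    paired with partner scenario k. Both the policy and the prior are
    softmax distributions over logits; this is an assumption of the toy.
    """
    n_actions, n_scenarios = len(utility), len(utility[0])
    theta = [0.0] * n_actions    # focal policy logits (maximizer)
    phi = [0.0] * n_scenarios    # prior logits (minimizer)
    for _ in range(steps):
        pi, beta = softmax(theta), softmax(phi)
        value = expected_utility(pi, beta, utility)
        # Expected payoff of each action under the current prior.
        u_action = [
            sum(beta[k] * utility[a][k] for k in range(n_scenarios))
            for a in range(n_actions)
        ]
        # Expected payoff of the current policy in each scenario.
        u_scenario = [
            sum(pi[a] * utility[a][k] for a in range(n_actions))
            for k in range(n_scenarios)
        ]
        # Softmax gradient: d value / d theta_a = pi_a * (u_action[a] - value).
        theta = [
            theta[a] + lr * pi[a] * (u_action[a] - value)
            for a in range(n_actions)
        ]
        # The prior moves in the opposite direction (descent).
        phi = [
            phi[k] - lr * beta[k] * (u_scenario[k] - value)
            for k in range(n_scenarios)
        ]
    pi, beta = softmax(theta), softmax(phi)
    return pi, beta, expected_utility(pi, beta, utility)


# Hypothetical payoffs: action 0 excels with partner 0 but fails with
# partner 1, while action 1 is mediocre but safe with both partners.
UTILITY = [
    [1.0, 0.0],
    [0.6, 0.5],
]

pi, beta, value = gda_minimax(UTILITY)
print("focal policy:", pi)
print("worst-case prior:", beta)
print("value:", value)
```

In this toy, the prior concentrates on the harder partner (scenario 1) and the focal policy shifts to the safe action 1, so the value approaches the worst-case payoff of 0.5 rather than the higher payoff available against the easy partner. The table was chosen to have a pure saddle point; on games without one, plain simultaneous updates like these can oscillate rather than settle, so this sketch should not be read as a general-purpose solver.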
