ROTATE: Regret-driven Open-ended Training for Ad Hoc Teamwork
Caroline Wang, Arrasy Rahman, Jiaxun Cui, Yoonchang Sung, Peter Stone
TL;DR
This work reframes Ad Hoc Teamwork (AHT) as an open-ended learning problem and introduces ROTATE, a regret-driven algorithm that alternates between improving the ego agent and generating diverse teammates that probe its weaknesses. By optimizing a per-state cooperative regret and maintaining a population of past teammates, ROTATE mitigates self-sabotage and improves generalization to unseen partners across two-player matrix games and popular coordination tasks. Empirically, ROTATE outperforms diverse baselines, and the per-state regret objective and the population buffer are central to its success. The approach offers a practical path to robust, zero-shot coordination in cooperative multi-agent systems. The work also acknowledges open limitations: scaling beyond two agents and extending the theoretical analysis of regret objectives.
Abstract
Learning to collaborate with previously unseen partners is a fundamental generalization challenge in multi-agent learning, known as Ad Hoc Teamwork (AHT). Existing AHT approaches often adopt a two-stage pipeline. First, a fixed population of teammates is generated, with the idea that they should be representative of the teammates that will be seen at deployment time. Second, an AHT agent is trained to collaborate well with agents in the population. To date, the research community has focused on designing separate algorithms for each stage. This separation has led to algorithms that generate teammates with limited coverage of possible behaviors, and that ignore whether the generated teammates are easy for the AHT agent to learn from. Furthermore, algorithms for training AHT agents typically treat the set of training teammates as static, and therefore attempt to generalize to previously unseen partner agents without assuming any control over the set of training teammates. This paper presents a unified framework for AHT by reformulating the problem as an open-ended learning process between an AHT agent and an adversarial teammate generator. We introduce ROTATE, a regret-driven, open-ended training algorithm that alternates between improving the AHT agent and generating teammates that probe its deficiencies. Experiments across diverse two-player environments demonstrate that ROTATE significantly outperforms baselines at generalizing to an unseen set of evaluation teammates, thus establishing a new standard for robust and generalizable teamwork.
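
The alternation described above can be illustrated with a minimal sketch on a stateless two-player cooperative matrix game. This is not the authors' implementation. It assumes teammates are pure actions, the ego agent best-responds exactly to a uniform mixture over the teammate buffer, and cooperative regret is the best-response return against a teammate minus the ego agent's expected return against that teammate. In a matrix game there is only one state, so the per-state regret reduces to this single quantity. All function names here (cooperative_regret, generate_teammate, improve_ego, open_ended_training) are hypothetical and chosen for illustration only.

```python
import numpy as np


def cooperative_regret(payoff, ego_policy, teammate_action):
    """Best-response return minus the ego agent's expected return vs. one teammate.

    payoff[a, j] is the shared reward when the ego agent plays a and the
    teammate plays j (an assumed, illustrative game representation).
    """
    column = payoff[:, teammate_action]
    return float(column.max() - ego_policy @ column)


def generate_teammate(payoff, ego_policy):
    """Teammate generator: pick the teammate action that maximizes ego regret."""
    regrets = [
        cooperative_regret(payoff, ego_policy, j) for j in range(payoff.shape[1])
    ]
    return int(np.argmax(regrets))


def improve_ego(payoff, population):
    """Ego update: exact best response to a uniform mixture over the buffer."""
    expected = payoff[:, population].mean(axis=1)
    policy = np.zeros(payoff.shape[0])
    policy[int(np.argmax(expected))] = 1.0
    return policy


def open_ended_training(payoff, iterations=5):
    """Alternate between generating a high-regret teammate and improving the ego agent."""
    n_actions = payoff.shape[0]
    ego_policy = np.full(n_actions, 1.0 / n_actions)  # start from a uniform policy
    population = []  # buffer of all previously generated teammates
    for _ in range(iterations):
        population.append(generate_teammate(payoff, ego_policy))
        ego_policy = improve_ego(payoff, population)
    return ego_policy, population


if __name__ == "__main__":
    # Pure coordination game: reward 1 only when both players pick the same action.
    game = np.array([[1.0, 0.0], [0.0, 1.0]])
    policy, buffer = open_ended_training(game, iterations=5)
    print("ego policy:", policy)
    print("teammate buffer:", buffer)
```

Keeping every generated teammate in the buffer is what stops the ego agent from simply chasing the most recent teammate. In a full reinforcement learning setting, both the exact best response and the exhaustive search over teammates would be replaced by learned policies.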
