Symmetry-Breaking Augmentations for Ad Hoc Teamwork

Ravi Hammond, Dustin Craggs, Mingyu Guo, Jakob Foerster, Ian Reid

TL;DR

This work introduces symmetry-breaking augmentations (SBA) for ad hoc teamwork (AHT). SBA uses equivalence mappings to augment the training population with symmetry-transformed partners, exposing the learning agent to both symmetry-equivalent and symmetry-breaking conventions and improving its robustness to unseen teammate strategies. The effect of augmentation is quantified by a proposed metric, Augmentation Impact (AugImp). In both a simple iterated lever game and the cooperative card game Hanabi, SBA achieves state-of-the-art AHT performance and generalises better to diverse partner populations, while showing how conventions influence alignment with humans. Limitations include dependence on identifiable environmental symmetries and possible adverse effects when test-time partners rely heavily on information channels that SBA reduces. Proposed future work includes automatic symmetry detection and combining SBA with other AHT improvements across broader Dec-POMDPs.

Abstract

In dynamic collaborative settings, for artificial intelligence (AI) agents to better align with humans, they must adapt to novel teammates who utilise unforeseen strategies. While adaptation is often simple for humans, it can be challenging for AI agents. Our work introduces symmetry-breaking augmentations (SBA) as a novel approach to this challenge. By applying a symmetry-flipping operation to increase behavioural diversity among training teammates, SBA encourages agents to learn robust responses to unknown strategies, highlighting how social conventions impact human-AI alignment. We demonstrate this experimentally in two settings, showing that our approach outperforms previous ad hoc teamwork results in the challenging card game Hanabi. In addition, we propose a general metric for estimating symmetry dependency amongst a given set of policies. Our findings provide insights into how AI systems can better adapt to diverse human conventions and the core mechanics of alignment.

Paper Structure

This paper contains 34 sections, 14 equations, 12 figures, 9 tables, 1 algorithm.

Figures (12)

  • Figure 1: Augmenting conventions of other agents. The driver stops at red and drives on green (top), but with SBA, our agent sees the driver stopping and starting with many colours (bottom).
  • Figure 2: The $\phi$ operator converts green observations to red (left), and $\phi^{-1}$ converts red actions back to green (right). In this game red and green are symmetrically equivalent, so applying $\phi$ and $\phi^{-1}$ leaves the game unchanged up to relabelling.
  • Figure 3: Symmetry-breaking augmentations for an $n$-player Dec-POMDP. The equivalence map $\phi$ is only applied to the observations and actions of our AHT agent $\pi_A$, not the teammate policy $\pi_j$.
  • Figure 4: In the iterated lever coordination game, agents can see which actions were previously taken. The game highlights the difficulty of adapting to conventions not seen during training.
  • Figure 5: Training curves for the iterated lever coordination game. Lines show the mean and shading shows the standard error of the mean across $30$ seeds. SBA improves test performance because it exposes the agent to more conventions during training.
  • ...and 7 more figures
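
The mechanism in Figures 1 to 3 can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' implementation: observations are taken to be lists of string tokens, actions are `(kind, colour)` tuples, and the names `make_phi`, `sample_permutation` and `augmented_step` are hypothetical. The sketch applies $\phi$ to what the AHT agent observes and $\phi^{-1}$ to the action it returns, leaving the teammate untouched.

```python
import random

# Hypothetical colour set for a toy game; not taken from the paper.
COLOURS = ["red", "green", "blue"]


def sample_permutation(colours, rng):
    """Sample a random colour relabelling as a dict (one candidate phi)."""
    shuffled = list(colours)
    rng.shuffle(shuffled)
    return dict(zip(colours, shuffled))


def make_phi(perm):
    """Build phi (acts on observations) and phi_inv (acts on actions)."""
    inverse = {new: old for old, new in perm.items()}

    def phi(obs):
        # Relabel colour tokens; non-colour tokens pass through unchanged.
        return [perm.get(token, token) for token in obs]

    def phi_inv(action):
        # Map a colour-bearing action back into the environment's labelling.
        kind, colour = action
        return (kind, inverse.get(colour, colour))

    return phi, phi_inv


def augmented_step(agent_policy, obs, perm):
    """Wrap only the AHT agent: it sees phi(obs), and its action is
    mapped back with phi_inv before reaching the environment."""
    phi, phi_inv = make_phi(perm)
    return phi_inv(agent_policy(phi(obs)))


# Example: swap red and green, leave blue alone.
swap = {"red": "green", "green": "red", "blue": "blue"}
phi, phi_inv = make_phi(swap)

# The teammate drives on a green light, but the agent sees it driving on red.
seen_by_agent = phi(["green", "drive"])

# A policy that hints the first colour it sees acts in the relabelled space;
# phi_inv returns the action to the environment's own labels.
env_action = augmented_step(lambda o: ("hint", o[0]), ["green", "blue"], swap)

# Sampling a fresh relabelling per episode is one assumed way to vary it.
random_perm = sample_permutation(COLOURS, random.Random(0))
```

In this toy, `seen_by_agent` is `["red", "drive"]`, so from the agent's side the same teammate appears to follow a different convention, while `env_action` is `("hint", "green")`, so the game itself is unchanged up to relabelling.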

Theorems & Definitions (2)

  • proof
  • proof