Swap Regret and Correlated Equilibria Beyond Normal-Form Games
Eshwar Ram Arunachaleswaran, Natalie Collina, Yishay Mansour, Mehryar Mohri, Jon Schneider, Balasubramanian Sivan
TL;DR
This paper extends regret minimization beyond normal-form games by introducing profile swap regret for polytope games and showing that sublinear profile swap regret is necessary and sufficient for non-manipulability against a strategic opponent. It develops efficient learning algorithms via Blackwell approachability that achieve $O(\sqrt{T})$ profile swap regret, and analyzes the resulting equilibrium notion (profile CE), revealing a gap between mediator-implementable correlated outcomes and profile CE in general polytope games. It further distinguishes game-aware from game-agnostic learning, relates profile swap regret to previous work on swap and $\Phi$-regret, and proves APX-hardness of computing the profile-swap distance in Bayesian settings, while offering semi-separation-based methods for efficient learning. The work advances the understanding of how to compute and reason about correlated outcomes in structured, non-normal-form games, with implications for mechanism design and decentralized equilibrium computation.
Abstract
Swap regret has proven central to the study of general-sum normal-form games: minimizing swap regret leads to convergence to the set of correlated equilibria and guarantees non-manipulability against a self-interested opponent. However, the situation for more general classes of games, such as Bayesian games and extensive-form games, is less clear-cut: there are multiple candidate definitions of swap regret, but no known efficiently minimizable variant that implies analogous non-manipulability guarantees. In this paper, we present a new variant of swap regret for polytope games that we call "profile swap regret", with the property that obtaining sublinear profile swap regret is both necessary and sufficient for any learning algorithm to be non-manipulable by an opponent (resolving an open problem of Mansour et al., 2022). Although we show that profile swap regret is NP-hard to compute given a transcript of play, we show that it is nonetheless possible to design efficient learning algorithms that guarantee at most $O(\sqrt{T})$ profile swap regret. Finally, we explore the correlated equilibrium notion induced by play with low profile swap regret, and demonstrate a gap between the set of outcomes that can be implemented by this learning process and the set of outcomes that can be implemented by a third-party mediator (in contrast to the situation in normal-form games).
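For readers less familiar with the baseline, the sketch below computes classical swap regret from a transcript of play in a two-player normal-form game. It illustrates only the standard normal-form notion that the abstract starts from, not the profile swap regret introduced in the paper. The function name `swap_regret`, the payoff-matrix layout, and the restriction to pure actions are assumptions made for illustration.

```python
import numpy as np

def swap_regret(utility, actions, opp_actions):
    """Classical normal-form swap regret of a pure-action transcript.

    utility[a, b] is the learner's payoff for action a against opponent
    action b (hypothetical layout chosen for this sketch).
    """
    utility = np.asarray(utility, dtype=float)
    actions = np.asarray(actions)
    opp_actions = np.asarray(opp_actions)
    total = 0.0
    for a in range(utility.shape[0]):
        # Rounds in which the learner actually played action a.
        mask = actions == a
        if not mask.any():
            continue
        opp = opp_actions[mask]
        # Cumulative payoff each replacement action would have earned
        # on exactly those rounds.
        alt = utility[:, opp].sum(axis=1)
        # Gain from the best fixed replacement for action a.
        total += alt.max() - alt[a]
    return total
```

Because the best swap function decomposes action by action, the maximum over all swap functions reduces to an independent maximization for each action played, which is why the normal-form quantity is easy to compute. The abstract states that the analogous computation for profile swap regret is NP-hard.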
