From External to Swap Regret 2.0: An Efficient Reduction and Oblivious Adversary for Large Action Spaces
Yuval Dagan, Constantinos Daskalakis, Maxwell Fishelson, Noah Golowich
TL;DR
The paper introduces a reduction from swap-regret minimization to external-regret minimization that does not require a finite action space, enabling no-swap-regret guarantees for broad hypothesis classes. The core construction, TreeSwap, arranges external-regret learners in a depth-$d$, $M$-ary tree: if the base learner has external regret at most $\epsilon$ after $M$ rounds, TreeSwap has swap regret at most $\epsilon + 1/d$ after $T=M^d$ rounds, with per-round cost matching that of the external-regret learner. This framework yields near-optimal upper and lower bounds for swap regret in the experts setting and extends to infinite and bandit settings, with tight lower bounds against oblivious and adaptive adversaries. The reduction also has implications for equilibrium computation: it provides efficient query and communication protocols for approximate correlated and coarse correlated equilibria (CE/CCE) in normal-form and extensive-form games, and it yields near-tight bandit swap-regret algorithms. Finally, the work clarifies the role of finite Littlestone dimension and related complexity notions in guaranteeing no-swap-regret learning and the existence of approximate correlated equilibria in broader settings.
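To make the tree construction concrete, here is a minimal sketch of a TreeSwap-style learner for the finite experts setting only. It is an illustration under stated assumptions, not the paper's reference implementation: the base learner is taken to be Hedge with a standard step size, only one active learner is kept per tree level (each is reset when its subtree is exhausted), and the names `Hedge`, `tree_swap`, and `swap_regret` are hypothetical.

```python
import numpy as np


class Hedge:
    """Multiplicative-weights learner over N experts (assumed base learner)."""

    def __init__(self, n_experts, horizon):
        # Standard Hedge step size for losses in [0, 1] over `horizon` updates.
        self.eta = np.sqrt(8.0 * np.log(n_experts) / horizon)
        self.cum_loss = np.zeros(n_experts)

    def action(self):
        # Subtract the minimum before exponentiating for numerical stability.
        w = np.exp(-self.eta * (self.cum_loss - self.cum_loss.min()))
        return w / w.sum()

    def update(self, loss):
        self.cum_loss += loss


def tree_swap(losses, M, d):
    """Play T = M**d rounds against `losses` of shape (T, N), entries in [0, 1].

    Level h (0 = root) is updated every M**(d-1-h) rounds with the average
    loss of the block that just ended, and reset every M**(d-h) rounds.
    Returns the (T, N) array of played distributions.
    """
    T, N = losses.shape
    assert T == M ** d, "horizon must equal M**d"
    learners = [Hedge(N, M) for _ in range(d)]  # one active node per level
    plays = np.zeros((T, N))
    for t in range(T):
        for h in range(d):
            period = M ** (d - 1 - h)
            if t > 0 and t % period == 0:
                if t % (period * M) == 0:
                    learners[h] = Hedge(N, M)  # subtree finished: fresh node
                else:
                    learners[h].update(losses[t - period:t].mean(axis=0))
        # Uniform mixture over the d active learners, averaged into one
        # distribution over experts.
        plays[t] = np.mean([l.action() for l in learners], axis=0)
    return plays


def swap_regret(plays, losses):
    """Average swap regret of distributions `plays` against `losses`."""
    T = losses.shape[0]
    # G[i, j] = sum_t p_t(i) * loss_t(j): cost of rerouting expert i to j.
    G = plays.T @ losses
    return float(np.sum(np.diag(G) - G.min(axis=1)) / T)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    losses = rng.random((3 ** 3, 5))  # M = 3, d = 3, N = 5 (arbitrary demo)
    plays = tree_swap(losses, M=3, d=3)
    print("average swap regret:", swap_regret(plays, losses))
```

The per-round work here is $d$ calls to the base learner, and the random losses in the demo are only a smoke test; they say nothing about worst-case behavior.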
Abstract
We provide a novel reduction from swap-regret minimization to external-regret minimization, which improves upon the classical reductions of Blum-Mansour [BM07] and Stoltz-Lugosi [SL05] in that it does not require finiteness of the space of actions. We show that, whenever there exists a no-external-regret algorithm for some hypothesis class, there must also exist a no-swap-regret algorithm for that same class. For the problem of learning with expert advice, our result implies that it is possible to guarantee that the swap regret is bounded by $\epsilon$ after $\log(N)^{O(1/\epsilon)}$ rounds and with $O(N)$ per-iteration complexity, where $N$ is the number of experts, while the classical reductions of Blum-Mansour and Stoltz-Lugosi require $O(N/\epsilon^2)$ rounds and at least $\Omega(N^2)$ per-iteration complexity. Our result comes with an associated lower bound, which, in contrast to that in [BM07], holds for oblivious and $\ell_1$-constrained adversaries and learners that can employ distributions over experts, showing that the number of rounds must be $\tilde{\Omega}(N/\epsilon^2)$ or exponential in $1/\epsilon$. Our reduction implies that, if no-regret learning is possible in some game, then this game must have approximate correlated equilibria, of arbitrarily good approximation. This strengthens the folklore implication of no-regret learning that approximate coarse correlated equilibria exist. Importantly, it provides a sufficient condition for the existence of correlated equilibrium which vastly extends the requirement that the action set is finite, thus answering a question left open by [DG22; Ass+23]. Moreover, it answers several outstanding questions about equilibrium computation and learning in games.
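As a rough illustration of where a round count of this shape can come from, the following worked calculation instantiates the TreeSwap guarantee from the TL;DR (swap regret at most the base learner's external regret plus $1/d$ after $M^d$ rounds) with Hedge as the base learner. The choice of Hedge, its standard average-regret bound $\sqrt{\ln N/(2M)}$ for losses in $[0,1]$, and the even split of $\epsilon$ between the two terms are assumptions made for this sketch, not the paper's exact parameter choices.

```latex
\begin{align*}
\sqrt{\frac{\ln N}{2M}} \le \frac{\epsilon}{2}
  &\iff M \ge \frac{2\ln N}{\epsilon^{2}}, \\
d = \left\lceil \frac{2}{\epsilon} \right\rceil
  &\implies \frac{1}{d} \le \frac{\epsilon}{2}, \\
T = M^{d}
  &= \left\lceil \frac{2\ln N}{\epsilon^{2}} \right\rceil^{\lceil 2/\epsilon \rceil}.
\end{align*}
```

With these choices the swap regret is at most $\epsilon/2 + \epsilon/2 = \epsilon$ after $T$ rounds. The dependence on $N$ is polylogarithmic with an exponent of order $1/\epsilon$, in line with the $\log(N)^{O(1/\epsilon)}$ form quoted above, up to the $\epsilon$-dependent factor inside the base.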
