Playing Large Games with Oracles and AI Debate
Xinyi Chen, Angelica Chen, Dean Foster, Elad Hazan
TL;DR
The paper tackles regret minimization for language-based, two-player repeated games with extremely large action spaces by introducing an oracle-based framework. It develops a novel algorithm that simultaneously minimizes external and internal regret, achieving $O(\sqrt{T \ln N})$ regret with $\mathrm{poly}(T)$ per-round running time, where $T$ is the number of rounds and $N$ the number of actions. The algorithm leverages sparse convex combinations and a fixed-point computation, and admits further improvements in structured, small-support settings. The framework relies on smooth optimization oracles to enable efficient learning, and its effectiveness is demonstrated through experiments in the AI Safety via Debate setting, which show improved debate outcomes when smooth/noisy feedback is used. Overall, the work provides both theoretical guarantees and empirical evidence that smooth, oracle-based regret minimization can scale to language-like action spaces, and it informs practical design choices for AI debate and alignment tasks.
Abstract
We consider regret minimization in repeated games with a very large number of actions. Such games are inherent in the setting of AI Safety via Debate \cite{irving2018ai}, and more generally games whose actions are language-based. Existing algorithms for online game playing require per-iteration computation polynomial in the number of actions, which can be prohibitive for large games. We thus consider oracle-based algorithms, as oracles naturally model access to AI agents. With oracle access, we characterize when internal and external regret can be minimized efficiently. We give a novel efficient algorithm for simultaneous external and internal regret minimization whose regret depends logarithmically on the number of actions. We conclude with experiments in the setting of AI Safety via Debate that show the benefit of insights from our algorithmic analysis.
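To make the computational bottleneck concrete, the following is a minimal sketch of the classical Hedge (multiplicative weights) learner, which is a standard baseline and not the algorithm proposed in the paper. It attains external regret of order $\sqrt{T \ln N}$, but it stores and updates one weight per action, so each round costs time linear in $N$. The function name `hedge`, the random loss matrix, and the chosen values of `T` and `N` are illustrative assumptions, not taken from the paper.

```python
import math
import random


def hedge(losses, eta):
    """Run Hedge on a T x N loss matrix with entries in [0, 1].

    Returns (learner's cumulative expected loss,
             cumulative loss of the best fixed action in hindsight).
    """
    n = len(losses[0])
    log_w = [0.0] * n   # log-weights, kept in log space for numerical stability
    cum = [0.0] * n     # cumulative loss of each fixed action
    learner_loss = 0.0
    for row in losses:
        # Normalise the weights into a distribution: O(N) work per round.
        m = max(log_w)
        w = [math.exp(v - m) for v in log_w]
        z = sum(w)
        p = [x / z for x in w]
        # Expected loss of playing an action drawn from p.
        learner_loss += sum(pi * li for pi, li in zip(p, row))
        # Multiplicative update: again O(N) work per round.
        for i, li in enumerate(row):
            log_w[i] -= eta * li
            cum[i] += li
    return learner_loss, min(cum)


# Illustrative (assumed) problem size and synthetic losses.
T, N = 2000, 50
rng = random.Random(0)
losses = [[rng.random() for _ in range(N)] for _ in range(T)]

eta = math.sqrt(8 * math.log(N) / T)    # standard tuning for losses in [0, 1]
learner, best = hedge(losses, eta)
regret = learner - best
bound = math.sqrt(T * math.log(N) / 2)  # classical Hedge regret guarantee
print(f"external regret = {regret:.2f}, bound = {bound:.2f}")
```

Because both the normalisation and the update loop touch every action, this baseline is impractical when $N$ is the number of possible utterances in a language-based game, which is the regime that motivates oracle access instead of explicit per-action weights.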
