Playing Large Games with Oracles and AI Debate

Xinyi Chen; Angelica Chen; Dean Foster; Elad Hazan

TL;DR

The paper tackles regret minimization for language-based, two-player repeated games with extremely large action spaces by introducing an oracle-based framework. It develops a novel algorithm for simultaneously minimizing external and internal regret, achieving $O(\sqrt{T \ln N})$ regret and $\mathrm{poly}(T)$ per-round time by leveraging sparse convex combinations and a fixed-point computation, with additional improvements in structured, small-support settings. The framework relies on smooth optimization oracles to enable efficient learning, and its effectiveness is demonstrated through experiments in the AI Safety via Debate setting, showing improved debate outcomes when using smooth/noisy feedback. Overall, the work provides both theoretical guarantees and empirical evidence that smooth, oracle-based regret minimization can scale to language-like action spaces and informs practical design choices for AI debate and alignment tasks.

Abstract

We consider regret minimization in repeated games with a very large number of actions. Such games are inherent in the setting of AI Safety via Debate \cite{irving2018ai}, and more generally in games whose actions are language-based. Existing algorithms for online game playing require per-iteration computation polynomial in the number of actions, which can be prohibitive for large games. We thus consider oracle-based algorithms, as oracles naturally model access to AI agents. With oracle access, we characterize when internal and external regret can be minimized efficiently. We give a novel efficient algorithm for simultaneous external and internal regret minimization whose regret depends logarithmically on the number of actions. We conclude with experiments in the setting of AI Safety via Debate that show the benefit of insights from our algorithmic analysis.
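
For orientation, the two regret notions named in the abstract are commonly defined as below. This is a sketch in assumed notation, not necessarily the paper's own: $\ell_t$ is the loss function at round $t$, $x_t \in [N]$ is the action played, and losses (rather than rewards) are an assumption.

```latex
% External regret: compare against the best single fixed action in hindsight.
\mathrm{Regret}^{\mathrm{ext}}_T
  = \sum_{t=1}^{T} \ell_t(x_t) - \min_{i \in [N]} \sum_{t=1}^{T} \ell_t(i)

% Internal regret: compare against the best pairwise modification i -> j,
% which replaces action i by action j on exactly the rounds where i was played.
\mathrm{Regret}^{\mathrm{int}}_T
  = \max_{i, j \in [N]} \sum_{t=1}^{T} \mathbf{1}[x_t = i]\,\bigl(\ell_t(i) - \ell_t(j)\bigr)
```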

Paper Structure (44 sections, 9 theorems, 40 equations, 2 figures, 1 table, 3 algorithms)

Introduction
Our results
Related work
Learning in large games.
Solution concepts in game theory and notions of equilibria.
AI Debate.
Preliminaries
Notation.
Formalizing the repeated game
Solution concepts.
Regret minimization in games.
Oracle models
Algorithms and guarantees
External regret minimization
Simultaneous internal and external regret minimization
...and 29 more sections

Key Result

Corollary 3

Follow-the-Perturbed-Leader (Algorithm \ref{algo:external_regret}) calls $\mathbb{O}^{\text{smooth}}$ once per time step. If we set $\eta = \sqrt{\frac{\ln N}{T}}$ and $\mathcal{D}$ to be the exponential distribution, $\mathcal{D}(x) \sim e^{-\eta x}$, it produces pure strategies $x_1, \ldots, x_T$ that s
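
A minimal sketch of Follow-the-Perturbed-Leader with exponential perturbations, using the step size $\eta = \sqrt{\ln N / T}$ quoted above. Everything else is an assumption made for illustration: the function names are hypothetical, losses are given as explicit vectors, and the stand-in oracle enumerates all $N$ actions, which the paper's oracle model is meant to avoid for language-sized action spaces.

```python
import math
import random


def smooth_oracle(cumulative_loss, eta, rng):
    """Hypothetical stand-in for a smooth optimization oracle.

    Subtracts fresh exponential noise (rate eta, so mean 1/eta) from each
    action's cumulative loss and returns the index of the minimizer.
    Enumerating actions here is an illustration-only simplification.
    """
    perturbed = [loss - rng.expovariate(eta) for loss in cumulative_loss]
    return min(range(len(perturbed)), key=perturbed.__getitem__)


def follow_the_perturbed_leader(loss_rounds, seed=0):
    """Play one pure action per round; return (plays, external regret).

    loss_rounds: list of T loss vectors, each of length N >= 2.
    """
    rng = random.Random(seed)
    T = len(loss_rounds)
    N = len(loss_rounds[0])
    eta = math.sqrt(math.log(N) / T)  # step size from the corollary

    cumulative = [0.0] * N  # losses observed before the current round
    plays = []
    total_loss = 0.0
    for losses in loss_rounds:
        action = smooth_oracle(cumulative, eta, rng)  # one oracle call per round
        plays.append(action)
        total_loss += losses[action]
        for i, loss in enumerate(losses):
            cumulative[i] += loss

    # External regret against the best fixed action in hindsight.
    regret = total_loss - min(cumulative)
    return plays, regret


if __name__ == "__main__":
    # Toy instance: action 0 never incurs loss, the other three always do.
    rounds = [[0.0, 1.0, 1.0, 1.0] for _ in range(400)]
    plays, regret = follow_the_perturbed_leader(rounds, seed=1)
    print("external regret:", regret)
```

The noise is redrawn every round, so the played action is a randomized best response to the history; a larger $\eta$ means less noise and closer tracking of the current leader.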

Figures (2)

Figure 1: The experimental set-up for our debate experiments. The debaters each have access to the text passage (the book icon) corresponding to a question from the QuALITY \cite{pang2021quality} dataset and must convince the judge of their respective answers.
Figure 2: We measure the percentage of the time that the judge chooses the correct/incorrect answer or does not answer at the end of the debate (Fig. \ref{fig:acc}), as well as the probabilities that the judge assigns to each answer over the course of the debate (Fig. \ref{fig:judge-prob}). The '*' symbol indicates statistical significance when compared to the control in a one-tailed proportion test. When the debaters use the Combined strategy, the judge is statistically significantly more likely ($p=0.045$) to choose the correct answer than to answer incorrectly or abstain from responding.

Theorems & Definitions (16)

Definition 1: Pairwise modifications
Definition 2: $\Phi$-regret
Corollary 3: of Theorem 1.1 in \cite{kalai2005efficient}
Lemma 4
Theorem 5
Corollary 6
Corollary 7
Corollary 8: of Theorem 1.1 in \cite{kalai2005efficient}
Proof of Lemma \ref{lem:efficient_fixed_point}
Proof of Theorem \ref{thm:main}
...and 6 more
