Playing Large Games with Oracles and AI Debate

Xinyi Chen, Angelica Chen, Dean Foster, Elad Hazan

TL;DR

The paper tackles regret minimization for language-based, two-player repeated games with extremely large action spaces by introducing an oracle-based framework. It develops a novel algorithm that simultaneously minimizes external and internal regret, achieving $O(\sqrt{T \ln N})$ regret and $\mathrm{poly}(T)$ per-round time by leveraging sparse convex combinations and a fixed-point computation, with additional improvements in structured, small-support settings. The framework relies on smooth optimization oracles to enable efficient learning, and its effectiveness is demonstrated through experiments in the AI Safety via Debate setting, which show improved debate outcomes when using smooth/noisy feedback. Overall, the work provides both theoretical guarantees and empirical evidence that smooth, oracle-based regret minimization can scale to language-like action spaces, and it informs practical design choices for AI debate and alignment tasks.

Abstract

We consider regret minimization in repeated games with a very large number of actions. Such games are inherent in the setting of AI Safety via Debate \cite{irving2018ai}, and more generally games whose actions are language-based. Existing algorithms for online game playing require per-iteration computation polynomial in the number of actions, which can be prohibitive for large games. We thus consider oracle-based algorithms, as oracles naturally model access to AI agents. With oracle access, we characterize when internal and external regret can be minimized efficiently. We give a novel efficient algorithm for simultaneous external and internal regret minimization whose regret depends logarithmically on the number of actions. We conclude with experiments in the setting of AI Safety via Debate that show the benefit of insights from our algorithmic analysis.
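
For readers unfamiliar with the two notions of regret named in the abstract, their standard textbook forms for pure strategies can be written as below. The notation is an assumption for illustration ($\ell_t(x)$ for the loss of action $x$ at round $t$, $x_t$ for the action played) and may differ from the paper's own definitions, which are stated through pairwise modifications and $\Phi$-regret.

```latex
% Assumed notation: \ell_t(x) is the loss of action x at round t,
% and x_t is the action actually played at round t.
\[
\text{ExternalRegret}_T
  = \sum_{t=1}^{T} \ell_t(x_t) - \min_{x} \sum_{t=1}^{T} \ell_t(x),
\]
\[
\text{InternalRegret}_T
  = \max_{i \neq j} \sum_{t=1}^{T} \mathbf{1}[x_t = i]\,
    \bigl(\ell_t(i) - \ell_t(j)\bigr).
\]
```

External regret compares against the single best fixed action in hindsight, while internal regret compares against the best rule of the form "whenever you played $i$, play $j$ instead".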

Paper Structure

This paper contains 44 sections, 9 theorems, 40 equations, 2 figures, 1 table, 3 algorithms.

Key Result

Corollary 3

Follow-the-Perturbed-Leader (Algorithm \ref{algo:external_regret}) calls $\mathbb{O}^{\text{smooth}}$ once per time step. If we set $\eta = \sqrt{\frac{\ln N}{T}}$ and take $\mathcal{D}$ to be the exponential distribution, $\mathcal{D}(x) \sim e^{-\eta x}$, it produces pure strategies $x_1, \ldots, x_T$ that s...
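
The following is a minimal sketch of Follow-the-Perturbed-Leader on a toy problem, not the paper's Algorithm. It assumes a small explicit action set with losses in $[0, 1]$, and it simulates the smooth oracle by brute-force minimization of the perturbed cumulative loss; the function names `smooth_oracle` and `ftpl` are hypothetical. In the paper's setting the oracle is assumed to answer such queries without enumerating all $N$ actions.

```python
import math
import random


def smooth_oracle(cumulative_loss, perturbation):
    """Hypothetical stand-in for the smooth optimization oracle.

    Returns the index minimizing (cumulative loss - perturbation).
    Brute force over all actions; only feasible for small toy N.
    """
    return min(range(len(cumulative_loss)),
               key=lambda i: cumulative_loss[i] - perturbation[i])


def ftpl(loss_rounds, seed=0):
    """Run FTPL on a list of per-round loss vectors (each of length N).

    Returns the list of played actions and the external regret
    against the best fixed action in hindsight.
    """
    T = len(loss_rounds)
    N = len(loss_rounds[0])
    if N < 2:
        raise ValueError("need at least two actions so that eta > 0")
    rng = random.Random(seed)
    eta = math.sqrt(math.log(N) / T)  # step size from the corollary
    cumulative = [0.0] * N
    plays = []
    total_loss = 0.0
    for losses in loss_rounds:
        # Fresh exponential perturbation, density proportional to exp(-eta * x).
        perturbation = [rng.expovariate(eta) for _ in range(N)]
        # One oracle call per time step.
        action = smooth_oracle(cumulative, perturbation)
        plays.append(action)
        total_loss += losses[action]
        # Full-information feedback: observe every action's loss.
        for i in range(N):
            cumulative[i] += losses[i]
    regret = total_loss - min(cumulative)
    return plays, regret


if __name__ == "__main__":
    # Toy sequence: action 0 is always best.
    rounds = [[0.0, 1.0, 1.0, 1.0] for _ in range(500)]
    plays, regret = ftpl(rounds, seed=1)
    print("regret:", regret, "bound scale:", math.sqrt(500 * math.log(4)))
```

The perturbation is resampled every round, which is one common variant; whether the paper's algorithm resamples or reuses a single draw is not recoverable from the truncated statement above.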

Figures (2)

  • Figure 1: The experimental set-up for our debate experiments. The debaters each have access to the text passage (the book icon) corresponding to a question from the QuALITY \cite{pang2021quality} dataset and must convince the judge of their respective answers.
  • Figure 2: We measure the percentage of the time that the judge chooses the correct/incorrect answer or does not answer at the end of the debate (Fig. \ref{fig:acc}), as well as the probabilities that the judge assigns to each answer over the course of the debate (Fig. \ref{fig:judge-prob}). The '*' symbol indicates statistical significance when compared to the control in a one-tailed proportion test. When the debaters use the Combined strategy, the judge is statistically significantly more likely ($p=0.045$) to choose the correct answer than to answer incorrectly or abstain from responding.

Theorems & Definitions (16)

  • Definition 1: Pairwise modifications
  • Definition 2: $\Phi$-regret
  • Corollary 3: of Theorem 1.1 in \cite{kalai2005efficient}
  • Lemma 4
  • Theorem 5
  • Corollary 6
  • Corollary 7
  • Corollary 8: of Theorem 1.1 in \cite{kalai2005efficient}
  • Proof of Lemma \ref{lem:efficient_fixed_point}
  • Proof of Theorem \ref{thm:main}
  • ...and 6 more