The Hidden Game Problem

Gon Buzaglo, Noah Golowich, Elad Hazan

TL;DR

The paper addresses learning in enormous two-player games that contain a sparse, consistently dominant hidden set of actions $R$, for which the payoff matrix satisfies $A = A_0 + \rho A_1$ with $A_0(i,j)=1$ for $i\in R$. It introduces the hidden game problem and shows how to design regret-minimization algorithms that keep external regret low in any game while achieving sublinear swap regret when the hidden structure exists. The main technical contributions are (i) a swap-regret minimization scheme that incrementally uncovers $R$, with swap regret $O(\sqrt{T r^3 \log r})$, and (ii) a framework for simultaneous external and swap regret (combining Hedge, Follow-the-Perturbed-Leader with a smooth oracle, and fixed-point updates) that achieves external regret $O(\sqrt{T \log N})$ and swap regret $O(\sqrt{T r^3 \log r})$, with per-round runtime $\mathrm{poly}(T)$, independent of $N$. This yields rapid convergence to correlated equilibria in hidden subgames while preserving rationality in the full game, enabling scalable exploitation of sparse structure in AI alignment and language-game contexts.
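The payoff structure described above can be made concrete with a small sketch. It assumes, beyond what the summary states, that $A_0(i,j)=0$ for $i \notin R$, that the entries of $A_1$ lie in $[0,1]$, and that $A$ holds row-player rewards; `make_hidden_game` and `hidden_rows_dominate` are hypothetical helpers, not the authors' code. Under these assumptions any $\rho < 1$ makes every hidden row strictly better than every other row against every column.

```python
import numpy as np

def make_hidden_game(N, R, rho, seed=0):
    """Build a reward matrix A = A0 + rho * A1 for the row player.

    Assumptions (hypothetical, for illustration only):
    A0(i, j) = 1 for i in R and 0 otherwise; A1 entries uniform in [0, 1).
    """
    rng = np.random.default_rng(seed)
    A0 = np.zeros((N, N))
    A0[list(R), :] = 1.0                      # hidden rows get base reward 1
    A1 = rng.uniform(0.0, 1.0, size=(N, N))   # bounded perturbation
    return A0 + rho * A1

def hidden_rows_dominate(A, R):
    """True if, in every column, the worst hidden row beats the best other row."""
    in_R = np.zeros(A.shape[0], dtype=bool)
    in_R[list(R)] = True
    worst_hidden = A[in_R].min(axis=0)    # per column: weakest row in R
    best_other = A[~in_R].max(axis=0)     # per column: strongest row outside R
    return bool(np.all(worst_hidden > best_other))
```

For example, `hidden_rows_dominate(make_hidden_game(200, {3, 17, 42}, rho=0.5), {3, 17, 42})` returns `True`: hidden rows earn at least 1 while all other rows earn less than 0.5.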

Abstract

This paper investigates a class of games with large strategy spaces, motivated by challenges in AI alignment and language games. We introduce the hidden game problem, where for each player, an unknown subset of strategies consistently yields higher rewards compared to the rest. The central question is whether efficient regret minimization algorithms can be designed to discover and exploit such hidden structures, leading to equilibrium in these subgames while maintaining rationality in general. We answer this question affirmatively by developing a composition of regret minimization techniques that achieve optimal external and swap regret bounds. Our approach ensures rapid convergence to correlated equilibria in hidden subgames, leveraging the hidden game structure for improved computational efficiency.

Paper Structure

This paper contains 22 sections, 8 theorems, 34 equations, 1 table, 3 algorithms.

Key Result

Theorem 1.1

There is an algorithm that achieves, with probability at least $1-\delta$ and against a fully adaptive adversary that selects an arbitrary sequence of loss vectors $\ell_t \in [0,1]^N$, external regret of $O(\sqrt{T \log N})$ and, if there is a hidden game $R \subset [N]$ of size $r$, swap regret (regret with respect to $\Phi_S$) of $O(\sqrt{T r^3 \log r})$, where $\Phi_S$ is the set of all fixed deviations. Furthermore, with access to a smooth optimization oracle (see the subsection on oracles), t…
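The external-regret rate $O(\sqrt{T \log N})$ in Theorem 1.1 is the rate achieved by Hedge (multiplicative weights), one of the components named in the TL;DR. The following is a minimal textbook Hedge sketch, not the paper's combined algorithm: it uses the standard fixed step size $\eta = \sqrt{8 \ln N / T}$, which guarantees external regret at most $\sqrt{(T/2)\ln N}$ for any loss sequence in $[0,1]^N$. It touches all $N$ actions every round, so unlike the oracle-based method its per-round runtime is not independent of $N$. The function names are hypothetical.

```python
import numpy as np

def hedge(losses, eta=None):
    """Run Hedge on a (T, N) array of losses in [0, 1].

    Returns the (T, N) array of distributions p_1, ..., p_T, where p_t
    depends only on losses from rounds before t.
    """
    T, N = losses.shape
    if eta is None:
        eta = np.sqrt(8.0 * np.log(N) / T)  # standard fixed step size
    cum = np.zeros(N)                       # cumulative loss of each action so far
    plays = np.empty((T, N))
    for t in range(T):
        # Subtracting the minimum avoids underflow and does not change p_t.
        w = np.exp(-eta * (cum - cum.min()))
        plays[t] = w / w.sum()
        cum += losses[t]
    return plays

def external_regret(plays, losses):
    """Learner's expected loss minus the loss of the best fixed action."""
    return float(np.sum(plays * losses) - losses.sum(axis=0).min())
```

The sketch takes the whole loss sequence up front for brevity; an adaptive adversary would instead choose $\ell_t$ after seeing $p_t$, and the same bound on the learner's expected loss still holds.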

Theorems & Definitions (17)

  • Theorem 1.1: Combined external and swap regret (informal)
  • Definition 2.1: $\Phi$-regret
  • Remark 3.1: On the choice of $\log T$ expansions.
  • Lemma 3.2: Support containment
  • proof
  • Lemma 3.3: Swap regret bound for Algorithm st-alg
  • proof
  • Remark 4.1: Indexing
  • proof
  • Theorem 4.4
  • ...and 7 more
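Definition 2.1 ($\Phi$-regret) and the swap regret in Theorem 1.1 can be computed directly from a history of play. This sketch assumes the standard definitions: a deviation $\phi: [N] \to [N]$ moves the mass $p_t(i)$ to action $\phi(i)$, external regret ranges over constant maps, and swap regret ranges over all $N^N$ maps. Because the swap maximum decomposes over source actions $i$, it can be computed in $O(TN^2)$ time rather than by enumerating maps. The function name `external_and_swap_regret` is hypothetical.

```python
import numpy as np

def external_and_swap_regret(plays, losses):
    """Return (external regret, swap regret) for a history of play.

    plays:  (T, N) array, row t is the learner's distribution p_t.
    losses: (T, N) array, row t is the loss vector l_t in [0, 1]^N.
    """
    # M[i, j] = sum_t p_t(i) * l_t(j): loss the mass played on i would
    # have incurred had it been moved to j.
    M = plays.T @ losses
    learner_loss = np.trace(M)  # sum_t <p_t, l_t>
    # External regret: constant maps phi(i) = j. Column sums of M equal
    # sum_t l_t(j) because each p_t sums to 1.
    external = learner_loss - M.sum(axis=0).min()
    # Swap regret: pick the best target j separately for every source i.
    # j = i is allowed, so each term is nonnegative.
    gains = np.diag(M)[:, None] - M
    swap = gains.max(axis=1).sum()
    return float(external), float(swap)
```

With $p_1 = (1,0)$, $p_2 = (0,1)$ and $\ell_1 = (1,0)$, $\ell_2 = (0,1)$, the learner always picks the action about to lose, and the function returns `(1.0, 2.0)`: each fixed action would have lost 1 instead of 2, while swapping each action for the other would have lost 0. Since constant maps are a subset of all maps, swap regret is never smaller than external regret.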