On the Optimality of Dilated Entropy and Lower Bounds for Online Learning in Extensive-Form Games

Zhiyuan Fan, Christian Kroer, Gabriele Farina

TL;DR

This paper shows that the weight-one dilated entropy (DilEnt) distance-generating function is optimal up to logarithmic factors for extensive-form decision spaces. Using a new pair of primal-dual treeplex norms, it also recovers, through a direct OMD analysis, the diameter-to-strong-convexity ratio that predicts the same performance as KOMWU.

Abstract

First-order methods (FOMs) are arguably the most scalable algorithms for equilibrium computation in large extensive-form games. To operationalize these methods, a distance-generating function, acting as a regularizer for the strategy space, must be chosen. The ratio between the strong convexity modulus and the diameter of the regularizer is a key parameter in the analysis of FOMs. A natural question is then: what is the optimal distance-generating function for extensive-form decision spaces? In this paper, we make a number of contributions, ultimately establishing that the weight-one dilated entropy (DilEnt) distance-generating function is optimal up to logarithmic factors. The DilEnt regularizer is notable due to its iterate-equivalence with Kernelized OMWU (KOMWU) -- the algorithm with state-of-the-art dependence on the game tree size in extensive-form games -- when used in conjunction with the online mirror descent (OMD) algorithm. However, the standard analysis for OMD is unable to establish such a result; the only existing analysis appeals to the iterate equivalence to KOMWU. We close this gap by introducing a pair of primal-dual treeplex norms, which we contend form the natural analytic viewpoint for studying the strong convexity of DilEnt. Using these norm pairs, we recover the diameter-to-strong-convexity ratio that predicts the same performance as KOMWU. Along with a new regret lower bound for online learning in sequence-form strategy spaces, we show that this ratio is nearly optimal. Finally, we showcase our analytic techniques by refining the analysis of Clairvoyant OMD when paired with DilEnt, establishing an $\mathcal{O}(n \log |\mathcal{V}| \log T/T)$ approximation rate to coarse correlated equilibrium in $n$-player games, where $|\mathcal{V}|$ is the number of reduced normal-form strategies of the players; this rate is the new state of the art.
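
To make the role of this ratio concrete, the textbook regret bound for OMD is sketched below. It is the standard form, not necessarily the paper's exact statement. The symbols are assumptions introduced here: $\eta$ is the step size, $D$ is an upper bound on the Bregman divergence from the initial iterate (the "diameter"), $\mu$ is the strong convexity modulus of the regularizer with respect to a norm $\|\cdot\|$, and $G$ bounds the dual norm $\|\mathbf{g}_t\|_*$ of the loss vectors.

$$
\mathrm{Reg}_T \;\le\; \frac{D}{\eta} + \frac{\eta}{2\mu}\sum_{t=1}^{T}\|\mathbf{g}_t\|_*^2 \;\le\; \frac{D}{\eta} + \frac{\eta\, G^2 T}{2\mu},
\qquad
\eta = \sqrt{\frac{2\mu D}{G^2 T}} \;\Longrightarrow\; \mathrm{Reg}_T \;\le\; G\sqrt{\frac{2\,D\,T}{\mu}}.
$$

Under these assumptions the regret depends on the regularizer only through $D/\mu$, so a regularizer with a smaller diameter-to-strong-convexity ratio yields a better guarantee.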

Paper Structure

This paper contains 34 sections, 19 theorems, 103 equations, 2 figures, 1 table, 2 algorithms.

Key Result

Lemma 3.2

For any $\mathbf{g}, \mathbf{g}' \in \mathbb{R}^\mathcal{E}$, we have $\|\Pi_{\varphi}(\mathbf{g}, \mathbf{x}) - \Pi_{\varphi}(\mathbf{g}', \mathbf{x})\| \leq \mu^{-1}\|\mathbf{g} - \mathbf{g}'\|_*$.
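
The inequality can be checked numerically in a simple special case. The sketch below is not the paper's setting. It assumes that $\Pi_{\varphi}(\mathbf{g}, \mathbf{x})$ denotes the usual proximal step $\arg\min_{\mathbf{x}'} \langle \mathbf{g}, \mathbf{x}' \rangle + D_{\varphi}(\mathbf{x}' \,\|\, \mathbf{x})$, and it replaces the extensive-form decision space with a probability simplex and $\varphi$ with negative entropy. In that case $\mu = 1$ with respect to the $\ell_1$ norm, and the dual norm is $\ell_\infty$. The function names `entropy_prox` and `check_lipschitz` are hypothetical helpers written for this illustration.

```python
import math
import random


def entropy_prox(g, x):
    """Proximal step for negative entropy on the simplex.

    Assumed definition: argmin over the simplex of <g, x'> + KL(x' || x),
    whose closed form is x'_i proportional to x_i * exp(-g_i).
    """
    shift = min(g)  # subtracting a constant from g does not change the result
    weights = [xi * math.exp(-(gi - shift)) for gi, xi in zip(g, x)]
    total = sum(weights)
    return [w / total for w in weights]


def l1_distance(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def linf_distance(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


def check_lipschitz(dim=5, trials=1000, seed=0):
    """Return the largest observed ratio ||prox(g) - prox(g')||_1 / ||g - g'||_inf."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        raw = [rng.random() + 1e-3 for _ in range(dim)]
        norm = sum(raw)
        x = [r / norm for r in raw]  # random point in the simplex interior
        g = [rng.uniform(-2.0, 2.0) for _ in range(dim)]
        g_prime = [rng.uniform(-2.0, 2.0) for _ in range(dim)]
        gap = linf_distance(g, g_prime)
        if gap > 0.0:
            moved = l1_distance(entropy_prox(g, x), entropy_prox(g_prime, x))
            worst = max(worst, moved / gap)
    return worst


if __name__ == "__main__":
    # With mu = 1, the lemma predicts that this ratio never exceeds 1.
    print(check_lipschitz())
```

A random search like this can only fail to find a counterexample; it does not prove the bound. It is meant as a sanity check on the statement's shape, with the primal norm on the left and the dual norm on the right.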

Figures (2)

  • Figure 1: A two-player extensive-form game and the corresponding TFSDP of player $1$. The TFSDP has decision points $\mathcal{J} = \{\textsf{A}, \textsf{B}, \textsf{C}, \textsf{D}\}$. It has tree size $\|\mathcal{Q}\|_1 = 4$ and leaf count $\|\mathcal{Q}\|_{\perp} = 2$, both attained by the pure strategy $\{\textsf{A} \rightarrow \textsf{1}, \textsf{B} \rightarrow \textsf{3},\textsf{C} \rightarrow \textsf{5}\}$. Furthermore, player $1$ has $|\mathcal{V}| = 7$ pure strategy profiles in total.
  • Figure 2: Eliminating observation point $\textsf{A2}$ from the TFSDP. The compressed extensive-form decision space $\mathcal{Q}$ remains unchanged and still has support $\{\textsf{3}, \textsf{4}, \textsf{5}, \textsf{6}, \textsf{7}, \textsf{8}, \textsf{9}\}$. Thus, the leaf count for the new TFSDP remains $\|\mathcal{Q}\|_{\perp} = 2$. Furthermore, the total number of actions is reduced by one.

Theorems & Definitions (19)

  • Lemma 3.2
  • Theorem 5.1: Regret Bound for (Predictive) OMD [rakhlin2013online; syrgkanis2015fast]
  • Lemma \ref{lm:norm-is-norm}: restatement
  • Lemma \ref{lm:tpnorm-recursive}: restatement
  • Theorem \ref{thm:tpnorms-duality}: restatement
  • Lemma \ref{lm:ub-Qnorm-upperbound}: restatement
  • Lemma \ref{lm:strongly-convex}: restatement
  • Lemma \ref{lm:ub-DilEnt-div-upperbound}: restatement
  • Theorem \ref{thm:omd-regret-ub}: restatement
  • Lemma \ref{thm:omd-regret-ub}
  • ...and 9 more