On the Optimality of Dilated Entropy and Lower Bounds for Online Learning in Extensive-Form Games

Zhiyuan Fan; Christian Kroer; Gabriele Farina

TL;DR

This paper shows that the weight-one dilated entropy (DilEnt) distance-generating function is optimal up to logarithmic factors for extensive-form decision spaces. Using a pair of primal-dual treeplex norms, it recovers the diameter-to-strong-convexity ratio that predicts the same performance as Kernelized OMWU (KOMWU).

Abstract

First-order methods (FOMs) are arguably the most scalable algorithms for equilibrium computation in large extensive-form games. To operationalize these methods, a distance-generating function, acting as a regularizer for the strategy space, must be chosen. The ratio between the strong convexity modulus and the diameter of the regularizer is a key parameter in the analysis of FOMs. A natural question is then: what is the optimal distance-generating function for extensive-form decision spaces? In this paper, we make a number of contributions, ultimately establishing that the weight-one dilated entropy (DilEnt) distance-generating function is optimal up to logarithmic factors. The DilEnt regularizer is notable because, when used with the online mirror descent (OMD) algorithm, it is iterate-equivalent to Kernelized OMWU (KOMWU), the algorithm with state-of-the-art dependence on the game tree size in extensive-form games. However, the standard analysis of OMD is unable to establish such a result; the only existing analysis appeals to the iterate equivalence with KOMWU. We close this gap by introducing a pair of primal-dual treeplex norms, which we contend form the natural analytic viewpoint for studying the strong convexity of DilEnt. Using these norm pairs, we recover the diameter-to-strong-convexity ratio that predicts the same performance as KOMWU. Together with a new regret lower bound for online learning in sequence-form strategy spaces, this shows that the ratio is nearly optimal. Finally, we showcase our analytic techniques by refining the analysis of Clairvoyant OMD when paired with DilEnt, obtaining an $\mathcal{O}(n \log |\mathcal{V}| \log T/T)$ approximation rate to coarse correlated equilibrium in $n$-player games, where $|\mathcal{V}|$ is the number of reduced normal-form strategies of the players. This rate is the new state of the art.
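
The abstract names the DilEnt regularizer without writing it out. As a rough illustration only, the sketch below evaluates a dilated entropy with every decision-point weight set to one on a toy two-decision-point tree. It assumes the standard dilated construction (negative entropy at each decision point, scaled by the mass of the parent sequence); the paper's exact normalization may differ. The function name `dilated_entropy`, the dictionary layout, and the toy tree are hypothetical and not taken from the paper.

```python
import math

def dilated_entropy(x, parent, actions):
    """Weight-one dilated (negative) entropy of a sequence-form strategy.

    Assumed layout (hypothetical, for illustration only):
      x       : dict mapping a sequence (j, a) to its sequence-form mass
      parent  : dict mapping decision point j to its parent sequence,
                or None when j is reached from the root (mass 1)
      actions : dict mapping decision point j to its list of actions
    """
    total = 0.0
    for j, acts in actions.items():
        x_parent = 1.0 if parent[j] is None else x[parent[j]]
        for a in acts:
            x_ja = x[(j, a)]
            if x_ja > 0.0:  # convention: 0 * log(0) = 0
                total += x_ja * math.log(x_ja / x_parent)
    return total

# Toy tree: decision point "A" at the root, and "B" reached after (A, "l").
actions = {"A": ["l", "r"], "B": ["c", "d"]}
parent = {"A": None, "B": ("A", "l")}

# Uniform behavioral strategy written in sequence form.
x = {("A", "l"): 0.5, ("A", "r"): 0.5, ("B", "c"): 0.25, ("B", "d"): 0.25}

value = dilated_entropy(x, parent, actions)  # equals -1.5 * ln(2)
```

With a single root decision point the sum reduces to the ordinary negative entropy on a simplex, which is the sense in which the construction "dilates" the simplex regularizer over the tree.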
