Configurable Mirror Descent: Towards a Unification of Decision Making

Pengdeng Li; Shuxin Li; Chang Yang; Xinrun Wang; Shuyue Hu; Xiao Huang; Hau Chan; Bo An

TL;DR

This paper proposes generalized mirror descent (GMD), a generalization of MD variants that considers multiple historical policies and works with a broader class of Bregman divergences, and configurable mirror descent (CMD), in which a meta-controller dynamically adjusts the hyper-parameters of GMD conditional on the evaluation measures.

Abstract

Decision-making problems are ubiquitous in the real world and fall into four categories: single-agent (e.g., Atari), cooperative multi-agent (e.g., Hanabi), competitive multi-agent (e.g., Hold'em poker), and mixed cooperative and competitive (e.g., football). Various methods have been proposed to address specific decision-making problems. Despite their successes in specific categories, these methods typically evolve independently and cannot generalize to other categories. Therefore, a fundamental question for decision-making is: can we develop a single algorithm to tackle ALL categories of decision-making problems? Answering this question poses several main challenges: i) different decision-making categories involve different numbers of agents and different relationships between agents, ii) different categories have different solution concepts and evaluation measures, and iii) a comprehensive benchmark covering all the categories is lacking. This work presents a preliminary attempt to address the question with three main contributions. i) We propose the generalized mirror descent (GMD), a generalization of MD variants, which considers multiple historical policies and works with a broader class of Bregman divergences. ii) We propose the configurable mirror descent (CMD), where a meta-controller is introduced to dynamically adjust the hyper-parameters in GMD conditional on the evaluation measures. iii) We construct GameBench, a benchmark of 15 academic-friendly games across different decision-making categories. Extensive experiments demonstrate that CMD achieves empirically competitive or better outcomes compared to baselines while providing the capability of exploring diverse dimensions of decision making.
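To make the idea of a mirror descent step over multiple historical policies concrete, here is a minimal sketch of one special case: the negative-entropy (KL) regularizer on a single decision point. It is an illustration under stated assumptions, not the paper's algorithm: the function name `gmd_kl_step`, the maximization form of the objective, and the use of plain action values `q_values` are all hypothetical choices made here. Under those assumptions, maximizing $\langle q, \pi \rangle - \sum_{\tau} \alpha_{\tau} \mathrm{KL}(\pi \,\|\, \pi_{k-\tau})$ over the simplex has a softmax closed form, which the code evaluates.

```python
import math


def gmd_kl_step(q_values, past_policies, alphas):
    """One KL-regularized update over several past policies (illustrative only).

    Assumed objective (not taken from the paper):
        max_pi  <q, pi> - sum_tau alphas[tau] * KL(pi || past_policies[tau])
    Its maximizer on the simplex is
        pi(a) proportional to exp((q(a) + sum_tau alpha_tau * log pi_tau(a)) / sum_tau alpha_tau).
    All past policies must be strictly positive so the logarithms are finite.
    """
    total = sum(alphas)
    logits = [
        (q + sum(a * math.log(p[i]) for a, p in zip(alphas, past_policies))) / total
        for i, q in enumerate(q_values)
    ]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    norm = sum(weights)
    return [w / norm for w in weights]


# Usage: with a single past policy and alpha = 1 / eta, this reduces to the
# familiar multiplicative-weights update pi_k(a) * exp(eta * q(a)), renormalized.
new_policy = gmd_kl_step([1.0, 0.0], [[0.5, 0.5]], [1.0])
```

With one past policy the sketch recovers standard entropic mirror descent; adding more entries to `past_policies` and `alphas` pulls the new policy toward a weighted geometric mixture of the history. Other Bregman divergences would generally not admit this closed form.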

Paper Structure (41 sections, 1 theorem, 28 equations, 26 figures, 9 tables, 4 algorithms)

Introduction
A Real-World Motivating Scenario
Related Work
Preliminaries
Decision Making
Mirror Descent
Configurable Mirror Descent
Generalized Mirror Descent
Meta-Controller for Different Measures
GameBench
Experiments
Results and Analysis
Limitations, Future Works, and Conclusions
Limitations and Future Works
Conclusions
...and 26 more sections

Key Result

Proposition 5.1

Assume that i) $\pi(a) \geq \epsilon$, $\forall a \in \mathcal{A}$, where $\epsilon$ is a small positive value, and ii) the $\phi(\pi)$ defined on $\Pi$ can be decomposed to [...] where $A=Q(\pi_{k}) +\sum_{\tau=0}^{M-1}\alpha_{\tau}\phi^{\prime}(\pi_{k-\tau})$, $B=\sum_{\tau=0}$ [...]

Footnote: The sum of convex functions is still a convex function. Furthermore, the negative entropy and squared Euclidean norm are two spe[...]

Figures (26)

Figure 1: Overview of the categories of decision making and the four desiderata for the required method to satisfy.
Figure 2: Overview of CMD.
Figure 3: Overview of GameBench.
Figure 4: Summary of results. The first 6 figures correspond to the single-agent and cooperative categories, where the $y$-axis is OptGap. The remaining figures correspond to the other categories, where the $y$-axis is NashConv. In all figures, the $x$-axis is the number of iterations.
Figure 5: Results for different types of MC.
...and 21 more figures

Theorems & Definitions (2)

Proposition 5.1
Proof of Proposition 5.1
