Configurable Mirror Descent: Towards a Unification of Decision Making

Pengdeng Li, Shuxin Li, Chang Yang, Xinrun Wang, Shuyue Hu, Xiao Huang, Hau Chan, Bo An

TL;DR

This work proposes the generalized mirror descent (GMD), a generalization of mirror descent (MD) variants that considers multiple historical policies and works with a broader class of Bregman divergences, and the configurable mirror descent (CMD), in which a meta-controller dynamically adjusts the hyper-parameters in GMD conditional on the evaluation measures.
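As a minimal illustration of the "broader class of Bregman divergences" (a sketch, not code from the paper), the Bregman divergence of a convex function phi is B_phi(p, q) = phi(p) - phi(q) - <phi'(q), p - q>. The snippet checks the two examples named alongside Proposition 5.1: negative entropy yields the KL divergence on the probability simplex, and half the squared Euclidean norm yields half the squared Euclidean distance. The function names are hypothetical, and strictly positive probabilities are assumed so that the logarithm is defined.

```python
import numpy as np

def bregman(phi, grad_phi, p, q):
    """Bregman divergence B_phi(p, q) = phi(p) - phi(q) - <grad_phi(q), p - q>."""
    return phi(p) - phi(q) - float(np.dot(grad_phi(q), p - q))

# Negative entropy: phi(pi) = sum_a pi(a) log pi(a), gradient log pi + 1.
def neg_entropy(p):
    return float(np.sum(p * np.log(p)))

def grad_neg_entropy(p):
    return np.log(p) + 1.0

# Half squared Euclidean norm: phi(pi) = 0.5 * ||pi||^2, gradient pi.
def half_sq_norm(p):
    return 0.5 * float(np.dot(p, p))

def grad_half_sq_norm(p):
    return p

p = np.array([0.5, 0.3, 0.2])  # both distributions lie on the simplex
q = np.array([0.2, 0.4, 0.4])

kl = float(np.sum(p * np.log(p / q)))
print(np.isclose(bregman(neg_entropy, grad_neg_entropy, p, q), kl))  # True
print(np.isclose(bregman(half_sq_norm, grad_half_sq_norm, p, q),
                 0.5 * np.sum((p - q) ** 2)))  # True
```

The KL identity relies on both arguments summing to one: the linear term sum(p - q) then vanishes.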

Abstract

Decision-making problems, categorized as single-agent (e.g., Atari), cooperative multi-agent (e.g., Hanabi), competitive multi-agent (e.g., Hold'em poker), and mixed cooperative and competitive (e.g., football), are ubiquitous in the real world. Various methods have been proposed to address specific decision-making problems. Despite their successes in specific categories, these methods typically evolve independently and cannot generalize to other categories. Therefore, a fundamental question for decision making is: Can we develop a single algorithm to tackle ALL categories of decision-making problems? Answering this question involves several main challenges: i) different decision-making categories involve different numbers of agents and different relationships between agents, ii) different categories have different solution concepts and evaluation measures, and iii) there is no comprehensive benchmark covering all the categories. This work presents a preliminary attempt to address the question with three main contributions. i) We propose the generalized mirror descent (GMD), a generalization of MD variants, which considers multiple historical policies and works with a broader class of Bregman divergences. ii) We propose the configurable mirror descent (CMD), in which a meta-controller is introduced to dynamically adjust the hyper-parameters in GMD conditional on the evaluation measures. iii) We construct GameBench, a benchmark of 15 academic-friendly games across different decision-making categories. Extensive experiments demonstrate that CMD achieves empirically competitive or better outcomes compared to baselines while providing the capability of exploring diverse dimensions of decision making.
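The abstract does not say how the meta-controller chooses hyper-parameters, and Figure 5 compares several types of MC, so the following is only an illustrative sketch of the general idea under stated assumptions: a hypothetical greedy random-search controller that perturbs the GMD weights and keeps a perturbation only if an evaluation measure (lower is better, such as NashConv or OptGap) improves. `meta_controller_step` and `toy_measure` are hypothetical names, and the quadratic `toy_measure` merely stands in for "run GMD with these hyper-parameters, then evaluate".

```python
import numpy as np

def meta_controller_step(alpha, evaluate, rng, sigma=0.1):
    """One step of a hypothetical greedy random-search meta-controller.

    alpha:    current hyper-parameters (e.g. GMD weights alpha_tau), kept >= 0
    evaluate: maps hyper-parameters to a scalar measure where lower is better
    """
    candidate = np.clip(alpha + sigma * rng.standard_normal(alpha.shape), 0.0, None)
    # Accept the perturbed hyper-parameters only if the measure improves.
    if evaluate(candidate) < evaluate(alpha):
        return candidate
    return alpha

# Toy stand-in for "run GMD, then measure": a quadratic bowl around `target`.
target = np.array([0.6, 0.3, 0.1])

def toy_measure(alpha):
    return float(np.sum((alpha - target) ** 2))

rng = np.random.default_rng(0)
alpha = np.full(3, 1.0 / 3.0)
start = toy_measure(alpha)
for _ in range(200):
    alpha = meta_controller_step(alpha, toy_measure, rng)
print(start, toy_measure(alpha))
```

Because a candidate is accepted only when it lowers the measure, the measure never increases across steps; a practical controller would also have to account for the cost and noise of each evaluation.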

Paper Structure

This paper contains 41 sections, 1 theorem, 28 equations, 26 figures, 9 tables, and 4 algorithms.

Key Result

Proposition 5.1

Assume that (i) $\pi(a) \geq \epsilon$ for all $a \in \mathcal{A}$, where $\epsilon$ is a small positive value, and (ii) $\phi(\pi)$, defined on $\Pi$, can be decomposed to … where $A = Q(\pi_{k}) + \sum_{\tau=0}^{M-1}\alpha_{\tau}\phi^{\prime}(\pi_{k-\tau})$ and $B = \sum_{\tau=0}$ …

Footnote: The sum of convex functions is still a convex function. Furthermore, the negative entropy and squared Euclidean norm are two spe…
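Because the statement above is truncated, the following is only a sketch under explicit assumptions, not the paper's algorithm. We assume the GMD step maximizes $\langle Q(\pi_k), \pi \rangle - \sum_{\tau} \alpha_\tau B_\phi(\pi, \pi_{k-\tau})$ over the simplex, with $\phi^{\prime}$ acting elementwise. Stationarity then gives $\phi^{\prime}(\pi) = (A + \lambda)/B$, with $A$ as defined above, $B = \sum_{\tau} \alpha_\tau$ (assumed, since $B$ is cut off), and $\lambda$ a normalization multiplier found by bisection. Like assumption (i), the sketch assumes an interior solution and does not enforce $\pi(a) \geq 0$. `gmd_update` and its arguments are hypothetical names.

```python
import numpy as np

def gmd_update(q_values, history, alphas, grad_phi, inv_grad_phi,
               tol=1e-12, max_iter=200):
    """Hypothetical GMD step solving phi'(pi) = (A + lam) / B for lam.

    history: [pi_k, pi_{k-1}, ..., pi_{k-M+1}]; alphas: matching weights.
    inv_grad_phi must be the inverse of grad_phi (assumed strictly increasing).
    """
    A = q_values + sum(a * grad_phi(p) for a, p in zip(alphas, history))
    B = float(np.sum(alphas))  # assumed form of B

    def excess_mass(lam):
        # Total probability implied by lam, minus 1; increasing in lam.
        return float(np.sum(inv_grad_phi((A + lam) / B))) - 1.0

    lo, hi = -1.0, 1.0  # expand a bracket around the root, then bisect
    while excess_mass(lo) > 0:
        lo *= 2.0
    while excess_mass(hi) < 0:
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if excess_mass(mid) > 0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    pi = inv_grad_phi((A + 0.5 * (lo + hi)) / B)
    return pi / pi.sum()  # remove residual bisection error

def grad_neg_entropy(p):
    return np.log(p) + 1.0

def inv_grad_neg_entropy(y):
    return np.exp(y - 1.0)

def identity(v):
    # phi(pi) = 0.5 * ||pi||^2 has phi'(pi) = pi, so phi' and its inverse are identity.
    return v

q = np.array([1.0, 0.0, -1.0])
history = [np.array([0.2, 0.3, 0.5]), np.full(3, 1.0 / 3.0)]
alphas = np.array([1.0, 0.5])
pi_ent = gmd_update(q, history, alphas, grad_neg_entropy, inv_grad_neg_entropy)
pi_euc = gmd_update(0.1 * q, history, alphas, identity, identity)
print(pi_ent, pi_euc)
```

With negative entropy the result reduces to $\pi \propto \exp\big((Q(\pi_k) + \sum_\tau \alpha_\tau \log \pi_{k-\tau})/B\big)$, a weighted geometric mixture of the historical policies tilted by the Q-values; with the squared Euclidean norm it is an affine combination, which is why the example scales down Q to keep every probability positive.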

Figures (26)

  • Figure 1: Overview of the categories of decision making and the four desiderata for the required method to satisfy.
  • Figure 2: Overview of CMD.
  • Figure 3: Overview of GameBench.
  • Figure 4: Summary of results. The first six figures correspond to the single-agent and cooperative categories, where the $y$-axis is OptGap; the remaining figures correspond to the other categories, where the $y$-axis is NashConv. In all figures, the $x$-axis is the number of iterations.
  • Figure 5: Results for different types of MC.
  • ...and 21 more figures
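Figure 4 reports NashConv for the competitive and mixed categories. NashConv is commonly defined as the sum, over players, of how much each player could gain by unilaterally switching to a best response; it is zero exactly at a Nash equilibrium. The sketch below computes it for a two-player normal-form game, a simplified setting assumed for illustration (the games in GameBench are not specified here). `nash_conv` is a hypothetical helper, not the paper's evaluation code.

```python
import numpy as np

def nash_conv(row_payoff, col_payoff, x, y):
    """NashConv of the mixed-strategy profile (x, y) in a two-player matrix game."""
    u_row = x @ row_payoff @ y          # row player's current expected payoff
    u_col = x @ col_payoff @ y          # column player's current expected payoff
    best_row = np.max(row_payoff @ y)   # best pure response of the row player to y
    best_col = np.max(x @ col_payoff)   # best pure response of the column player to x
    return float((best_row - u_row) + (best_col - u_col))

# Rock-paper-scissors (zero-sum): uniform play is the unique Nash equilibrium.
rps = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])
uniform = np.full(3, 1.0 / 3.0)
rock = np.array([1.0, 0.0, 0.0])
print(nash_conv(rps, -rps, uniform, uniform))  # numerically zero
print(nash_conv(rps, -rps, rock, rock))        # 2.0: each player gains 1 by playing paper
```

A pure best response suffices in the maximization because a player's expected payoff is linear in their own mixed strategy.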

Theorems & Definitions (2)

  • Proposition 5.1
  • Proof of Proposition 5.1