No-Regret Learning of Nash Equilibrium for Black-Box Games via Gaussian Processes

Minbiao Han; Fengxue Zhang; Yuxin Chen

TL;DR

This paper provides a no-regret learning algorithm that utilizes Gaussian processes to identify the equilibrium in black-box games when the only available information about an agent's payoff comes in the form of empirical queries.

Abstract

This paper investigates the challenge of learning in black-box games, where the underlying utility function is unknown to any of the agents. While there is an extensive body of literature on the theoretical analysis of algorithms for computing the Nash equilibrium with complete information about the game, studies on Nash equilibrium in black-box games are less common. In this paper, we focus on learning the Nash equilibrium when the only available information about an agent's payoff comes in the form of empirical queries. We provide a no-regret learning algorithm that utilizes Gaussian processes to identify the equilibrium in such games. Our approach not only ensures a theoretical convergence rate but also demonstrates effectiveness across a diverse collection of games through experimental validation.
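To make the query-only setting concrete, the sketch below scores a joint action by the largest gain any single agent could obtain by deviating unilaterally, using nothing but utility queries. This is one common way to measure distance from a Nash equilibrium; it is not claimed to be the paper's loss function. The function `nash_gap`, the toy utilities `v1` and `v2`, and the action grid are all hypothetical, and the queries here are noise-free for simplicity.

```python
import numpy as np

def nash_gap(utilities, grid, x):
    """Largest unilateral gain over all agents at joint action x.

    utilities: list of black-box callables, one per agent (hypothetical).
    grid: candidate actions each agent may deviate to.
    x: tuple holding one action per agent.
    """
    gaps = []
    for i, v in enumerate(utilities):
        best = -np.inf
        for a in grid:
            y = list(x)
            y[i] = a  # agent i deviates, everyone else stays put
            best = max(best, v(tuple(y)))
        gaps.append(best - v(tuple(x)))
    return max(gaps)

# Toy zero-sum saddle game (an assumption, not taken from the paper).
def v1(x):
    return -(x[0] - 0.5) ** 2 + (x[1] - 0.5) ** 2

def v2(x):
    return -v1(x)

grid = np.linspace(0.0, 1.0, 11)
gap_at_equilibrium = nash_gap([v1, v2], grid, (0.5, 0.5))
gap_off_equilibrium = nash_gap([v1, v2], grid, (0.0, 0.5))
```

A gap of zero means no agent can improve by deviating on the grid, so the joint action is an equilibrium of the discretized game. In the black-box setting each call to `v` is an expensive, noisy query, which is why a surrogate model such as a Gaussian process is attractive.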

Paper Structure (28 sections, 6 theorems, 46 equations, 4 figures, 1 table)

Introduction
Our Results and Implications.
Related Work
Bayesian optimization approach
Other online learning algorithms
Preliminaries and Problem Setup
Algorithms
Approximation of the Partial Maximum
Adaptive Level-set Estimation for Global Optimization
Efficient High-dimensional Optimization through ROI Reduction
Theoretical Results
Experimental Results
Saddle.
Rock-Paper-Scissors (RPS).
Hotelling's Game.
...and 13 more sections

Key Result

Lemma 1

Under Assumption \ref{apt: sample_gp} and Assumption \ref{apt: mono_ci}, for all $t_1 \leq t_2 \leq T$, $\boldsymbol{x}\in\mathcal{X}$, and $i\in[n]$, we have ${\textrm{UCB}}_{v_i, t_1}(\boldsymbol{x}) \geq {\textrm{UCB}}_{v_i, t_2}(\boldsymbol{x})$ and ${\textrm{LCB}}_{v_i, t_1}(\boldsymbol{x}) \leq {\textrm{LCB}}_{v_i, t_2}(\boldsymbol{x})$.
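One standard way to obtain confidence bounds that tighten monotonically over time is to intersect each new Gaussian-process interval with all earlier ones, taking a running minimum of the upper bound and a running maximum of the lower bound. The sketch below illustrates that construction for a single utility on a one-dimensional action space. It is a minimal illustration under stated assumptions, not the paper's implementation: the RBF kernel, the names `gp_posterior` and `monotone_bounds`, the width multiplier `beta`, and the synthetic data are all hypothetical.

```python
import numpy as np

def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise_var=0.01):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = rbf(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    K_s = rbf(x_obs, x_query)
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def monotone_bounds(x_obs, y_obs, x_query, beta=2.0):
    """Running intersection of GP confidence intervals.

    Returns one (ucb, lcb) pair per step t, where step t uses the first
    t observations. beta is a hypothetical interval-width multiplier.
    """
    ucb = np.full(len(x_query), np.inf)
    lcb = np.full(len(x_query), -np.inf)
    history = []
    for t in range(1, len(x_obs) + 1):
        mean, std = gp_posterior(x_obs[:t], y_obs[:t], x_query)
        ucb = np.minimum(ucb, mean + beta * std)  # upper bound never increases
        lcb = np.maximum(lcb, mean - beta * std)  # lower bound never decreases
        history.append((ucb.copy(), lcb.copy()))
    return history

# Synthetic noisy observations (noise variance 0.01, i.e. std 0.1).
rng = np.random.default_rng(0)
x_obs = rng.uniform(0.0, 1.0, size=8)
y_obs = np.sin(6.0 * x_obs) + rng.normal(0.0, 0.1, size=8)
x_query = np.linspace(0.0, 1.0, 50)
history = monotone_bounds(x_obs, y_obs, x_query)
```

By construction, the upper bound at an earlier step is at least the upper bound at any later step, and the lower bound at an earlier step is at most the lower bound at any later step, which is the ordering the lemma states.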

Figures (4)

Figure 1: Function visualizations of Example $1$, where the $x$-axis ($x_1$) represents agent 1's action and the $y$-axis ($x_2$) represents agent 2's action. Agent 2's utility is symmetric to Figure \ref{fig:agent1} and is therefore omitted from this plot. Figure \ref{fig:agent1} shows that a rational agent's utility-maximizing strategy (Utility Maxima) differs markedly from the minimum of the loss function (the NE $(0.5, 0.5)$), which highlights the novelty and difficulty of optimizing our loss function (Equation (\ref{eq:loss})). Figure \ref{fig:roi} highlights the efficiency of our optimization algorithm, which reduces the search space.
Figure 2: Experimental results. In each plot, the $x$-axis denotes the number of function evaluations. The curves show the $f(\boldsymbol{x}^t)$ values averaged over at least ten independent trials. The shaded area denotes the standard error. The observation perturbation is sampled from $\mathcal{N}(0, 0.01)$, while the simple regrets shown in the figures do not count the noise. We also include additional results on multi-player settings in Appendix \ref{sec:additional_results}.
Figure 3: Experimental results on choices of $\beta$. The theoretical value is defined as in Theorem \ref{thm: simReg}. In each plot, the $x$-axis denotes the number of function evaluations. The curves show the $f(\boldsymbol{x}^t)$ values averaged over at least ten independent trials. The shaded area denotes the standard error. The observation perturbation is sampled from $\mathcal{N}(0, 0.01)$, while the simple regrets shown in the figures do not count the noise.
Figure 4: Experimental results on Hotelling and Budget Allocation games when there are 3 players involved, where the $x$-axis denotes the number of function evaluations. The curves show the $f(\boldsymbol{x}^t)$ values averaged over at least ten independent trials, and the shaded area denotes the standard error. The observation perturbation is sampled from $\mathcal{N}(0, 0.01)$, while the simple regrets shown in the figures do not count the noise. The theoretical value is defined as in Theorem \ref{thm: simReg}.

Theorems & Definitions (12)

Example 1
Remark
Lemma 1
Lemma 2
Theorem 1
Theorem 2
Remark
Corollary 1
Corollary 2
Remark
...and 2 more
