Tractable Equilibrium Computation in Markov Games through Risk Aversion

Eric Mazumdar; Kishan Panaganti; Laixi Shi

TL;DR

This work shows that imbuing agents with important features of human decision-making, such as risk aversion and bounded rationality, makes a class of risk-averse quantal response equilibria (RQE) tractable to compute in all $n$-player matrix and finite-horizon Markov games.

Abstract

A significant roadblock to the development of principled multi-agent reinforcement learning is the fact that desired solution concepts like Nash equilibria may be intractable to compute. To overcome this obstacle, we take inspiration from behavioral economics and show that imbuing agents with important features of human decision-making, such as risk aversion and bounded rationality, makes a class of risk-averse quantal response equilibria (RQE) tractable to compute in all $n$-player matrix and finite-horizon Markov games. In particular, we show that they emerge as the endpoint of no-regret learning in suitably adjusted versions of the games. Crucially, the class of computationally tractable RQE is independent of the underlying game structure and depends only on the agents' degrees of risk aversion and bounded rationality. To validate the richness of this class of solution concepts, we show that it captures people's patterns of play in a number of 2-player matrix games previously studied in experimental economics. Furthermore, we give a first analysis of the sample complexity of computing these equilibria in finite-horizon Markov games when one has access to a generative model, and we validate our findings on a simple multi-agent reinforcement learning benchmark.
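To make the bounded-rationality ingredient concrete, the following is a minimal sketch of a standard logit quantal response equilibrium in matching pennies. It is not the paper's RQE (it has no risk-aversion term) and not the paper's algorithm. The function name `logit_qre`, the temperature `tau`, the starting point, and the plain fixed-point iteration are all assumptions chosen for illustration. The iteration is only expected to converge here because `tau` is large relative to the payoffs in this particular game.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def logit_qre(A, B, tau, iters=200):
    """Simultaneous fixed-point iteration for a logit quantal response.

    A[i, j], B[i, j]: payoffs to players 1 and 2 under the action pair (i, j).
    tau: temperature; a larger tau means noisier, more boundedly rational play.
    Hypothetical helper for illustration, with no general convergence guarantee.
    """
    n, m = A.shape
    # Deliberately non-uniform starting strategies.
    x = softmax(np.arange(n, dtype=float))
    y = softmax(-np.arange(m, dtype=float))
    for _ in range(iters):
        # Each player responds with a softmax of expected payoffs.
        x_new = softmax(A @ y / tau)
        y_new = softmax(B.T @ x / tau)
        x, y = x_new, y_new
    return x, y

# Matching pennies: zero-sum, so player 2's payoffs are the negation.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
B = -A
x_star, y_star = logit_qre(A, B, tau=2.0)
print(x_star, y_star)
```

In this symmetric game both strategies settle at the uniform distribution. Lowering `tau` moves the responses closer to exact best responses, and the naive iteration above may then stop converging.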

Paper Structure (38 sections, 12 theorems, 79 equations, 3 figures, 3 tables, 1 algorithm)

Introduction
Contributions:
Related Works
Computational tractability of game theoretic solution concepts.
Predictive power of equilibrium concepts.
Risk-averse and robust multi-agent reinforcement learning.
Notations:
Matrix Games
Risk-Aversion in Matrix Games
Aggregate Risk Aversion:
Action-dependent Risk Aversion:
Bounded Rationality in Matrix Games
Conditions for Computational Tractability of RQE
Extension to Markov Games
Markov policies and value functions.
...and 23 more sections

Key Result

Theorem 1

Suppose that the set $\mathcal{X}$ is the set of functions mapping from a finite set $\Omega$ to $\mathbb{R}$. Then a mapping $\rho:\mathcal{X}\rightarrow \mathbb{R}$ is a convex risk measure (cf. Definition 2) if and only if there exists a penalty function $D:\Delta_\Omega\rightarrow
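One well-known instance of a penalty-based representation from the risk-measure literature is the entropic risk measure, whose penalty is a scaled KL divergence. The sketch below checks that identity numerically. It is not taken from this paper. The sign convention (treating $X$ as a payoff, so $\rho(X) = \frac{1}{\theta}\log \mathbb{E}_P[e^{-\theta X}]$), the reference distribution `P`, the parameter `theta`, and all function names are assumptions made for illustration.

```python
import numpy as np

def entropic_risk(X, P, theta):
    # Closed form: (1/theta) * log E_P[exp(-theta * X)].
    return np.log(np.sum(P * np.exp(-theta * X))) / theta

def dual_objective(q, X, P, theta):
    # -E_q[X] minus the penalty D(q) = KL(q || P) / theta.
    kl = np.sum(q * np.log(q / P))
    return -np.dot(q, X) - kl / theta

# Hypothetical payoffs on a three-outcome set and a reference distribution.
X = np.array([1.0, -2.0, 0.5])
P = np.array([0.5, 0.3, 0.2])
theta = 1.5

rho = entropic_risk(X, P, theta)

# The maximizer of the dual objective is the exponentially tilted distribution.
q_star = P * np.exp(-theta * X)
q_star /= q_star.sum()
best = dual_objective(q_star, X, P, theta)

# Randomly sampled distributions should never beat the maximizer.
rng = np.random.default_rng(0)
samples = rng.dirichlet(np.ones(3), size=1000)
max_sampled = max(dual_objective(q, X, P, theta) for q in samples)

print(rho, best, max_sampled)
```

The closed-form value and the dual objective at the tilted distribution coincide, while every sampled distribution gives a value no larger. This is the sense in which a risk measure can be written as a penalized worst case over distributions.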

Figures (3)

Figure 1: The shaded blue region depicts the regime of risk-aversion and bounded rationality preferences that allow for computationally tractable RQE in all $2$-player games as shown in Theorem \ref{thm:2player}. The markers GHP: Game 4 \cite{goeree2003risk} and SC: Games 1-12 \cite{selten2008stationary} represent the parameter values required to recreate the average strategy played by people in various $2$-player games in observational data up to $1\%$ accuracy.
Figure 2: Cliff Walk Description and Results: The Cliff Walk grid-world is depicted here with the following color coding: black cells are the cliff, blue and pink cells are the goals of agents 1 and 2 respectively, and grey cells are traversable. In both figures, red (blue) arrows depict the actions taken by the learned policy of agent 1 (agent 2), each pointing to the state reached. The agent 2 policy in the left figure showcases more risk aversion (the agents avoid each other's path to reduce the cliff risk) and less bounded rationality (not goal reaching). The agent 2 policy in the right figure showcases less risk aversion (a higher chance of cliff risk from sharing the path) and more bounded rationality (goal reaching). We also report the values of the risk-aversion and bounded-rationality parameters $\epsilon_1,\epsilon_2$ and $\tau_1,\tau_2$ used in our experiments, which satisfy our theoretical conditions.
Figure 3: Cliff-Walk results for the $\ell_1$ environmental uncertainty metric.

Theorems & Definitions (28)

Definition 1: Nash Equilibrium
Definition 2: Convex Risk Measures
Theorem 1: Dual Representation Theorem for Convex Risk Measures \cite{Risk_overview}
Definition 3: Risk-adjusted Nash equilibrium
Theorem 2
proof
Example 1
Definition 4
Example 2
Definition 5
...and 18 more
