Tractable Equilibrium Computation in Markov Games through Risk Aversion

Eric Mazumdar, Kishan Panaganti, Laixi Shi

TL;DR

This work shows that, by imbuing agents with important features of human decision-making such as risk aversion and bounded rationality, a class of risk-averse quantal response equilibria (RQE) becomes tractable to compute in all $n$-player matrix and finite-horizon Markov games.

Abstract

A significant roadblock to the development of principled multi-agent reinforcement learning is the fact that desired solution concepts like Nash equilibria may be intractable to compute. To overcome this obstacle, we take inspiration from behavioral economics and show that, by imbuing agents with important features of human decision-making such as risk aversion and bounded rationality, a class of risk-averse quantal response equilibria (RQE) becomes tractable to compute in all $n$-player matrix and finite-horizon Markov games. In particular, we show that they emerge as the endpoint of no-regret learning in suitably adjusted versions of the games. Crucially, the class of computationally tractable RQE is independent of the underlying game structure and depends only on the agents' degrees of risk aversion and bounded rationality. To validate the richness of this class of solution concepts, we show that it captures people's patterns of play in a number of 2-player matrix games previously studied in experimental economics. Furthermore, we give a first analysis of the sample complexity of computing these equilibria in finite-horizon Markov games when one has access to a generative model, and we validate our findings on a simple multi-agent reinforcement learning benchmark.

Paper Structure

This paper contains 38 sections, 12 theorems, 79 equations, 3 figures, 3 tables, 1 algorithm.

Key Result

Theorem 1

Suppose that the set $\mathcal{X}$ is the set of functions mapping from a finite set $\Omega$ to $\mathbb{R}$. Then a mapping $\rho:\mathcal{X}\rightarrow \mathbb{R}$ is a convex risk measure (cf. Definition 2) if and only if there exists a penalty function $D:\Delta_\Omega\rightarrow \dots$
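As an illustration of what such a dual representation looks like, the following is a minimal sketch for one textbook example, the entropic risk measure. It is not taken from the paper. It assumes the standard sign convention of Föllmer and Schied, in which $X$ is a payoff and the dual form reads $\rho(X)=\sup_{p\in\Delta_\Omega}\{\mathbb{E}_p[-X]-D(p)\}$, with the penalty chosen as $D(p)=\frac{1}{\theta}\mathrm{KL}(p\,\|\,p_0)$. The reference distribution `p0`, the parameter `theta`, and all function names are hypothetical choices made for this example.

```python
import numpy as np

def entropic_risk(x, p0, theta):
    """Closed form (1/theta) * log E_{p0}[exp(-theta * X)], computed stably."""
    z = -theta * np.asarray(x, dtype=float)
    m = z.max()  # shift by the max to avoid overflow in exp
    return (m + np.log(np.sum(p0 * np.exp(z - m)))) / theta

def kl(p, q):
    """KL(p || q) on a finite set; assumes q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def dual_objective(p, x, p0, theta):
    """E_p[-X] - D(p) with the assumed penalty D(p) = KL(p || p0) / theta."""
    return float(np.dot(p, -np.asarray(x, dtype=float))) - kl(p, p0) / theta

def worst_case_distribution(x, p0, theta):
    """Maximizer of the dual objective: p* proportional to p0 * exp(-theta * X)."""
    z = -theta * np.asarray(x, dtype=float)
    w = p0 * np.exp(z - z.max())
    return w / w.sum()

# Hypothetical payoffs on a three-outcome set with a uniform reference distribution.
x = np.array([1.0, 0.0, -2.0])
p0 = np.ones(3) / 3
theta = 0.5

closed_form = entropic_risk(x, p0, theta)
p_star = worst_case_distribution(x, p0, theta)
dual_value = dual_objective(p_star, x, p0, theta)
```

Under these assumptions, `dual_value` agrees with `closed_form` up to floating-point error, and any other distribution on the simplex gives a dual objective no larger than `closed_form`. The maximizer `p_star` puts extra weight on low-payoff outcomes, which is one way to read risk aversion as pessimism about the outcome distribution.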

Figures (3)

  • Figure 1: The shaded blue region depicts the regime of risk-aversion and bounded-rationality preferences that allows for computationally tractable RQE in all $2$-player games, as shown in the theorem for $2$-player games (label thm:2player). The markers GHP: Game 4 [goeree2003risk] and SC: Games 1-12 [selten2008stationary] represent the parameter values required to recreate, up to $1\%$ accuracy, the average strategy played by people in various $2$-player games in observational data.
  • Figure 2: Cliff Walk description and results. The Cliff Walk grid-world is depicted with the following color coding: black cells are the cliff, blue and pink cells are the goals of agents (players) 1 and 2 respectively, and grey cells are traversable. In both panels, a red (blue) arrow depicts the action taken by the learned policy of agent 1 (agent 2), leading to the state at the arrowhead. The agent 2 policy in the left panel showcases more risk aversion (the agents avoid each other's paths to reduce the cliff risk) and less bounded rationality (not goal reaching). The agent 2 policy in the right panel showcases less risk aversion (a higher chance of cliff risk from sharing the path) and more bounded rationality (goal reaching). We also report the values of the risk-aversion and bounded-rationality parameters $\epsilon_1,\epsilon_2$ and $\tau_1,\tau_2$ used in our experiments, which satisfy our theoretical conditions.
  • Figure 3: Cliff-Walk results for the $\ell_1$ environmental uncertainty metric.

Theorems & Definitions (28)

  • Definition 1: Nash Equilibrium
  • Definition 2: Convex Risk Measures
  • Theorem 1: Dual Representation Theorem for Convex Risk Measures [Risk_overview]
  • Definition 3: Risk-adjusted Nash equilibrium
  • Theorem 2
  • proof
  • Example 1
  • Definition 4
  • Example 2
  • Definition 5
  • ...and 18 more