
Computing Equilibria in Games with Stochastic Action Sets

Thomas Schwarz, Ryann Sim, Chun Kai Ling

TL;DR

The paper introduces an efficient approach based on sleeping internal regret minimization that converges to an approximate NE in 2p0s-GSAS at a rate of $O(\sqrt{\log\vert A_i\vert/T})$ with an appropriate choice of step sizes, avoiding the exponential blow-up of game-dependent constants.

Abstract

The study of learning in games typically assumes that each player always has access to all of their actions. However, in many practical scenarios, arbitrary restrictions induced by exogenous stochasticity might be placed on a player's action set. To model this setting, for a game $\mathcal{G}_{\mathrm{orig}}$ with action set $A_i$ for each player $i$, we introduce the corresponding Game with Stochastic Action Sets (GSAS), which is parametrized by a probability distribution over the players' sets of possible action subsets $\mathcal{S}_i \subseteq 2^{A_i}\backslash\{\varnothing\}$. In a GSAS, players' strategies and Nash equilibria (NE) admit prohibitively large representations, so existing algorithms for NE computation scale poorly. Under the assumption that action availabilities are independent between players, we show that NE in two-player zero-sum (2p0s) GSAS can be compactly represented by a vector of size $\vert A_i\vert$, avoiding the naive exponential-size representation of equilibria. Computationally, we introduce an efficient approach based on sleeping internal regret minimization and show that it converges to approximate NE in 2p0s-GSAS at a rate of $O(\sqrt{\log\vert A_i\vert/T})$ with an appropriate choice of step sizes, avoiding the exponential blow-up of game-dependent constants.
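To make the setting concrete, the following is a minimal sketch of self-play in a two-player zero-sum matrix game where each player's available action subset is resampled independently every round. It is not the paper's SI-MWU algorithm and carries no convergence guarantee: the update is a simple sleeping-experts-style multiplicative-weights heuristic, and the names `sample_available`, `restricted_strategy`, `sleeping_mwu_selfplay`, the availability probability `p_avail`, and the step size $\eta_t = 1/\sqrt{t}$ are all assumptions made for illustration.

```python
import math
import random


def sample_available(n_actions, p_avail, rng):
    """Sample a non-empty subset of action indices.

    Assumption: each action is available independently with probability
    p_avail; empty subsets are rejected and resampled.
    """
    while True:
        subset = [a for a in range(n_actions) if rng.random() < p_avail]
        if subset:
            return subset


def restricted_strategy(weights, available):
    """Normalize one weight per action over the available actions only."""
    total = sum(weights[a] for a in available)
    if total <= 0.0:  # numerical fallback: play uniformly over available actions
        return {a: 1.0 / len(available) for a in available}
    return {a: weights[a] / total for a in available}


def sleeping_mwu_selfplay(payoff, p_avail=0.7, rounds=2000, seed=0):
    """Self-play where only currently available ("awake") actions are updated.

    payoff[r][c] is the row player's payoff; the column player receives its negation.
    Returns the final weight vectors and the row player's average expected payoff.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(payoff), len(payoff[0])
    w_row = [1.0 / n_rows] * n_rows
    w_col = [1.0 / n_cols] * n_cols
    total_value = 0.0
    for t in range(1, rounds + 1):
        eta = 1.0 / math.sqrt(t)  # hypothetical step-size schedule
        rows = sample_available(n_rows, p_avail, rng)
        cols = sample_available(n_cols, p_avail, rng)
        x = restricted_strategy(w_row, rows)
        y = restricted_strategy(w_col, cols)
        # Expected payoff of each available pure action against the opponent's mix.
        u_row = {r: sum(y[c] * payoff[r][c] for c in cols) for r in rows}
        u_col = {c: -sum(x[r] * payoff[r][c] for r in rows) for c in cols}
        value = sum(x[r] * u_row[r] for r in rows)  # row player's expected payoff
        total_value += value
        # Reward awake actions relative to the payoff of the mix actually played.
        for r in rows:
            w_row[r] *= math.exp(eta * (u_row[r] - value))
        for c in cols:
            w_col[c] *= math.exp(eta * (u_col[c] + value))
        # Rescale so the weights sum to one; restricted strategies are unchanged.
        s_row, s_col = sum(w_row), sum(w_col)
        w_row = [w / s_row for w in w_row]
        w_col = [w / s_col for w in w_col]
    return w_row, w_col, total_value / rounds


if __name__ == "__main__":
    rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # rock-paper-scissors
    print(sleeping_mwu_selfplay(rps, rounds=500, seed=1))
```

Each player here stores a single weight per action, so memory is linear in $\vert A_i\vert$ rather than in the number of possible action subsets. Whether such a weight vector matches the compact representation established in the paper is not something this sketch verifies.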

Paper Structure (49 sections, 26 theorems, 81 equations, 9 figures, 1 table, 3 algorithms)

Key Result

Proposition 3.1

Consider a GSAS and let $\pi = (\pi_1, \dots, \pi_n)$ be a strategy profile that implements $\mu = (\mu_1, \dots, \mu_n)$. Then $\pi$ is an $\epsilon$-Nash equilibrium if and only if for all $i \in [n]$

Figures (9)

  • Figure 1: Wall-clock time to solve randomly generated GSAS using SI-MWU and using Gurobi on the sequence-form linear program. The plot shows the average over 20 runs, with the shaded region showing the range (min to max wall-clock time) across runs.
  • Figure 2: SI-regret from SI-MWU for several 2p0s-GSAS. For each game-type, the experiment is repeated 100 times. The average and the central 95% interval over the runs are shown. Theoretical bounds on the expected max SI-regret and observed max SI-regret with high probability are also shown.
  • Figure 3: SPR for the marginals obtained from SI-MWU for several 2p0s-GSAS. For each game we repeat the experiment 100 times and plot both the average and range (max and min regret) over the runs.
  • Figure 4: SPR of the compact $w_i^t$ computed using the output marginals from SI-MWU as input to \ref{alg:compute_w} for several 2p0s-GSAS. For each game we repeat the experiment 100 times and plot the average and range (max and min regret) over the runs.
  • Figure 5: Learnt weights using (L) robust averaging ($\eta_t \propto 1/\sqrt{t}$ and averaging as per \ref{eqn:averaging}), and (R) \ref{alg:compute_w} as written ($\eta_t\propto 1/t$, no averaging), on SI-MWU output.
  • ...and 4 more figures

Theorems & Definitions (57)

  • Definition 2.1: GSAS
  • Definition 2.3: 2p0s-GSAS
  • Definition 2.4: $\epsilon$-Nash equilibrium ($\epsilon$-NE)
  • Remark 2.5
  • Example 2.6
  • Definition 3.1
  • Proposition 3.1
  • Proposition 3.1
  • Proposition 3.1
  • Theorem 3.2
  • ...and 47 more