Reinforcement Nash Equilibrium Solver

Xinrun Wang, Chang Yang, Shuxin Li, Pengdeng Li, Xiao Huang, Hau Chan, Bo An

TL;DR

This work tackles the challenge of computing Nash Equilibria (NE) in general-sum games, where exact NE computation is PPAD-Complete and traditional inexact solvers may fail to reach an NE. It introduces RENES, a reinforcement-learning framework that learns a single policy to modify game instances of varying size and then applies standard solvers to the modified games. $α$-rank response graphs and CP tensor decomposition make the modification size-agnostic, and the modification policy is trained with PPO, using the quality of the solver's solution on the original game as feedback. The approach yields consistent improvements for multiple solvers ($α$-rank, CE, FP, PRD) on large-scale normal-form games and generalizes to unseen games, demonstrating the potential of pre-training a solver-agnostic modification policy. The work points to a new direction in which game modification serves as a pre-training task for improving equilibrium approximations, with implications for scalability and generalization and possible extensions to other solution concepts and game types.
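The modify-solve-evaluate flow can be illustrated on a two-player (bimatrix) game. This is a minimal sketch, not the authors' implementation: it assumes NashConv (each player's best-response gain, summed over players) as the evaluation, uses plain simultaneous fictitious play as the solver, and leaves the modification oracle as a caller-supplied function. The names `nash_conv`, `fictitious_play` and `evaluate_modification` are hypothetical.

```python
import numpy as np


def nash_conv(A, B, x, y):
    """NashConv of the mixed profile (x, y) in the bimatrix game (A, B):
    each player's gain from best-responding, summed over both players."""
    row_gain = np.max(A @ y) - x @ A @ y
    col_gain = np.max(x @ B) - x @ B @ y
    return float(row_gain + col_gain)


def fictitious_play(A, B, iters=2000):
    """Simultaneous fictitious play; returns the empirical average strategies."""
    counts_x = np.zeros(A.shape[0])
    counts_y = np.zeros(A.shape[1])
    counts_x[0] = counts_y[0] = 1.0  # arbitrary initial pure actions
    for _ in range(iters):
        x, y = counts_x / counts_x.sum(), counts_y / counts_y.sum()
        counts_x[np.argmax(A @ y)] += 1.0  # row best response to column's average
        counts_y[np.argmax(x @ B)] += 1.0  # column best response to row's average
    return counts_x / counts_x.sum(), counts_y / counts_y.sum()


def evaluate_modification(A, B, modify, solver):
    """Modify M -> M', solve M', and score the solution on the ORIGINAL game M."""
    A_mod, B_mod = modify(A, B)   # modification oracle (supplied by the caller)
    x, y = solver(A_mod, B_mod)   # inexact solver run on the modified game M'
    return nash_conv(A, B, x, y)  # evaluation always uses the original payoffs
```

Passing `lambda a, b: (a, b)` as `modify` gives the unmodified baseline; a useful modification is one that lowers the NashConv measured on the original game. Using this drop in NashConv as the PPO reward is a plausible choice, but the exact reward is not stated in this summary.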

Abstract

Nash Equilibrium (NE) is the canonical solution concept of game theory, providing an elegant tool for understanding rationality. Although a mixed-strategy NE exists in any game with finitely many players and actions, computing an NE in two- or multi-player general-sum games is PPAD-Complete. Various alternative solution concepts, e.g., Correlated Equilibrium (CE), and learning methods, e.g., fictitious play (FP), have been proposed to approximate NE. For convenience, we call these methods "inexact solvers", or "solvers" for short. However, the alternative solution concepts differ from NE, and the learning methods generally fail to converge to NE. Therefore, in this work, we propose REinforcement Nash Equilibrium Solver (RENES), which trains a single policy to modify games of different sizes and applies the solvers to the modified games, where the obtained solution is evaluated on the original games. Specifically, our contributions are threefold. i) We represent games as $α$-rank response graphs and leverage a graph neural network (GNN) to handle games of different sizes as inputs; ii) We use tensor decomposition, e.g., canonical polyadic (CP) decomposition, to fix the dimension of the modification actions across games of different sizes; iii) We train the modification policy with the widely used proximal policy optimization (PPO) and apply the solvers to the modified games, where the obtained solution is evaluated on the original games. Extensive experiments on large-scale normal-form games show that our method further improves the NE approximation of different solvers, i.e., $α$-rank, CE, FP and PRD, and generalizes to unseen games.
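Contribution ii) can be illustrated for two-player games, where each payoff matrix is a 2-way tensor and a rank-$r$ CP decomposition reduces to a truncated SVD, $M \approx \sum_{k=1}^{r} s_k\, u_k v_k^\top$. The sketch below assumes a hypothetical action parameterization in which the policy outputs one scalar per rank-1 component per player, so the action always has length $2r$ whatever the game size. How RENES actually maps actions to payoff changes, and how it decomposes the payoff tensors of games with more than two players (which would need an iterative method such as CP-ALS), is not specified in this summary; `cp_factors_2way` and `modify_game` are illustrative names.

```python
import numpy as np


def cp_factors_2way(M, r):
    """Rank-r CP factors of a matrix (2-way tensor).

    For matrices, the best rank-r approximation is the truncated SVD:
    M ~= sum_k s[k] * outer(U[:, k], Vt[k, :]).
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    k = min(r, s.size)  # a matrix has at most min(n, m) components
    return s[:k], U[:, :k], Vt[:k, :]


def modify_game(A, B, action, r):
    """Apply a fixed-length action (2 * r values) to a bimatrix game of ANY size.

    Hypothetical parameterization: action[:r] rescales A's rank-1 components and
    action[r:] rescales B's, i.e. M' = M + sum_k action_k * s_k * outer(u_k, v_k).
    """
    action = np.asarray(action, dtype=float)
    if action.shape != (2 * r,):
        raise ValueError(f"expected an action of length {2 * r}, got {action.shape}")
    modified = []
    for M, a in ((A, action[:r]), (B, action[r:])):
        s, U, Vt = cp_factors_2way(M, r)
        weights = a[: s.size] * s  # perturbation of each component's weight
        modified.append(M + (U * weights) @ Vt)
    return modified[0], modified[1]
```

With $r = 2$, a 3×3 game and a 10×7 game both accept the same length-4 action, which is the property that lets a single policy act on games of different sizes.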

Paper Structure

This paper contains 41 sections, 6 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Motivating Examples. The payoff matrices of the two players are shown in green and red, respectively. The x-axis and y-axis are the values of $\delta^{1}$ and $\delta^{2}$, respectively. For plotting, the interval $[-2, 2]$ is discretized with a step size of 0.1. The NashConv values of the solvers on the original games $M$ are marked with a green triangle, and the minimal NashConv value is marked with a red star. Note that the games $M$ selected for the different solvers are specifically designed and differ from each other.
  • Figure 2: Flow of RENES. Starting with the original game $M$, the modification oracle $\mathcal{O}$ modifies the game into $M'$, and the solver $\mathcal{H}$ is applied to the modified game $M'$. The obtained solution $\pi$ is evaluated on $M$ with $\mathcal{E}$.
  • Figure 3: Payoff table and $\alpha$-rank response graph for the Rock-Paper-Scissors (RPS) game, computed with $\alpha=100$ and $m=50$.
  • Figure 4: Results of RENES in the simple case. The solid and dotted lines are the results on the training and testing sets of games, respectively. The transparent lines are the results of individual seeds; because runs with different seeds achieve their best performance at different epochs, they are plotted to show how training behaves across seeds. The blue line is the averaged result and the shaded area shows the standard deviation. Note that the y-axis scale differs across figures for better visualization. The same style is also adopted in Figures 5 and 6.
  • Figure 5: Results of RENES in the general case.
  • ...and 1 more figure
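The $\alpha$-rank response graph of Figure 3 can be sketched with the standard multi-population $\alpha$-rank transition model: nodes are joint pure-strategy profiles, an edge connects two profiles that differ in one player's action, and its weight is $\eta\,\rho$, where $\eta = 1/\sum_k(|S^k|-1)$ and $\rho = (1-e^{-\alpha\Delta})/(1-e^{-m\alpha\Delta})$ for the deviating player's payoff gain $\Delta$ (with $\rho = 1/m$ when $\Delta = 0$). This minimal sketch uses the caption's $\alpha=100$ and $m=50$; whether RENES feeds exactly these edge weights to its GNN is an assumption, and `fixation_prob` and `alpha_rank_graph` are hypothetical names.

```python
import itertools
import math

import numpy as np


def fixation_prob(delta, alpha, m):
    """rho = (1 - exp(-alpha*delta)) / (1 - exp(-m*alpha*delta)), computed stably."""
    if delta == 0:
        return 1.0 / m
    x = alpha * delta
    if x > 0:
        return -math.expm1(-x) / -math.expm1(-m * x)
    y = -x  # rewrite so no exponential overflows for large alpha * m
    return math.exp(-(m - 1) * y) * -math.expm1(-y) / -math.expm1(-m * y)


def alpha_rank_graph(payoffs, alpha=100.0, m=50):
    """Transition matrix of the alpha-rank response graph of an N-player game.

    payoffs[k][s] is player k's payoff at the joint pure profile s (a tuple).
    """
    shape = payoffs[0].shape
    profiles = list(itertools.product(*(range(n) for n in shape)))
    index = {s: i for i, s in enumerate(profiles)}
    eta = 1.0 / sum(n - 1 for n in shape)
    C = np.zeros((len(profiles), len(profiles)))
    for s in profiles:
        for k, n_k in enumerate(shape):
            for a in range(n_k):
                if a == s[k]:
                    continue
                t = s[:k] + (a,) + s[k + 1:]  # player k deviates unilaterally to a
                delta = float(payoffs[k][t] - payoffs[k][s])
                C[index[s], index[t]] = eta * fixation_prob(delta, alpha, m)
        C[index[s], index[s]] = 1.0 - C[index[s]].sum()  # self-loop keeps rows stochastic
    return profiles, C
```

For RPS (actions indexed Rock, Paper, Scissors), the profile (Rock, Rock) puts weight 0.25 on each improving deviation, e.g., to (Paper, Rock), and essentially zero on worsening ones. The $\alpha$-rank scores themselves are the stationary distribution of this transition matrix.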

Theorems & Definitions (1)

  • Definition 1