Approximating N-Player Nash Equilibrium through Gradient Descent
Dongge Wang, Xiang Yan, Zehao Dou, Wenhan Huang, Yaodong Yang, Xiaotie Deng
TL;DR
The paper tackles the PPAD-hard problem of computing approximate Nash equilibria (NE) in N-player general-sum games by introducing NashD, a distance-to-equilibrium measure based on pure-strategy best responses. It then finds approximate NE via gradient descent in a global-view formulation, converting general-sum games to zero-sum games with a fictitious player and mapping strategies onto the probability simplex through a softmax. The authors prove convergence to a local optimum at rate $O(L/T)$ under convex utilities and report strong empirical performance and robustness on GAMUT and random games, often outperforming the Tsaknakis-Spirakis algorithm (TS), fictitious play (FP), and regret matching (RM). The approach offers a scalable, gradient-based alternative to classical NE solvers for multi-agent settings and highlights the value of global-view optimization for stable convergence.
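For concreteness, a distance of this kind can be written as below. This is a sketch assuming NashD takes the common "sum of best-response gains" form; the paper's exact definition may normalize or weight the terms differently. Here $u_i$ is player $i$'s expected utility, $A_i$ its set of pure actions, $\sigma_{-i}$ the other players' mixed strategies, and $\theta_i$ hypothetical real-valued logits that the softmax maps onto the simplex:

$$
\mathrm{NashD}(\sigma) = \sum_{i=1}^{N} \Big( \max_{a_i \in A_i} u_i(a_i, \sigma_{-i}) - u_i(\sigma_i, \sigma_{-i}) \Big),
\qquad \sigma_i = \mathrm{softmax}(\theta_i).
$$

Because $u_i$ is linear in player $i$'s own mixed strategy, some pure action always does at least as well as $\sigma_i$, so every term is non-negative and $\mathrm{NashD}(\sigma) = 0$ exactly when no player can gain by deviating, that is, when $\sigma$ is an NE.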
Abstract
Understanding how rational agents should behave in shared systems remains a critical challenge in theoretical computer science, artificial intelligence, and economics. Central to this challenge is computing the key solution concept of games, the Nash equilibrium (NE). Although computing an NE is known to be PPAD-hard even in the two-player case, approximate solutions are of intense interest in machine learning. In this paper, we present a gradient-based approach to obtain approximate NE in N-player general-sum games. Specifically, we define a distance measure to an NE based on pure-strategy best responses, so that computing an NE reduces to finding the global minimum of this distance function through gradient descent. We prove that the proposed procedure converges to NE at rate $O(1/T)$, where $T$ is the number of iterations, when the utility function is convex. Experimental results suggest that our method outperforms the Tsaknakis-Spirakis algorithm, fictitious play, and regret matching on various types of N-player normal-form games in GAMUT. In addition, our method remains robust as the numbers of players and actions increase.
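The reduction described above (an NE is a global minimizer of the distance) can be illustrated with a minimal Python sketch. It assumes NashD is the sum of best-pure-response gains, parameterizes each mixed strategy by softmax logits, and estimates gradients by central finite differences instead of automatic differentiation. It is plain gradient descent on that distance, not the paper's global-view formulation with a fictitious player. The names `expected_payoff`, `nash_distance`, and `descend`, and the payoff format (a dict from joint-action tuples to per-player utility tuples), are hypothetical.

```python
import itertools
import math
import random

# Hypothetical sketch: gradient descent on an ASSUMED NashD (sum of
# best-pure-response gains) over softmax logits, with finite-difference
# gradients. Not the paper's global-view / fictitious-player method.


def softmax(logits):
    """Map real-valued logits to a probability vector (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_payoff(payoffs, strategies, player, deviation=None):
    """Expected utility of `player` under the mixed profile `strategies`.

    If `deviation` is a pure action index, `player` plays it with
    probability 1 while the other players keep their mixed strategies.
    """
    sizes = [len(s) for s in strategies]
    value = 0.0
    for joint in itertools.product(*(range(k) for k in sizes)):
        if deviation is not None and joint[player] != deviation:
            continue
        prob = 1.0
        for j, a in enumerate(joint):
            if j == player and deviation is not None:
                continue  # the deviating player's action is deterministic
            prob *= strategies[j][a]
        value += prob * payoffs[joint][player]
    return value


def nash_distance(payoffs, strategies):
    """Assumed NashD: total gain available from best pure-strategy responses."""
    gap = 0.0
    for i, s in enumerate(strategies):
        current = expected_payoff(payoffs, strategies, i)
        best = max(expected_payoff(payoffs, strategies, i, a) for a in range(len(s)))
        gap += best - current
    return gap


def descend(payoffs, num_actions, steps=300, lr=0.5, eps=1e-5, seed=0):
    """Minimize nash_distance over softmax logits by plain gradient descent."""
    rng = random.Random(seed)
    logits = [[rng.gauss(0.0, 1.0) for _ in range(k)] for k in num_actions]

    def loss():
        return nash_distance(payoffs, [softmax(row) for row in logits])

    for _ in range(steps):
        grads = []
        for row in logits:
            g = []
            for a in range(len(row)):
                original = row[a]
                row[a] = original + eps
                up = loss()
                row[a] = original - eps
                down = loss()
                row[a] = original  # restore the logit exactly
                g.append((up - down) / (2 * eps))  # central difference
            grads.append(g)
        for row, g in zip(logits, grads):
            for a in range(len(row)):
                row[a] -= lr * g[a]
    return [softmax(row) for row in logits], loss()


if __name__ == "__main__":
    # 3-player toy game: each player's utility equals its own action index,
    # so action 1 is strictly dominant and the unique NE is all players on 1.
    game = {joint: tuple(float(a) for a in joint)
            for joint in itertools.product(range(2), repeat=3)}
    strategies, dist = descend(game, [2, 2, 2])
    print([round(s[1], 3) for s in strategies], round(dist, 4))
```

Because the max over pure responses is non-differentiable where two actions tie, the finite-difference estimate only approximates a subgradient at such points; the toy game has a strictly dominant action, so no ties arise there. Enumerating every joint action also costs time exponential in $N$, so this sketch suits only small games.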
