Geometric Structure and Polynomial-time Algorithm of Game Equilibria

Hongbo Sun, Chongkun Xia, Junbo Tan, Bo Yuan, Xueqian Wang, Bin Liang

TL;DR

The paper reframes game equilibrium computation as an optimization problem with two subproblems, one over policy and one over value, and introduces unbiased KKT conditions and the equilibrium bundle to capture all perfect equilibria of dynamic games. A primal-dual unbiased interior-point method is recast as a line search on the equilibrium bundle and is combined with dynamic programming in the policy cone to obtain a convergent, polynomial-time scheme. This hybrid approach yields an FPTAS for weak ε-approximations of perfect equilibria, which implies PPAD=FP; the result is supported by existence and oddness theorems and by experiments on 2000 randomly generated dynamic games. The framework provides a scalable, model-agnostic pathway to robust multi-agent planning and learning, with the potential to mitigate non-stationarity and the curse of multiagency in MARL, and it links equilibrium computation to computational complexity theory.

Abstract

Whether a PTAS (polynomial-time approximation scheme) exists for game equilibria has been an open question, and its absence has indications and consequences in three fields: the practicality of methods in algorithmic game theory, non-stationarity and the curse of multiagency in MARL (multi-agent reinforcement learning), and the tractability of PPAD in computational complexity theory. In this paper, we formalize the game equilibrium problem as an optimization problem that splits into two subproblems with respect to policy and value function, which are solved respectively by an interior point method and dynamic programming. Combining these two parts, we obtain an FPTAS (fully PTAS) for the weak approximation (approximating to an $\epsilon$-equilibrium) of any perfect equilibrium of any dynamic game, implying PPAD=FP since the weak approximation problem is PPAD-complete. In addition, we introduce a geometric object called the equilibrium bundle, with respect to which, first, perfect equilibria of dynamic games are formalized as zero points of its canonical section; second, the hybrid iteration of dynamic programming and the interior point method is formalized as a line search on it; and third, it yields the existence and oddness theorems as an extension of those of Nash equilibria. In the experiments, the line search process is animated, and the method is tested on 2000 randomly generated dynamic games, where it converges to a perfect equilibrium in every single case.
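To make the notion of an $\epsilon$-equilibrium concrete, the sketch below checks it for a two-player matrix game, which can be viewed as a dynamic game with a single state. It is only an illustration of the definition, not the paper's algorithm: the function names (`expected_payoff`, `max_regret`, `is_epsilon_equilibrium`) and the matching-pennies payoffs are assumptions introduced here.

```python
def expected_payoff(payoff, x, y):
    """Expected payoff of a payoff table under mixed strategies x (rows) and y (columns)."""
    return sum(x[a] * y[b] * payoff[a][b]
               for a in range(len(x)) for b in range(len(y)))


def max_regret(A, B, x, y):
    """Largest gain either player can obtain by deviating to a pure action.

    A is the row player's payoff table, B the column player's (hypothetical inputs).
    """
    row_value = expected_payoff(A, x, y)
    col_value = expected_payoff(B, x, y)
    # Best pure-action payoff for each player against the opponent's mixed strategy.
    best_row = max(sum(y[b] * A[a][b] for b in range(len(y))) for a in range(len(x)))
    best_col = max(sum(x[a] * B[a][b] for a in range(len(x))) for b in range(len(y)))
    return max(best_row - row_value, best_col - col_value)


def is_epsilon_equilibrium(A, B, x, y, eps):
    """True when no player can gain more than eps by a unilateral deviation."""
    return max_regret(A, B, x, y) <= eps


# Matching pennies (illustrative data): the uniform profile is an exact equilibrium.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]
uniform = [0.5, 0.5]
skewed = [0.6, 0.4]
```

In this toy game the uniform profile has zero regret, while letting the row player use `skewed` gives the column player a profitable deviation of about 0.2, so that profile is an $\epsilon$-equilibrium only for $\epsilon$ of at least roughly 0.2.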

Paper Structure

This paper contains 20 sections, 16 theorems, 67 equations, 5 figures, 1 algorithm.

Key Result

Proposition 1

The following statements about the linear programming problems primal_lp and dual_lp hold, where $\bar{w}^s$ may be any vector such that $\bar{w}^s>0$.

Figures (5)

  • Figure 1: Graph of the unbiased barrier problem. This figure is plotted with a dynamic game where $N=\mathcal{A}=\{0,1\},\mathcal{S}=\{0\}$. The graph is based on the joint space of policy $\pi_a^i$ and regret $r_a^i$. The positive halves of the two axes represent the two action indices $a\in\{0,1\}$ of $\pi_a^i$, the negative halves represent the two action indices $a\in\{0,1\}$ of $r_a^i$, and the two subfigures represent the two player indices $i\in\{0,1\}$. With $\pi_a^i$ and $\hat{\pi}_a^i$ plotted in the all-positive orthant, $r_a^i$ and $\hat{r}_a^i$ in the all-negative orthant, and $\mu_a^i$ as hyperbolas between the positive and negative halves of the axes, $\hat{\pi}_a^i\circ r_a^i=\mu_a^i$ and $\pi_a^i\circ\hat{r}_a^i=\mu_a^i$ have rectangular shapes, and $(\pi_a^i-\hat{\pi}_a^i,r_a^i-\hat{r}_a^i)$ is the bias between the two rectangles. $dv^i$ is the direction in which $r_a^i$ can move with $\pi_a^i$ fixed and within the constraint $r_a^i=v^i-\pi_{Aa}^{i-}U_A^i$ of the unbiased barrier problem \ref{ubarr_equ}. With $r_a^i$ moving in direction $dv^i$ and the other two corners of the rectangle fixed on the hyperbolas, the figure illustrates why there is a unique $v^i$ for which $\hat{\pi}_a^i$ satisfies $\mathbf{1}_a\hat{\pi}_a^i=\mathbf{1}^i$, as stated in Theorem \ref{ipm_theo} (i); the right subfigure shows a case where it is satisfied.
  • Figure 2: Graph of the unbiased KKT conditions. This figure is plotted with a dynamic game where $N=\mathcal{S}=\mathcal{A}=\{0,1\}$. The graph is based on the policy space, where the two axes represent the two player indices $i\in\{0,1\}$ of $\pi_a^{si}$, the two subfigures represent the two state indices $s\in\{0,1\}$ of $\pi_a^{si}$, and only one of the two action indices, $a=0$, is needed to represent $\pi_a^{si}$ since $\pi_a^{si}$ sums to $1$ over action indices. $\hat{\pi}_a^i=M(\mu_a^i)(\pi_a^{i-})$ is a set of hypersurfaces indexed by $i\in N$ and induced by the Brouwer function $M(\mu_a^i)$ for a given $\mu_a^i$, $\pi_a^i-\hat{\pi}_a^i$ shows the mapping of $\hat{\pi}_a^i=M(\mu_a^i)(\pi_a^i)$, and the intersections of the hypersurfaces are the fixed points of $M(\mu_a^i)$. There is at least one intersection of the hypersurfaces according to Theorem \ref{ipm_theo} (iii), and there is almost always an odd number of intersections according to Theorem \ref{odd_thm} (ii), as extensions of the existence and oddness theorems of Nash equilibria. The differential $(d\pi_{a'}^j/\pi_{a'}^j)/(d\mu_{a''}^k/\mu_{a''}^k)$ illustrates that an intersection $\pi_a^i$ moves with the $i$-th hypersurface as $\mu_a^i$ varies on index $i$, since the $i$-th hypersurface depends on $\mu_a^i$ only through index $i$. The right subfigure shows a singular point in the sense of Definition \ref{equil_bund_def}, where the differential grows infinitely large.
  • Figure 3: Graph of the policy cone. This figure is plotted with a dynamic game where $N=\mathcal{S}=\mathcal{A}=\{0,1\}$. The graph is based on the value function space $\mathcal{V}$, where the two axes represent the two state indices $s\in\{0,1\}$ of $V_s^i$, and the two subfigures represent the two player indices $i\in\{0,1\}$ of $V_s^i$. As Proposition \ref{cone_prop} shows, $C_\pi$ is a set of cone-shaped regions bounded by hyperplanes and indexed by $i\in N$, with $V_{\pi s}^i$ as their apexes and with $\hat{C}_\pi$ contained in them. $\mathbf{1}_s$ is the monotone convergence direction, which induces unique pairs $(Y_{xs}^i,d_x^i)$ and $(\hat{Y}_{xs}^i,\hat{d}_x^i)$, and for which $V_s^i+m^i\mathbf{1}_s$ lies in $\hat{C}_\pi$ for any $V_s^i$ and sufficiently large $m^i$. Theorem \ref{cone_dp} states that the residuals satisfy $V_s^i-D_\pi(V_s^i)=(1-\gamma)d_s^i$ and $V_s^i-\hat{D}_\pi(V_s^i)=(1-\gamma)\hat{d}_s^i$. Theorem \ref{cone_equil} states that $\pi_a^{si}(x)$ is a Nash equilibrium if and only if the corresponding pair $Y_{xs}^i$ and $\hat{Y}_{xs}^i$ coincide, and that $\pi_a^{si}$ is a perfect equilibrium if and only if $V_{\pi s}^i$ lies in $\hat{C}_\pi$, in which case $Y_{xs}^i$, $\hat{Y}_{xs}^i$, and $V_{\pi s}^i$ all coincide. Equation \ref{cano_dpres} shows that the canonical section and the two dynamic programming operators are related by $\mathbf{1}_a\bar{\mu}_a^{si}(V_s^i,\pi_a^{si})=\hat{D}_\pi(V_s^i)-D_\pi(V_s^i)$.
  • Figure 4: Sketch graph of the equilibrium bundle. This sketch graph is based on the joint space $\mathcal{P}\times\{\mu_a^{si}|\mu_a^{si}\geq 0\}$ of policy and barrier parameter. First, the equilibrium bundle $E$ consists of the disjoint union of fibers $\{\pi_a^{si}\}\times B(\pi_a^{si})$ over each $\pi_a^{si}\in\mathcal{P}$, where each fiber $B(\pi_a^{si})$ is an affine subspace with the canonical section $\bar{\mu}_a^{si}(\pi_a^{si})$ as its least element, and perfect equilibria are zero points of the map $\bar{\mu}_a^{si}$. Second, the Brouwer function $\hat{\pi}_a^{si}=M(\mu_a^{si})(\pi_a^{si})$ rearranges the points in the offset policy space $\mathcal{P}\times\{\mu_a^{si}\}$, where the fixed points are the intersections between $\mathcal{P}\times\{\mu_a^{si}\}$ and the equilibrium bundle $E$, and the unbiased barrier problem depicts the approximation to those fixed points. Third, the polynomial function from $\pi_a^{si}$ to $\mu_a^{si}$ yields an algebraic curve ${\rm AC}$ that satisfies a parity argument: exactly one of its endpoints is connected with the starting point as $\mu'\to \infty$, and the rest of the endpoints are connected in pairs. In addition, the singular points of the equilibrium bundle are the multiple roots of the polynomial function. Finally, our method is a line search on the equilibrium bundle, which hops across the fibers to a zero point of the canonical section and moves along a fiber to avoid singular points where the differential $d\pi_a^{si}/d\mu_{a'}^{sj}$ tends to infinity.
  • Figure 5: Iteration curve. The figure shows the convergence of the three iterations in Proposition \ref{conv_rate}. That ${\rm Angle}(V_s^i-D_\pi(V_s^i),\mathbf{1}_s^i)$, $(\pi_a^{si}-\hat{\pi}_a^{si},r_a^{si}-\hat{r}_a^{si})$, and $\bar{\mu}_a^{si}(\pi_a^{si})$ all converge to $0$ indicates that the convergence point is a perfect equilibrium. In particular, ${\rm Angle}(V_s^i-D_\pi(V_s^i),\mathbf{1}_s^i)$ and $(\pi_a^{si}-\hat{\pi}_a^{si},r_a^{si}-\hat{r}_a^{si})$ staying converged throughout the iteration indicates that the iteration is a line search on the equilibrium bundle, and $\mu_a^{si}$ not decreasing in the middle of the iteration is due to singular-point avoidance.
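The uniqueness of $v^i$ described in the caption of Figure 1 can be illustrated numerically for one player whose action payoffs are held fixed. The sketch below is a simplified illustration under stated assumptions rather than the paper's method: `u` stands in for the payoff vector $\pi_{Aa}^{i-}U_A^i$, `mu` for the barrier parameter $\mu_a^i$, regrets are taken as positive numbers $r_a=v-u_a$, and bisection is a solver chosen here for simplicity. Because $\sum_a \mu_a/(v-u_a)$ is strictly decreasing in $v$ on $v>\max_a u_a$, exactly one $v$ makes $\hat{\pi}$ sum to one.

```python
def solve_barrier_value(u, mu, max_iter=200):
    """Find the unique v > max(u) with sum_a mu[a] / (v - u[a]) == 1.

    u  : fixed action payoffs of one player (hypothetical stand-in data)
    mu : strictly positive barrier parameters, one per action
    Returns (v, pi_hat, r) with r[a] = v - u[a] and pi_hat[a] = mu[a] / r[a].
    """
    def total(v):
        return sum(m / (v - ua) for m, ua in zip(mu, u))

    lo = max(u)             # total(v) grows without bound as v decreases to lo
    hi = max(u) + sum(mu)   # every v - u[a] >= sum(mu) here, so total(hi) <= 1
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mid == lo or mid == hi:   # interval can no longer be split in floats
            break
        if total(mid) > 1.0:
            lo = mid                 # pi_hat sums to more than one: raise v
        else:
            hi = mid                 # pi_hat sums to at most one: lower v
    v = hi
    r = [v - ua for ua in u]
    pi_hat = [m / ra for m, ra in zip(mu, r)]
    return v, pi_hat, r


# Two actions with payoffs 1 and 0, and equal barrier parameters (illustrative data).
v, pi_hat, r = solve_barrier_value([1.0, 0.0], [0.1, 0.1])
```

In this toy instance the solution satisfies $v^2-1.2v+0.1=0$ with $v>1$, and shrinking `mu` toward zero pushes `pi_hat` toward the best response (here the first action).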

Theorems & Definitions (45)

  • Definition 1: Dynamic game
  • Definition 2: Perfect equilibrium
  • Definition 3: Unbiased KKT conditions of dynamic games
  • Definition 4: Equilibrium bundle of dynamic games
  • Definition 5: Regret minimization problem of dynamic games
  • Proposition 1
  • Definition 6: Regret minimization problem
  • Theorem 2
  • Definition 7: Unbiased barrier problem
  • Definition 8: Unbiased KKT conditions
  • ...and 35 more