A Policy Iteration Algorithm for N-player General-Sum Linear Quadratic Dynamic Games

Yuxiang Guan, Giulio Salizzoni, Maryam Kamgarpour, Tyler H. Summers

TL;DR

Numerical experiments show that the proposed policy iteration algorithm converges significantly faster than the Gauss-Newton policy gradient method and other policy gradient variations.

Abstract

We present a policy iteration algorithm for infinite-horizon N-player general-sum deterministic linear quadratic dynamic games and compare it to policy gradient methods. We demonstrate that the proposed policy iteration algorithm is distinct from the Gauss-Newton policy gradient method in the N-player game setting, in contrast to the single-player setting, where the two are equivalent under a suitable choice of step size. We illustrate in numerical experiments that the proposed policy iteration algorithm converges significantly faster than the Gauss-Newton policy gradient method and other policy gradient variations. Furthermore, our numerical results indicate that, compared to policy gradient methods, the convergence of the proposed policy iteration algorithm is less sensitive to the initial policy and to changes in the number of players.
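
To make the idea of policy iteration in an N-player linear quadratic game concrete, here is a minimal sketch for the scalar case. It is not taken from the paper and may differ from the authors' algorithm. It assumes dynamics $x_{t+1} = a x_t + \sum_j b^j u^j_t$, stage costs $q^i x_t^2 + r^i (u^i_t)^2$, and linear policies $u^i_t = k^i x_t$. Each iteration evaluates every player's value coefficient $p^i$ under the current joint policy, then updates every gain simultaneously against the other players' current gains. The function name, parameters, and numerical values are hypothetical.

```python
def policy_iteration_scalar_game(a, b, q, r, k0, tol=1e-12, max_iter=1000):
    """Illustrative policy iteration for a scalar N-player LQ game.

    Assumed model (not from the paper): x+ = a*x + sum_j b[j]*u_j,
    player i pays q[i]*x^2 + r[i]*u_i^2 per step and plays u_i = k[i]*x.
    Returns (gains, value coefficients, iterations used).
    """
    n = len(b)
    k = list(k0)
    p = [0.0] * n
    for it in range(max_iter):
        # Closed-loop dynamics under the current joint policy.
        a_cl = a + sum(b[j] * k[j] for j in range(n))
        if abs(a_cl) >= 1.0:
            raise ValueError("current joint policy is not stabilizing")

        # Policy evaluation: p_i solves p_i = q_i + r_i*k_i^2 + a_cl^2 * p_i.
        p = [(q[i] + r[i] * k[i] ** 2) / (1.0 - a_cl ** 2) for i in range(n)]

        # Policy improvement: each player minimizes its one-step cost plus
        # p_i * x+^2, holding the other players' current gains fixed.
        k_new = []
        for i in range(n):
            a_others = a_cl - b[i] * k[i]  # a + sum over j != i of b_j*k_j
            k_new.append(-b[i] * p[i] * a_others / (r[i] + b[i] ** 2 * p[i]))

        if max(abs(k_new[i] - k[i]) for i in range(n)) < tol:
            return k_new, p, it + 1
        k = k_new
    return k, p, max_iter


# Hypothetical two-player example, started from the zero (stabilizing) policy.
gains, values, iters = policy_iteration_scalar_game(
    a=0.9, b=[0.5, 0.5], q=[1.0, 1.0], r=[1.0, 1.0], k0=[0.0, 0.0]
)
```

Simultaneous updates of this kind are not guaranteed to converge in general. The sketch raises an error if an iterate destabilizes the closed loop, and it returns after `max_iter` iterations if the gains have not settled.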

Paper Structure (13 sections, 1 theorem, 21 equations, 2 figures, 1 table, 1 algorithm)

Key Result

Proposition 1

Suppose the above value iteration (eqn:policy_gain) and (eqn:value_fun) converges to $\{ K^{i*}, P^{i*}, \ i \in N \}$, which satisfy (eqn:policy_gain_ss) and (eqn:value_fun_ss), and further suppose that for each $i \in N$ the pair $(A + \sum_{j=1, j \neq i}^N B^j K^{j*}, B^i)$ is stabilizable and the pair …

Figures (2)

  • Figure 1: Convergence speed of the proposed policy iteration algorithm (green), Gauss-Newton policy gradient (purple, $\eta^i=0.5$), and natural policy gradient (blue, $\eta^i=10^{-3}$; red, $\eta^i=10^{-2}$; yellow, $\eta^i=10^{-1}$) ($r=0.1$).
  • Figure 2: Convergence performance of the proposed policy iteration algorithm (green), Gauss-Newton policy gradient (purple, $\eta^i=0.5$), and natural policy gradient (yellow, $\eta^i=10^{-1}$) methods under different initial policy gains.

Theorems & Definitions (2)

  • Definition 1
  • Proposition 1: Proposition 6.3 from basar1998