A Policy Iteration Algorithm for N-player General-Sum Linear Quadratic Dynamic Games

Yuxiang Guan; Giulio Salizzoni; Maryam Kamgarpour; Tyler H. Summers

TL;DR

Numerical experiments show that the proposed policy iteration algorithm converges significantly faster than the Gauss-Newton policy gradient method and other policy gradient variants.

Abstract

We present a policy iteration algorithm for infinite-horizon N-player general-sum deterministic linear quadratic dynamic games and compare it to policy gradient methods. We demonstrate that the proposed policy iteration algorithm is distinct from the Gauss-Newton policy gradient method in the N-player game setting, in contrast to the single-player setting, where the two are equivalent under a suitable choice of step size. Numerical experiments show that the proposed policy iteration algorithm converges significantly faster than the Gauss-Newton policy gradient method and other policy gradient variants. Furthermore, our numerical results indicate that, compared to policy gradient methods, the convergence of the proposed policy iteration algorithm is less sensitive to the initial policy and to changes in the number of players.

Paper Structure (13 sections, 1 theorem, 21 equations, 2 figures, 1 table, 1 algorithm)

Introduction
Problem Formulation: N-player General-Sum Deterministic LQDGs with Infinite-Horizon
Policy Iteration for N-player General-Sum Deterministic LQDGs with Infinite-Horizon
Value Iteration to Compute Nash Equilibrium Policies
Proposed Policy Iteration Algorithm
A Comparison of the Policy Iteration and Gauss-Newton Policy Gradient Algorithms
The Vanilla/Standard Policy Gradient and Natural Policy Gradient Methods
The Gauss-Newton Policy Gradient Method
Numerical Experiments
Faster Convergence Rate of Policy Iteration
Convergence Performance of Policy Iteration from a Distant Initial Policy
Convergence Performance of Policy Iteration for Additional Problem Instances
Conclusions

Key Result

Proposition 1

Suppose the value iteration above (eqn:policy_gain and eqn:value_fun) converges to $\{ K^{i*}, P^{i*},\ i \in N \}$, which satisfy eqn:policy_gain_ss and eqn:value_fun_ss, and further suppose that for each $i \in N$ the pair $(A + \sum_{j=1, j \neq i}^N B^j K^{j*},\ B^i)$ is stabilizable and the pair $
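
To make the setting concrete, the following is a minimal sketch of a policy-iteration-style loop for a discrete-time N-player LQ game. It is not taken from the paper and may differ from the paper's Algorithm 1. It assumes linear feedback $u^i = K^i x$ (matching the closed-loop matrix $A + \sum_j B^j K^j$ in Proposition 1), stage costs $x^\top Q^i x + (u^i)^\top R^i u^i$ without cross terms, a Lyapunov-equation policy evaluation, and a simultaneous best-response improvement step. The function names and the two-player scalar example are hypothetical.

```python
import numpy as np


def evaluate_policy(A_cl, Q_bar):
    """Solve P = Q_bar + A_cl' P A_cl by vectorization.

    Assumes A_cl is Schur stable, so the linear system is nonsingular.
    """
    n = A_cl.shape[0]
    M = np.eye(n * n) - np.kron(A_cl.T, A_cl.T)
    return np.linalg.solve(M, Q_bar.reshape(-1)).reshape(n, n)


def policy_iteration(A, Bs, Qs, Rs, Ks, iters=200, tol=1e-12):
    """Alternate policy evaluation and improvement (illustrative variant).

    Ks must be an initial set of gains that stabilizes the closed loop.
    """
    Ps = []
    for _ in range(iters):
        # Closed-loop matrix under the current gains of all players.
        A_cl = A + sum(B @ K for B, K in zip(Bs, Ks))
        # Policy evaluation: one Lyapunov equation per player.
        Ps = [evaluate_policy(A_cl, Q + K.T @ R @ K)
              for Q, R, K in zip(Qs, Rs, Ks)]
        # Policy improvement: each player best-responds to the others'
        # current gains (assumed update rule, applied simultaneously).
        new_Ks = []
        for i, (B, R, P) in enumerate(zip(Bs, Rs, Ps)):
            A_others = A_cl - B @ Ks[i]
            new_Ks.append(-np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A_others))
        step = max(np.max(np.abs(Kn - K)) for Kn, K in zip(new_Ks, Ks))
        Ks = new_Ks
        if step < tol:
            break
    return Ks, Ps


# Hypothetical two-player scalar instance; A is stable, so zero gains
# are a valid stabilizing initial policy.
A = np.array([[0.9]])
Bs = [np.array([[1.0]]), np.array([[0.5]])]
Qs = [np.eye(1), np.eye(1)]
Rs = [np.eye(1), np.eye(1)]
Ks0 = [np.zeros((1, 1)), np.zeros((1, 1))]

Ks, Ps = policy_iteration(A, Bs, Qs, Rs, Ks0)
A_cl_final = A + sum(B @ K for B, K in zip(Bs, Ks))
print("closed-loop spectral radius:", np.max(np.abs(np.linalg.eigvals(A_cl_final))))
```

The sketch gives no convergence guarantee in general. On this small instance the iterates settle to gains at which each player's gain is a best response to the others, which is the fixed-point condition one would expect of a feedback Nash equilibrium under the assumptions above.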

Figures (2)

Figure 1: Convergence speed of the proposed policy iteration algorithm (green), Gauss-Newton policy gradient (purple, $\eta^i=0.5$), and natural policy gradient (blue, $\eta^i=10^{-3}$; red, $\eta^i=10^{-2}$; yellow, $\eta^i=10^{-1}$) ($r=0.1$).
Figure 2: Convergence performance of the proposed policy iteration algorithm (green), Gauss-Newton policy gradient (purple, $\eta^i=0.5$), and natural policy gradient (yellow, $\eta^i=10^{-1}$) methods under different initial policy gains.

Theorems & Definitions (2)

Definition 1
Proposition 1: Proposition 6.3 from basar1998