To What Extent do Open-loop and Feedback Nash Equilibria Diverge in General-Sum Linear Quadratic Dynamic Games?

Chih-Yuan Chiu, Jingqi Li, Maulik Bhatt, Negar Mehr

TL;DR

It is proved that the OLNE strategies of an LQ dynamic game can be synthesized by solving the coupled Riccati equations of an auxiliary LQ game with perturbed costs, and an upper bound on the deviation between FBNE and OLNE of an LQ game is derived.

Abstract

Dynamic games offer a versatile framework for modeling the evolving interactions of strategic agents, whose steady-state behavior can be captured by the Nash equilibria of the games. Nash equilibria are often computed in feedback, with policies depending on the state at each time, or in open-loop, with policies depending only on the initial state. Empirically, open-loop Nash equilibria (OLNE) can be more efficient to compute, while feedback Nash equilibria (FBNE) often encode more complex interactions. However, it remains unclear exactly which dynamic games yield FBNE and OLNE that differ significantly and which do not. To address this problem, we present a principled comparison study of OLNE and FBNE in linear quadratic (LQ) dynamic games. Specifically, we prove that the OLNE strategies of an LQ dynamic game can be synthesized by solving the coupled Riccati equations of an auxiliary LQ game with perturbed costs. The construction of the auxiliary game allows us to establish conditions under which OLNE and FBNE coincide and to derive an upper bound on the deviation between the FBNE and OLNE of an LQ game.
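
To make the feedback notion in the abstract concrete, here is a minimal sketch of a backward coupled Riccati recursion for a scalar two-player LQ game. It is not the paper's construction. The scalar model with a shared dynamics coefficient `a`, the terminal cost weight, and the function names `scalar_fbne` and `rollout` are all assumptions introduced for illustration. The point is only that each feedback gain multiplies the current state, and that the two players' gains must be solved for jointly at every stage.

```python
def scalar_fbne(a, b, q, r, T):
    """Feedback Nash gains for a scalar two-player LQ game (illustrative sketch).

    Assumed model (hypothetical, not the paper's notation):
        x_{t+1} = a*x_t + b[0]*u1_t + b[1]*u2_t
        J_i     = sum_{t=0}^{T-1} (q[i]*x_t**2 + r[i]*ui_t**2) + q[i]*x_T**2
    with r[i] > 0 and q[i] >= 0.

    Returns (p, z): p[t][i] is the gain in ui_t = -p[t][i]*x_t, and z[t][i] is
    the weight of player i's quadratic value function z[t][i]*x_t**2.
    """
    z = [[0.0, 0.0] for _ in range(T + 1)]
    p = [[0.0, 0.0] for _ in range(T)]
    z[T] = [float(q[0]), float(q[1])]  # assumed terminal cost weight
    for t in reversed(range(T)):
        z1, z2 = z[t + 1]
        # Both players' first-order conditions form a 2x2 linear system in (p1, p2).
        m11 = r[0] + b[0] ** 2 * z1
        m12 = b[0] * z1 * b[1]
        m21 = b[1] * z2 * b[0]
        m22 = r[1] + b[1] ** 2 * z2
        rhs1 = b[0] * z1 * a
        rhs2 = b[1] * z2 * a
        det = m11 * m22 - m12 * m21  # positive whenever r > 0 and z >= 0
        p1 = (rhs1 * m22 - m12 * rhs2) / det  # Cramer's rule
        p2 = (m11 * rhs2 - m21 * rhs1) / det
        f = a - b[0] * p1 - b[1] * p2  # closed-loop dynamics coefficient
        p[t] = [p1, p2]
        # Coupled Riccati update: each value weight depends on both gains via f.
        z[t] = [q[0] + r[0] * p1 ** 2 + z1 * f ** 2,
                q[1] + r[1] * p2 ** 2 + z2 * f ** 2]
    return p, z


def rollout(a, b, p, x0):
    """Simulate the closed-loop state trajectory under the feedback gains p."""
    xs = [x0]
    for p1, p2 in p:
        xs.append((a - b[0] * p1 - b[1] * p2) * xs[-1])
    return xs
```

For instance, `p, z = scalar_fbne(1.2, (1.0, 0.5), (1.0, 2.0), (1.0, 1.0), 5)` followed by `rollout(1.2, (1.0, 0.5), p, 1.0)` gives one feedback trajectory. An open-loop strategy would instead be a fixed control sequence determined by the initial state alone, which this sketch does not compute.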

Paper Structure

This paper contains 14 sections, 8 theorems, 26 equations, 7 figures.

Key Result

Proposition 1

If a dynamic LQ game $\mathcal{G} = (A^i, B^i, Q^i, R^i : i \in [N])$ satisfies the assumption labeled "Assumption: LQ Games", then it admits unique FBNE strategies and a unique FBNE trajectory, given as follows for all $i \in [N]$, $t \in [T]$:

Figures (7)

  • Figure 1: The actual values of $\delta \hat{K}_t$ and the theoretical upper bound of $\delta \hat{K}_t$ as given by \ref{Eqn: delta Z T+1 i}-\ref{Eqn: delta Zti} (red, with circle marks), corresponding to the LQ games $\mathcal{G}$ and $\hat{\mathcal{G}}$ described in Example \ref{Ex: Tightness of Upper Bound}. The theoretical upper bound for $\delta \hat{K}_t$ closely matches the actual value of $\delta \hat{K}_t$ throughout the first few iterates of the backward iteration process, i.e., when $t$ is close to $T = 10$. Significant divergence between the theoretical upper bound and the actual value of $\delta \hat{K}_t$ occurs only near the start of the time horizon.
  • Figure 2: Scatter plot of $\delta \tilde{K}_t := \Vert \tilde{K}_t - K_t \Vert_2$ vs. $\delta \tilde{Q} := \Vert \tilde{Q} - Q \Vert_2$, at each $t \in [4]$, for each of the 10000 sampled LQ dynamic games. Each sampled game admits a unique FBNE and a unique OLNE. These samples violate, to different degrees, the sufficient condition $\delta \tilde{K}_t = 0$ for the FBNE and OLNE trajectories to be aligned (Thm. \ref{Thm: OLNE vs FBNE of G}). Specifically, the game $\mathcal{G}_1$ barely violates the condition, while the game $\mathcal{G}_2$ violates it much more severely. Moreover, many sampled games correspond to large values of $\delta \tilde{Q}$ but tiny values of $\delta \tilde{K}_t$, which shows the difficulty of obtaining nontrivial lower bounds for $\delta \tilde{K}_t$ in terms of $\delta \tilde{Q}$.
  • Figure 3: Plots, for $\mathcal{G}_1$ and $\mathcal{G}_2$, of the FBNE trajectories, the OLNE trajectories, and the differences between them. We observe that the FBNE and OLNE trajectories of $\mathcal{G}_1$, which only slightly violates the sufficient condition in Thm. \ref{Thm: OLNE vs FBNE of G} (i.e., $\delta \tilde{K}_t = 0$), are within 0.25% of each other. In contrast, $\mathcal{G}_2$ violates the sufficient condition more severely, and its FBNE and OLNE trajectories differ by up to 300%.
  • Figure 4: Scatter plot of $\delta \tilde{K}_t := \Vert \tilde{K}_t - K_t \Vert_2$ vs. $\delta \tilde{Q} := \Vert \tilde{Q} - Q \Vert_2$, at each $t \in [4]$, for each of the 10000 sampled LQ dynamic games. The dynamics matrices $A^1, A^2, B^1, B^2$ and state cost matrices $Q^1, Q^2$ of each sampled game are independently sampled, while the control cost matrices $R^1$ and $R^2$ are fixed across sampled games.
  • Figure 5: Scatter plot of $\delta \tilde{K}_t := \Vert \tilde{K}_t - K_t \Vert_2$ vs. $\delta \tilde{Q} := \Vert \tilde{Q} - Q \Vert_2$, at each $t \in [4]$, for each of the 2000 sampled LQ dynamic games. Each sample has randomly sampled values of $A^1, A^2, Q^1$, and $Q^2$. Of these samples, 1000 have high values of $\max_i \Vert A^i - \text{avg}(A) \Vert$ while the rest have low values, where $\text{avg}(A) := \frac{1}{2}(A^1 + A^2)$.
  • ...and 2 more figures

Theorems & Definitions (19)

  • Remark 1
  • Remark 2
  • Definition 1: Feedback Nash Equilibrium (FBNE)
  • Proposition 1: Başar and Olsder (1998), Dynamic Noncooperative Game Theory, Corollary 6.1
  • Definition 2: Open-Loop Nash Equilibrium (OLNE)
  • Proposition 2: Başar and Olsder (1998), Dynamic Noncooperative Game Theory, Thm. 6.2
  • Definition 3: Auxiliary LQ Game
  • Remark 3
  • Lemma 1
  • proof
  • ...and 9 more