
Forward-Backward Dynamic Programming for LQG Dynamic Games with Partial and Asymmetric Information

Yuxiang Guan, Iman Shames, Tyler Summers

Abstract

We formulate and study a class of two-player zero-sum stochastic dynamic games with partial and asymmetric information. Information asymmetry introduces fundamental challenges involving belief representation and theory of mind issues, where agents must impute belief states and estimates of other agents to inform their own strategy. To avoid an infinite regress of higher-order beliefs amongst agents and obtain computationally implementable results, we focus on a linear quadratic Gaussian (LQG) model and consider strategies with limited internal state dimension. We present a novel iterative forward-backward algorithm to jointly compute belief states, equilibrium strategies, and value functions for a finite-horizon problem. We also present a value iteration-like algorithm to jointly compute stationary belief states and equilibrium strategies for an average-cost infinite-horizon problem. An open-source implementation of the algorithms is provided, and we demonstrate the effectiveness of the proposed algorithms in numerical experiments.

Paper Structure

This paper contains 15 sections, 4 theorems, 55 equations, 6 figures, 1 table, 2 algorithms.

Key Result

proposition 1

Under the common-knowledge assumption (asmp:common_knowledge) and the belief-projection assumption (asmp:belief_projection), the following forward propagation of state estimates for each player optimizes the estimation error (estimationerror). The a priori and a posteriori updates can be combined to form a single update of the one-step filter form (onestepfilters), where
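
The idea of folding an a priori and an a posteriori update into a single recursion can be illustrated with the textbook scalar Kalman filter. The sketch below is not the paper's asymmetric-information filter: it assumes a single estimator and a hypothetical scalar model x_{t+1} = a x_t + b u_t + w_t, y_t = c x_t + v_t with noise variances q and r, and the function names are invented for illustration. It only shows that the two-step form and the combined one-step (predictor) form produce the same estimates.

```python
def two_step(x_pred, p_pred, y, u, a, b, c, q, r):
    """A posteriori (measurement) update followed by an a priori (time) update."""
    k = p_pred * c / (c * p_pred * c + r)      # Kalman gain
    x_post = x_pred + k * (y - c * x_pred)     # a posteriori estimate
    p_post = (1.0 - k * c) * p_pred            # a posteriori variance
    x_next = a * x_post + b * u                # next a priori estimate
    p_next = a * p_post * a + q                # next a priori variance
    return x_next, p_next


def one_step(x_pred, p_pred, y, u, a, b, c, q, r):
    """Both updates folded into a single predictor-form update."""
    s = c * p_pred * c + r                     # innovation variance
    gain = a * p_pred * c / s                  # predictor gain, equal to a * k
    x_next = a * x_pred + b * u + gain * (y - c * x_pred)
    p_next = a * p_pred * a + q - gain * s * gain
    return x_next, p_next


def run(step, ys, us, x0=0.0, p0=1.0, a=0.9, b=0.5, c=1.0, q=0.1, r=0.2):
    """Apply a filter step along a sequence of observations and inputs."""
    x, p = x0, p0
    history = []
    for y, u in zip(ys, us):
        x, p = step(x, p, y, u, a, b, c, q, r)
        history.append((x, p))
    return history


# Hypothetical observation and input sequences, chosen only for illustration.
observations = [1.0, 0.7, 1.3, 0.2]
inputs = [0.0, 0.1, -0.2, 0.0]
two_step_history = run(two_step, observations, inputs)
one_step_history = run(one_step, observations, inputs)
```

Comparing `two_step_history` with `one_step_history` entry by entry shows agreement up to floating-point rounding, which is the sense in which the two updates "combine" in the standard setting.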

Figures (6)

  • Figure 1: Pursuer and evader mean trajectories (solid lines), along with their respective mean state estimates (dashed lines) from initial state $(x_0, z^1_0, z^2_0) = [(-10, 0, 0, 0), (-10, 0, 0, -10), (-10, 0, 0, 0)]$. The squares and triangles represent the initial positions and initial position estimates of each player, respectively. The circles represent the final positions. Total Cost: $2.872\times10^{-3}$.
  • Figure 2: Pursuer and evader sample trajectories (solid lines), along with their respective sample estimates (dashed lines) from initial state $(x_0, z^1_0, z^2_0) = [(-10, 0, 0, 0), (-10, 0, 0, -10), (-10, 0, 0, 0)]$. The squares and triangles represent the initial positions and initial position estimates of each player, respectively. The circles represent the final positions.
  • Figure 3: Comparison of equilibrium strategies under limited vs. unlimited belief orders, starting from initial state $(x_0, z^1_0, z^2_0) = [(-10, 0, -0.5, 0.2), (-10, 0, -0.5, 0.2), (-10, 0, -0.5, -4.8)]$. DP: strategies computed using forward-backward dynamic programming with limited belief order. BR: strategies computed using best response dynamics with unlimited belief order.
  • Figure 4: Less maneuverable pursuer ($B^1_{24} = 0.7 \Delta t$). Pursuer and evader mean trajectories, along with their respective mean sample estimates from a prescribed initial state. Total Cost: $3.833\times10^{-3}$.
  • Figure 5: Pursuer has noisier observations ($V^1 = \mathrm{diag}(1, 50)$). Pursuer and evader mean trajectories, along with their respective mean sample estimates from a prescribed initial state. Total Cost: $7.572\times10^{-3}$.
  • ...and 1 more figure

Theorems & Definitions (13)

  • remark 1
  • remark 2
  • remark 3
  • remark 4
  • remark 5
  • proposition 1
  • proof
  • lemma 1: Principle of Subgame Equilibrium in Dynamic Games
  • proof
  • lemma 2: Dynamic Programming for Equilibrium Strategies
  • ...and 3 more