Forward-Backward Dynamic Programming for LQG Dynamic Games with Partial and Asymmetric Information

Yuxiang Guan; Iman Shames; Tyler Summers

Abstract

We formulate and study a class of two-player zero-sum stochastic dynamic games with partial and asymmetric information. Information asymmetry introduces fundamental challenges involving \emph{belief representation} and \emph{theory of mind}, where agents must impute the belief states and estimates of other agents to inform their own strategy. To avoid an infinite regress of higher-order beliefs amongst agents and obtain computationally implementable results, we focus on a linear quadratic Gaussian (LQG) model and consider strategies with limited internal state dimension. We present a novel iterative forward-backward algorithm to jointly compute belief states, equilibrium strategies, and value functions for a finite-horizon problem. We also present a value iteration-like algorithm to jointly compute stationary belief states and equilibrium strategies for an average-cost infinite-horizon problem. An open-source implementation of the algorithms is provided, and we demonstrate the effectiveness of the proposed algorithms in numerical experiments.

Paper Structure (15 sections, 4 theorems, 55 equations, 6 figures, 1 table, 2 algorithms)


Introduction
Finite Horizon LQG Dynamic Games with Partial and Asymmetric Information
Problem Formulation
Forward State Estimation with Partial and Asymmetric Information
Backward Belief-State Feedback Strategies with Partial and Asymmetric Information
A Forward-Backward Algorithm for Computing Equilibrium Solutions
Infinite Horizon LQG Dynamic Games with Partial and Asymmetric Information
Forward Recursion Convergence
Backward Recursion Convergence
Joint Convergence and Forward-Backward Value Iteration
Numerical Experiments
Comparing strategies with limited vs. unlimited belief order
Exploring asymmetries in controllability and observability
Extensions, Variations, and Open Problems
Conclusions

Key Result

proposition 1

Under the common-knowledge assumption (asmp:common_knowledge) and the belief-projection assumption (asmp:belief_projection), the following forward propagation of state estimates for each player optimizes the estimation error (estimationerror): the a priori and a posteriori updates can be combined to form a single update of the form (onestepfilters), where
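
For intuition on how a priori and a posteriori updates collapse into a single update, the sketch below uses the standard single-agent, scalar Kalman filter. This is a swapped-in textbook illustration, not the paper's asymmetric-information filter: the model `x' = a*x + b*u + noise`, `y = c*x + noise`, the variances `w` and `v`, and all function names are assumptions introduced here for illustration only.

```python
# Scalar Kalman filter written two ways (illustrative assumption, not the
# paper's filter): a correct-then-predict pair, and one combined update.

def correct(x_prior, p_prior, y, c, v):
    """A posteriori update: fold measurement y into the prior estimate."""
    s = c * p_prior * c + v            # innovation variance
    k = p_prior * c / s                # filter gain
    x_post = x_prior + k * (y - c * x_prior)
    p_post = (1.0 - k * c) * p_prior
    return x_post, p_post

def predict(x_post, p_post, u, a, b, w):
    """A priori update: push the estimate through the dynamics."""
    return a * x_post + b * u, a * p_post * a + w

def one_step(x_prior, p_prior, u, y, a, b, c, w, v):
    """Combined update mapping one a priori estimate directly to the next."""
    s = c * p_prior * c + v
    gain = a * p_prior * c / s         # predictor-form gain
    x_next = a * x_prior + b * u + gain * (y - c * x_prior)
    p_next = a * p_prior * a + w - gain * s * gain
    return x_next, p_next

def compare_filters(a=0.9, b=0.5, c=1.0, w=0.1, v=0.4):
    """Run both forms on the same made-up data; return the largest gap."""
    inputs = [1.0, -0.5, 0.25, 0.0]        # hypothetical control inputs
    measurements = [0.3, 1.1, 0.7, 0.2]    # hypothetical measurements
    x2, p2 = 0.0, 1.0                      # two-step filter state
    x1, p1 = 0.0, 1.0                      # one-step filter state
    worst = 0.0
    for u, y in zip(inputs, measurements):
        x_post, p_post = correct(x2, p2, y, c, v)
        x2, p2 = predict(x_post, p_post, u, a, b, w)
        x1, p1 = one_step(x1, p1, u, y, a, b, c, w, v)
        worst = max(worst, abs(x2 - x1), abs(p2 - p1))
    return worst

if __name__ == "__main__":
    print("largest discrepancy:", compare_filters())
```

The two forms agree up to floating-point rounding, which is the sense in which the pair of updates can be replaced by a single recursion in this simplified setting.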

Figures (6)

Figure 1: Pursuer and evader mean trajectories (solid lines), along with their respective mean state estimates (dashed lines) from initial state $(x_0, z^1_0, z^2_0) = [(-10, 0, 0, 0), (-10, 0, 0, -10), (-10, 0, 0, 0)]$. The squares and triangles represent the initial positions and initial position estimates of each player, respectively. The circles represent the final positions. Total Cost: $2.872\times10^{-3}$.
Figure 2: Pursuer and evader sample trajectories (solid lines), along with their respective sample estimates (dashed lines) from initial state $(x_0, z^1_0, z^2_0) = [(-10, 0, 0, 0), (-10, 0, 0, -10), (-10, 0, 0, 0)]$. The squares and triangles represent the initial positions and initial position estimates of each player, respectively. The circles represent the final positions.
Figure 3: Comparison of equilibrium strategies under limited vs. unlimited belief orders, starting from initial state $(x_0, z^1_0, z^2_0) = [(-10, 0, -0.5, 0.2), (-10, 0, -0.5, 0.2), (-10, 0, -0.5, -4.8)]$. DP: strategies computed using forward-backward dynamic programming with limited belief order. BR: strategies computed using best response dynamics with unlimited belief order.
Figure 4: Less maneuverable pursuer ($B^1_{24} = 0.7 \Delta t$). Pursuer and evader mean trajectories, along with their respective mean sample estimates from a prescribed initial state. Total Cost: $3.833\times10^{-3}$.
Figure 5: Pursuer has noisier observations ($V^1 = \mathrm{diag}(1, 50)$). Pursuer and evader mean trajectories, along with their respective mean sample estimates from a prescribed initial state. Total Cost: $7.572\times10^{-3}$.
...and 1 more figure

Theorems & Definitions (13)

remark 1
remark 2
remark 3
remark 4
remark 5
proposition 1
proof
lemma 1: Principle of Subgame Equilibrium in Dynamic Games
proof
lemma 2: Dynamic Programming for Equilibrium Strategies
...and 3 more
