Optimal Mixed Strategy for Zero-Sum Differential Games
Tao Xu, Wang Xi, Jianping He
TL;DR
The paper tackles solving zero-sum differential games (ZSDGs) under mixed strategies without requiring vanishing commitment delays. It introduces a weak-approximation framework based on a stochastic differential game (SDG): the mixed-strategy game is mapped to a pure-strategy SDG whose state distributions and expected costs closely agree with those of the original game, which yields certified bounds on the mixed-strategy value and on strategy suboptimality. The authors prove the existence of the game value under the proposed mixed-strategy definition, establish order-$n$ weak approximations, and present a five-step procedure for obtaining near-optimal mixed strategies with explicit error bounds. They validate the approach on a class of ZSDGs with control-affine dynamics and quadratic costs, showing $O(\bar{\pi})$ scaling in the maximum commitment delay $\bar{\pi}$ for both the value approximation error and the strategy suboptimality gap, and provide numerical simulations that confirm distributional closeness and practical improvements from mixed strategies.
Abstract
Solving zero-sum differential games (ZSDGs) under mixed strategies has been challenging for decades. Existing research mainly focuses on characterizing the value function, while the problem of computing optimal mixed strategies remains open. To address this issue, we propose a novel weak-approximation-based method for solving ZSDGs under mixed strategies. The key idea is to design a stochastic differential game (SDG) under pure strategies that closely approximates the original game under mixed strategies, ensuring that both the state distributions and cost expectations remain nearly identical over the entire time horizon. Based on the solution of this SDG, the value function under mixed strategies can be approximated with a certified approximation error. In addition, near-optimal mixed strategies can be designed with certified suboptimality gaps. We further apply this method to a class of ZSDGs with control-affine dynamics and quadratic costs, demonstrating that both the value approximation error and the strategy suboptimality gap are of order $O(\bar{\pi})$ with respect to the maximum commitment delay $\bar{\pi}$. Numerical examples are provided to illustrate and validate our results.
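To give a feel for the key idea, the following minimal sketch compares a randomized control held over commitment intervals with a diffusion that matches its first two moments. It is not the construction used in the paper: it assumes a single player, scalar dynamics $\dot{x} = ax + bu$, and a fixed illustrative mixed strategy that draws $u = \pm 1$ with equal probability every `delay` seconds. All function names and parameter values are hypothetical.

```python
import math
import random


def simulate_mixed(a, b, x0, horizon, delay, rng):
    """Terminal state of dx/dt = a*x + b*u when u is redrawn every `delay` seconds.

    Illustrative mixed strategy (an assumption, not from the paper):
    u = +1 or -1 with equal probability, held over each commitment interval.
    """
    x = x0
    decay = math.exp(a * delay)
    gain = b * (decay - 1.0) / a  # exact response to a constant u (requires a != 0)
    for _ in range(round(horizon / delay)):
        u = rng.choice((-1.0, 1.0))
        x = decay * x + gain * u
    return x


def simulate_sde(a, b, x0, horizon, delay, rng, substeps=10):
    """Euler-Maruyama for the moment-matched SDE dx = a*x dt + b*sqrt(delay) dW.

    The drift uses the mean of u (zero here); the diffusion uses the standard
    deviation of u (one here) scaled by sqrt(delay).
    """
    dt = delay / substeps
    sigma = b * math.sqrt(delay)
    x = x0
    for _ in range(round(horizon / dt)):
        x += a * x * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x


def mean_and_variance(samples):
    """Sample mean and unbiased sample variance."""
    m = sum(samples) / len(samples)
    v = sum((s - m) ** 2 for s in samples) / (len(samples) - 1)
    return m, v


def compare(n_samples=4000, seed=0, a=-1.0, b=1.0, x0=1.0, horizon=1.0, delay=0.05):
    """Return ((mean, var) under the mixed strategy, (mean, var) under the SDE)."""
    rng = random.Random(seed)
    mixed = [simulate_mixed(a, b, x0, horizon, delay, rng) for _ in range(n_samples)]
    sde = [simulate_sde(a, b, x0, horizon, delay, rng) for _ in range(n_samples)]
    return mean_and_variance(mixed), mean_and_variance(sde)
```

Calling `compare()` returns the terminal-state mean and variance under each model. In this toy setting the two pairs should be close, and rerunning with a smaller `delay` gives a rough sense of how the gap depends on the commitment delay.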
