State-Constrained Zero-Sum Differential Games with One-Sided Information

Mukesh Ghimire, Lei Zhang, Zhe Xu, Yi Ren

TL;DR

This work extends the theory of zero-sum differential games with one-sided information to settings with state constraints, proving the existence of a value and deriving primal and dual subdynamic principles that underpin computational strategy synthesis. The authors develop a backward-induction framework that leverages belief splitting to convexify the value and a dual game for the conjugate value, enabling explicit construction of behavioral strategies for both players. They address numerical challenges—value discontinuities, partial convexity, and convexification error—via physics-informed neural networks, partially convex value networks, and convex hull techniques, and demonstrate these methods on Hexner-like and football-style case studies. The results reveal how information asymmetry and state constraints shape optimal strategies and belief dynamics, with implications for scalable learning in continuous-action, long-horizon differential games.

Abstract

We study zero-sum differential games with state constraints and one-sided information, where the informed player (Player 1) has a categorical payoff type unknown to the uninformed player (Player 2). Player 1 aims to minimize his payoff without violating the constraints, while Player 2 aims to force a violation of the state constraints if possible, and to maximize the payoff otherwise. One example of such a game is a man-to-man matchup in football. Without state constraints, Cardaliaguet (2007) showed that the value of such a game exists and is convex in the common belief of the players. Our theoretical contribution is to extend this result to games with state constraints and to derive the primal and dual subdynamic principles necessary for computing behavioral strategies. Unlike existing works, which are concerned with the scalability of no-regret learning in games with discrete dynamics, our study reveals the underlying structure of belief-manipulation strategies that results from information asymmetry and state constraints. This structure will be necessary for scalable learning in games with continuous actions and long time horizons. We use a simplified football game to demonstrate the utility of this work: we identify the player positions and belief states in which the attacker should (or should not) play specific random deceptive moves to exploit the information asymmetry, and compute how the defender should respond.
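
The convexity of the value in the common belief, and the belief splitting it induces, can be illustrated with a small sketch. This is not the authors' implementation: it assumes only two types (so the belief is a scalar $p \in [0, 1]$), a value already sampled on a belief grid at a fixed time and state, and a toy concave value chosen purely for illustration. The function names `lower_hull_indices` and `split_belief` are hypothetical.

```python
import numpy as np


def lower_hull_indices(p, v):
    """Indices of the samples on the lower convex envelope of (p, v).

    Assumes p is sorted in strictly increasing order (monotone-chain scan).
    """
    hull = []
    for i in range(len(p)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            # Cross product of (b - a) and (i - a); <= 0 means b lies on or
            # above the chord from a to i, so b is not on the lower envelope.
            cross = (p[b] - p[a]) * (v[i] - v[a]) - (v[b] - v[a]) * (p[i] - p[a])
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return np.array(hull)


def split_belief(p0, p, v):
    """Return ([(posterior, probability), ...], convexified value at p0)."""
    if not (p[0] <= p0 <= p[-1]):
        raise ValueError("p0 must lie inside the sampled belief range")
    idx = lower_hull_indices(p, v)
    hp, hv = p[idx], v[idx]
    k = int(np.searchsorted(hp, p0))  # first hull vertex with hp[k] >= p0
    if np.isclose(hp[k], p0):
        # The prior already sits on the envelope, so no split is needed.
        return [(float(hp[k]), 1.0)], float(hv[k])
    lam = (hp[k] - p0) / (hp[k] - hp[k - 1])  # probability of the left posterior
    posteriors = [(float(hp[k - 1]), float(lam)), (float(hp[k]), float(1.0 - lam))]
    return posteriors, float(lam * hv[k - 1] + (1.0 - lam) * hv[k])


# Toy non-revealing value that is concave in p (illustrative, not from the paper).
grid = np.linspace(0.0, 1.0, 101)
toy_value = -(grid - 0.5) ** 2
posteriors, value_cvx = split_belief(0.3, grid, toy_value)
print(posteriors, value_cvx)
```

Because the toy value is concave everywhere, the prior splits to the fully revealing posteriors $p=0$ and $p=1$, and the split probabilities average the posteriors back to the prior. A value with a convex region would instead yield posteriors strictly inside $[0, 1]$, that is, only partial revelation.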

Paper Structure

This paper contains 46 sections, 14 theorems, 114 equations, 11 figures, 3 tables, 1 algorithm.

Key Result

Lemma 1

(Lemma 2.2 of Cardaliaguet, 2007) For any pair $(\eta, \zeta) \in \mathcal{H}_r(t) \times \mathcal{Z}_r(t)$ and any $\omega := (\omega_1, \omega_2) \in \Omega_{\eta} \times \Omega_{\zeta}$, there is a unique pair $(\alpha_{\omega}, \delta_{\omega}) \in \mathcal{A}(t) \times \mathcal{D}(t)$. Furthermore, the map $\omega \rightarrow (\alpha_{\omega}, \delta_{\omega})$ is measurable from $\Omega_{\eta} \times \Omega_{\zeta}$ ...

Figures (11)

  • Figure 1: Value along belief ($p$) and time ($t$) in Hexner's game. The belief splits to $A$ ($p=0$) or $B$ ($p=1$), depending on the true type of Player 1, at the point where the value would become concave if Player 1 kept playing a non-revealing strategy. In other words, Player 1 delays revealing his type until a critical time. In more general cases, belief splitting may not fully reveal Player 1's type, leading to belief manipulation.
  • Figure 2: Trajectories of the informed Player 1 (red) and the uninformed Player 2 (blue) in an 8D Hexner's game, with and without a state constraint or information asymmetry. Color shades indicate probabilities. When constrained, Player 1 stays away from Player 2 while trying to be closer to the target (the circle) than Player 2. Diamonds indicate initial states and stars indicate final states. See Sec. \ref{sec:cases} for details.
  • Figure 3: Schematics of a simplified football game with Player 1 (red) and Player 2 (blue). Left: the initial configuration. Right: an equilibrium trajectory. Magenta circles: the two goals. The filled circle is the current type, which is private to Player 1. Players move in the 2D space $[-1, 1] \times [-1, 1]$.
  • Figure 4: Top: Average delay ($\mathcal{T}$) in information reveal (left) and average maximum advantage of playing the revealing strategy (right), keeping P2's location fixed at (-0.5, 0) and changing P1's location. Bottom: Trajectory with high delay and advantage (left) and with low delay and advantage (right). Color shades indicate current belief.
  • Figure 5: Trajectories where both players use their respective behavioral strategies. P1 keeps track of $p$, whereas P2 keeps track of $\hat{p}$.
  • ...and 6 more figures

Theorems & Definitions (21)

  • Lemma 1
  • Theorem 1
  • Corollary 1.1
  • Theorem 2
  • Theorem 3
  • Proposition 1
  • Lemma 2
  • Lemma 3
  • proof
  • Lemma 4
  • ...and 11 more