HSVI-based Online Minimax Strategies for Partially Observable Stochastic Games with Neural Perception Mechanisms

Rui Yan; Gabriel Santos; Gethin Norman; David Parker; Marta Kwiatkowska

TL;DR

This work addresses online strategy synthesis for two-player zero-sum partially-observable stochastic games with neural perception (NS-POSGs), in which one agent has partial information and the other is fully informed. It combines continual resolving with bounds computed offline by HSVI to obtain an $\varepsilon$-minimax strategy profile online. The partially-informed agent uses a pre-computed lower bound $\underline{V}^{\Gamma}$ via a single stage LP. The fully-informed agent follows an online inferred-belief strategy built on offline upper bounds $\overline{V}^{\Upsilon}$. Both strategies are provably $\varepsilon$-optimal. A pursuit-evasion case study with neural perception shows that the approach is practical to solve online with bounded memory usage. The method thus extends prior offline and finite-state techniques to online minimax planning in the one-sided, continuous-state NS-POSG setting.
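The TL;DR relies on an offline lower bound $\underline{V}^{\Gamma}$ and upper bound $\overline{V}^{\Upsilon}$ from HSVI. The sketch below illustrates the general shape of such bounds. It assumes a finite set of states, so a belief is a probability vector. The paper handles continuous states with neural perception, and its bound representation differs.

In this sketch, the lower bound is a maximum over a set of alpha-vectors. The upper bound is the sawtooth interpolation over stored belief points familiar from POMDP HSVI. HSVI keeps refining both bounds until their gap at the initial belief is at most $\varepsilon$. The names `lower_bound`, `upper_bound` and `gap` are hypothetical.

```python
"""Sketch of HSVI-style value bounds on a finite belief simplex.

Assumption (hypothetical simplification): finitely many states, so a belief
is a probability vector. This is not the paper's continuous-state representation.
"""
from typing import List, Sequence, Tuple

Belief = Sequence[float]


def lower_bound(alphas: List[Sequence[float]], b: Belief) -> float:
    # Lower bound: best expected value under b over all stored alpha-vectors.
    return max(sum(a_s * b_s for a_s, b_s in zip(alpha, b)) for alpha in alphas)


def upper_bound(corners: Sequence[float],
                points: List[Tuple[Belief, float]], b: Belief) -> float:
    # Start from the linear interpolation of the point-mass (corner) values.
    base = sum(c * b_s for c, b_s in zip(corners, b))
    best = base
    for p, v in points:
        # Sawtooth step: how far b can move toward the stored belief p.
        ratio = min(b_s / p_s for b_s, p_s in zip(b, p) if p_s > 0)
        p_base = sum(c * p_s for c, p_s in zip(corners, p))
        best = min(best, base + ratio * (v - p_base))
    return best


def gap(alphas: List[Sequence[float]], corners: Sequence[float],
        points: List[Tuple[Belief, float]], b: Belief) -> float:
    # HSVI-style refinement stops once this gap at the initial belief is <= epsilon.
    return upper_bound(corners, points, b) - lower_bound(alphas, b)
```

For example, with alpha-vectors `[[1, 0], [0, 1]]`, corner values `[2, 2]` and one stored point `([0.5, 0.5], 1.0)`, the bounds at the uniform belief are 0.5 and 1.0, so the gap is 0.5.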

Abstract

We consider a variant of continuous-state partially-observable stochastic games with neural perception mechanisms and an asymmetric information structure. One agent has partial information, with the observation function implemented as a neural network, while the other agent is assumed to have full knowledge of the state. We present, for the first time, an efficient online method to compute an $\varepsilon$-minimax strategy profile, which requires only one linear program to be solved for each agent at every stage, instead of a complex estimation of opponent counterfactual values. For the partially-informed agent, we propose a continual resolving approach which uses lower bounds, pre-computed offline with heuristic search value iteration (HSVI), instead of opponent counterfactual values. This inherits the soundness of continual resolving at the cost of pre-computing the bound. For the fully-informed agent, we propose an inferred-belief strategy, where the agent maintains an inferred belief about the belief of the partially-informed agent based on (offline) upper bounds from HSVI, guaranteeing $\varepsilon$-distance to the value of the game at the initial belief known to both agents.
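The abstract states that each agent solves only one linear program per stage. The paper's stage LPs are built from the HSVI bounds and are not reproduced here. As a generic stand-in, the sketch below solves the maximin LP of a one-shot zero-sum matrix game for a row player (maximiser) with two actions:

maximise $v$ subject to $\sum_i x_i A_{ij} \ge v$ for every column $j$, $\sum_i x_i = 1$, $x \ge 0$.

In this two-action case, the optimal probability $p$ of the first action lies at $p \in \{0, 1\}$ or where two column payoff lines intersect. Checking those candidates therefore replaces a call to an LP solver. The function name `solve_two_action_stage` and the payoff matrices are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Tuple


def solve_two_action_stage(payoff: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Maximin LP of a 2 x n zero-sum stage game (hypothetical stand-in).

    Returns (p, value), where p is the probability of the first row action
    and value is the payoff the row player can guarantee.
    """
    top, bottom = payoff
    cols = range(len(top))

    def envelope(p: float) -> float:
        # Worst case over the column player's pure responses (the LP constraints).
        return min(p * top[j] + (1 - p) * bottom[j] for j in cols)

    # The concave envelope is maximised at an endpoint or at a line intersection.
    candidates = [0.0, 1.0]
    for j, k in combinations(cols, 2):
        dj, dk = top[j] - bottom[j], top[k] - bottom[k]
        if dj != dk:
            p = (bottom[k] - bottom[j]) / (dj - dk)
            if 0.0 <= p <= 1.0:
                candidates.append(p)
    best = max(candidates, key=envelope)
    return best, envelope(best)
```

For matching pennies, `solve_two_action_stage([[1, -1], [-1, 1]])` returns `(0.5, 0.0)`. With more than two actions per agent, a general LP solver would be needed instead.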

Paper Structure (6 sections, 5 theorems, 9 equations, 2 figures, 2 algorithms)

Introduction
Background
NS-HSVI Continual Resolving
Inferred-Belief Strategy Synthesis
Experiments
Conclusions

Key Result

Lemma 1

For NS-HSVI continual resolving at $((s_1, b_1), \alpha_1)$, the lower-bound LP constrained by $\alpha_1$ admits at least one solution, and if the current state is $(s_1, s_E)$, then $b_1(s_E) > 0$.
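The second claim of Lemma 1 says that the partially-informed agent's belief gives positive probability to the true environment state $s_E$. The sketch below is not the paper's proof; it offers one intuition for why this is plausible. A Bayesian belief update that discards only successors inconsistent with the observed percept keeps positive weight on any state that is reachable with positive probability and produces that percept.

The sketch makes several hypothetical assumptions:

- finitely many environment states;
- a deterministic function `perceive` standing in for the neural perception function;
- a known stage strategy `sigma2` for $\mathsf{Ag}_2$;
- a transition function `trans`.

The paper's continuous-state belief representation differs.

```python
from typing import Callable, Dict, Hashable

State = Hashable
Action = Hashable


def belief_update(
    belief: Dict[State, float],
    a1: Action,
    sigma2: Callable[[State], Dict[Action, float]],
    trans: Callable[[State, Action, Action], Dict[State, float]],
    perceive: Callable[[State], Hashable],
    percept: Hashable,
) -> Dict[State, float]:
    """Bayesian update of Ag1's belief over environment states (sketch)."""
    unnormalised: Dict[State, float] = {}
    for s, b_s in belief.items():
        if b_s == 0.0:
            continue
        for a2, p_a2 in sigma2(s).items():
            for s_next, p_t in trans(s, a1, a2).items():
                # Keep only successors whose percept matches what Ag1 observed.
                if perceive(s_next) == percept:
                    weight = b_s * p_a2 * p_t
                    unnormalised[s_next] = unnormalised.get(s_next, 0.0) + weight
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("observed percept has zero probability under the belief")
    return {s: w / total for s, w in unnormalised.items()}


# Hypothetical example: environment states 0..3 on a line.
def perceive(s: State) -> str:  # stand-in for the neural perception function
    return "near" if s < 2 else "far"


def sigma2(s: State) -> Dict[Action, float]:  # hypothetical uniform Ag2 strategy
    return {"stay": 0.5, "move": 0.5}


def trans(s: State, a1: Action, a2: Action) -> Dict[State, float]:
    # Hypothetical deterministic dynamics: Ag2 either stays or moves right.
    return {s: 1.0} if a2 == "stay" else {min(s + 1, 3): 1.0}


posterior = belief_update({0: 0.5, 2: 0.5}, "wait", sigma2, trans, perceive, "far")
# posterior == {2: 0.5, 3: 0.5}
```

Whichever of states 2 and 3 is the true successor, it reached "far" with positive probability, so it keeps positive weight in `posterior`.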

Figures (2)

Figure 1: Left: NS-HSVI continual resolving for the partially-informed agent $\mathsf{Ag}_1$ (blue). Right: inferred-belief strategy synthesis for the fully-informed agent $\mathsf{Ag}_2$ (red).
Figure 2: Snippets of a synthesised strategy for the pursuer and evader (from left to right).

Theorems & Definitions (12)

Definition 1: Minimax
Definition 2: Stage strategy
Remark 1
Lemma 1: Existence and proper belief
Proof
Theorem 1: $\varepsilon$-minimax strategy for $\mathsf{Ag}_1$
Proof
Lemma 2: Monotonicity
Proof
Theorem 2: $\varepsilon$-minimax strategy for $\mathsf{Ag}_2$
...and 2 more
