Partially Observable Stochastic Games with Neural Perception Mechanisms
Rui Yan, Gabriel Santos, Gethin Norman, David Parker, Marta Kwiatkowska
TL;DR
This paper addresses multi-agent decision making under partial observability when perception is provided by neural classifiers. The authors define one-sided NS-POSGs, prove that the value function $V^\top$ is continuous and convex, and show that, under mild assumptions, it admits a finite polyhedral piecewise-linear-convex (PPWLC) representation. They introduce one-sided NS-HSVI, a heuristic search value iteration algorithm that combines PPWLC representations, polyhedra obtained from neural network pre-image analysis, particle-based beliefs, and LP-based backups. Empirical studies on pedestrian-vehicle and pursuit-evasion scenarios show that the method can synthesize strategies and can be used to analyze how perception precision affects safety and performance.
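To make the combination of particle beliefs and piecewise-linear-convex bounds concrete, the following is a minimal sketch, not the authors' implementation. It assumes a 2-D continuous state, a belief stored as weighted particles, and alpha-functions that are piecewise constant over axis-aligned boxes (a simple special case of polyhedra). All names (`make_box_alpha`, `expected_value`, `lower_bound`) are hypothetical, and the sketch ignores the partially-informed agent's local state and the LP-based backup.

```python
from typing import Callable, List, Tuple

State = Tuple[float, float]           # hypothetical 2-D continuous state
Particle = Tuple[State, float]        # (state, weight); weights sum to 1
AlphaFn = Callable[[State], float]    # one alpha-function over states
Box = Tuple[Tuple[float, float], Tuple[float, float]]


def make_box_alpha(pieces: List[Tuple[Box, float]]) -> AlphaFn:
    """Build a piecewise-constant alpha-function from (box, value) pairs.

    Each box is ((x_lo, x_hi), (y_lo, y_hi)), an axis-aligned polyhedron.
    States outside every box get value 0.0 (an assumption of this sketch).
    """
    def alpha(s: State) -> float:
        x, y = s
        for ((x_lo, x_hi), (y_lo, y_hi)), value in pieces:
            if x_lo <= x < x_hi and y_lo <= y < y_hi:
                return value
        return 0.0
    return alpha


def expected_value(alpha: AlphaFn, belief: List[Particle]) -> float:
    """Expectation of alpha under a particle belief (linear in the weights)."""
    return sum(w * alpha(s) for s, w in belief)


def lower_bound(alphas: List[AlphaFn], belief: List[Particle]) -> float:
    """Pointwise maximum over alpha-functions: convex in the belief."""
    return max(expected_value(a, belief) for a in alphas)


# Two particles with equal weight.
belief = [((0.2, 0.2), 0.5), ((0.8, 0.8), 0.5)]

# alpha_1 is 1.0 on the left half of the unit square and 0.0 on the right half.
alpha_1 = make_box_alpha([(((0.0, 0.5), (0.0, 1.0)), 1.0),
                          (((0.5, 1.0), (0.0, 1.0)), 0.0)])
# alpha_2 is 0.25 everywhere on the unit square.
alpha_2 = make_box_alpha([(((0.0, 1.0), (0.0, 1.0)), 0.25)])

value = lower_bound([alpha_1, alpha_2], belief)
```

Because each alpha-function contributes a term that is linear in the particle weights, taking the maximum over a finite set yields a function that is convex in the belief, which is the structural property the TL;DR refers to.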
Abstract
Stochastic games are a well-established model for multi-agent sequential decision making under uncertainty. In practical applications, though, agents often have only partial observability of their environment. Furthermore, agents increasingly perceive their environment using data-driven approaches such as neural networks trained on continuous data. We propose the model of neuro-symbolic partially-observable stochastic games (NS-POSGs), a variant of continuous-space concurrent stochastic games that explicitly incorporates neural perception mechanisms. We focus on a one-sided setting with a partially-informed agent using discrete, data-driven observations and another, fully-informed agent. We present a new method, called one-sided NS-HSVI, for approximate solution of one-sided NS-POSGs, which exploits the piecewise constant structure of the model. Using neural network pre-image analysis to construct finite polyhedral representations and particle-based representations for beliefs, we implement our approach and illustrate its practical applicability to the analysis of pedestrian-vehicle and pursuit-evasion scenarios.
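As an illustration of why a classifier-based perception function induces a piecewise constant structure over polyhedra, the sketch below treats the simplest possible case: a single linear layer followed by argmax. This is an assumption made for brevity; pre-image analysis for deeper networks (for example with ReLU activations) is considerably more involved and is not shown. The function names are hypothetical. For class $k$, the pre-image is the set of states $x$ with $(W_j - W_k) x \le b_k - b_j$ for every other class $j$, which is an intersection of half-spaces.

```python
from typing import List, Tuple

Vector = List[float]
Halfspace = Tuple[Vector, float]  # (a, c) meaning a . x <= c


def dot(a: Vector, x: Vector) -> float:
    return sum(ai * xi for ai, xi in zip(a, x))


def classify(W: List[Vector], b: Vector, x: Vector) -> int:
    """Discrete observation: index of the highest score W x + b.

    Ties are broken in favour of the lowest class index.
    """
    scores = [dot(row, x) + bk for row, bk in zip(W, b)]
    return max(range(len(scores)), key=lambda k: scores[k])


def preimage_halfspaces(W: List[Vector], b: Vector, k: int) -> List[Halfspace]:
    """Half-space description of the states where class k scores at least
    as high as every other class (a closed polyhedron, boundaries included)."""
    constraints = []
    for j in range(len(W)):
        if j == k:
            continue
        # score_j <= score_k  <=>  (W_j - W_k) . x <= b_k - b_j
        a = [wj - wk for wj, wk in zip(W[j], W[k])]
        constraints.append((a, b[k] - b[j]))
    return constraints


def in_polyhedron(constraints: List[Halfspace], x: Vector) -> bool:
    return all(dot(a, x) <= c for a, c in constraints)


# Toy two-class perception: class 0 when the first coordinate is positive.
W = [[1.0, 0.0], [-1.0, 0.0]]
b = [0.0, 0.0]
region_0 = preimage_halfspaces(W, b, 0)

inside = in_polyhedron(region_0, [0.5, 2.0])    # state observed as class 0
outside = in_polyhedron(region_0, [-0.5, 0.0])  # state observed as class 1
```

Within each such region the observation is constant, so any quantity that depends on the state only through the observation is piecewise constant over a finite set of polyhedra.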
