Table of Contents
Fetching ...

Point-Based Value Iteration for POMDPs with Neural Perception Mechanisms

Rui Yan, Gabriel Santos, Gethin Norman, David Parker, Marta Kwiatkowska

TL;DR

This work introduces neuro-symbolic POMDPs (NS-POMDPs), a continuous-state, partially observable framework where neural perception yields symbolic percepts that guide symbolic decision making. It develops a finite, piecewise linear and convex (P-PWLC) representation of the value function and proves its closure under Bellman backups, enabling both exact value iteration and a scalable NS-HSVI method. NS-HSVI provides lower and upper bounds using region- and belief-based representations, with two belief schemas (particle and region) to cope with continuous environments, and delivers convergence guarantees within a user-specified tolerance. Empirical results on car parking and VCAS demonstrate practical policy synthesis under neural perception, showing region-based beliefs offer robustness and NS-HSVI can outperform finite-state approximations in tighter bounds despite higher per-backup cost, highlighting the method’s potential for safety-critical autonomous systems.

Abstract

The increasing trend to integrate neural networks and conventional software components in safety-critical settings calls for methodologies for their formal modelling, verification and correct-by-construction policy synthesis. We introduce neuro-symbolic partially observable Markov decision processes (NS-POMDPs), a variant of continuous-state POMDPs with discrete observations and actions, in which the agent perceives a continuous-state environment using a neural {\revise perception mechanism} and makes decisions symbolically. The perception mechanism classifies inputs such as images and sensor values into symbolic percepts, which are used in decision making. We study the problem of optimising discounted cumulative rewards for NS-POMDPs. Working directly with the continuous state space, we exploit the underlying structure of the model and the neural perception mechanism to propose a novel piecewise linear and convex representation (P-PWLC) in terms of polyhedra covering the state space and value vectors, and extend Bellman backups to this representation. We prove the convexity and continuity of value functions and present two value iteration algorithms that ensure finite representability. The first is a classical (exact) value iteration algorithm extending the $α$-functions of Porta {\em et al} (2006) to the P-PWLC representation for continuous-state spaces. The second is a point-based (approximate) method called NS-HSVI, which uses the P-PWLC representation and belief-value induced functions to approximate value functions from below and above for two types of beliefs, particle-based and region-based. Using a prototype implementation, we show the practical applicability of our approach on two case studies that employ (trained) ReLU neural networks as perception functions, by synthesising (approximately) optimal strategies.

Point-Based Value Iteration for POMDPs with Neural Perception Mechanisms

TL;DR

This work introduces neuro-symbolic POMDPs (NS-POMDPs), a continuous-state, partially observable framework where neural perception yields symbolic percepts that guide symbolic decision making. It develops a finite, piecewise linear and convex (P-PWLC) representation of the value function and proves its closure under Bellman backups, enabling both exact value iteration and a scalable NS-HSVI method. NS-HSVI provides lower and upper bounds using region- and belief-based representations, with two belief schemas (particle and region) to cope with continuous environments, and delivers convergence guarantees within a user-specified tolerance. Empirical results on car parking and VCAS demonstrate practical policy synthesis under neural perception, showing region-based beliefs offer robustness and NS-HSVI can outperform finite-state approximations in tighter bounds despite higher per-backup cost, highlighting the method’s potential for safety-critical autonomous systems.

Abstract

The increasing trend to integrate neural networks and conventional software components in safety-critical settings calls for methodologies for their formal modelling, verification and correct-by-construction policy synthesis. We introduce neuro-symbolic partially observable Markov decision processes (NS-POMDPs), a variant of continuous-state POMDPs with discrete observations and actions, in which the agent perceives a continuous-state environment using a neural {\revise perception mechanism} and makes decisions symbolically. The perception mechanism classifies inputs such as images and sensor values into symbolic percepts, which are used in decision making. We study the problem of optimising discounted cumulative rewards for NS-POMDPs. Working directly with the continuous state space, we exploit the underlying structure of the model and the neural perception mechanism to propose a novel piecewise linear and convex representation (P-PWLC) in terms of polyhedra covering the state space and value vectors, and extend Bellman backups to this representation. We prove the convexity and continuity of value functions and present two value iteration algorithms that ensure finite representability. The first is a classical (exact) value iteration algorithm extending the -functions of Porta {\em et al} (2006) to the P-PWLC representation for continuous-state spaces. The second is a point-based (approximate) method called NS-HSVI, which uses the P-PWLC representation and belief-value induced functions to approximate value functions from below and above for two types of beliefs, particle-based and region-based. Using a prototype implementation, we show the practical applicability of our approach on two case studies that employ (trained) ReLU neural networks as perception functions, by synthesising (approximately) optimal strategies.
Paper Structure (33 sections, 21 theorems, 70 equations, 14 figures, 4 tables, 5 algorithms)

This paper contains 33 sections, 21 theorems, 70 equations, 14 figures, 4 tables, 5 algorithms.

Key Result

Lemma 1

There exists a smallest FCP of $S$, called the perception FCP, denoted $\Phi_{P}$, such that all states in any $\phi \in \Phi_{P}$ are observationally equivalent, i.e., if $(s_A,s_E),(s_A',s_E')\in \phi$, then $s_A=s_A'$ and we let $s_A^\phi= s_A$.

Figures (14)

  • Figure 1: Connections between neural and symbolic components in Example \ref{['ex:parking:model']}.
  • Figure 3: Illustration of the steps taken by the ISPP algorithm.
  • Figure 4: An example of region refinement during the ISSP algorithm (see Example \ref{['ex:refine']}).
  • Figure 5: Car parking with obstacles.
  • Figure 6: Paths and values for car parking (obstacle indicated by a black border, $\beta = 0.8$, collision rewards equal to $-1000$ (top) and $-5000$ (middle, bottom)). The top two rows are for deterministic environments, the bottom row uses a probabilistic environment.
  • ...and 9 more figures

Theorems & Definitions (32)

  • Definition 1: PWC function
  • Definition 2: PWL function
  • Definition 3: Syntax of NS-POMDPs
  • Remark 1
  • Definition 4: Semantics of NS-POMDPs
  • Definition 5: Belief MDP
  • Definition 6: Bellman operator
  • Lemma 1: Perception FCP
  • Definition 7: P-PWLC function
  • Theorem 1: P-PWLC closure and convergence
  • ...and 22 more