Table of Contents
Fetching ...

Computing the Reachability Value of Posterior-Deterministic POMDPs

Nathanaël Fijalkow, Arka Ghosh, Roman Kniazev, Guillermo A. Pérez, Pierre Vandenhove

TL;DR

This work identifies posterior-deterministic POMDPs, where knowing the current state uniquely determines the next state after any action-observation pair, as a natural, broad class bridging fully observable MDPs and more complex POMDPs. The authors prove that within this class the reachability value can be approximated to arbitrary precision, overcoming the general undecidability barrier. The approach combines an extended end-component framework, martingale theory, and a belief-tree unfolding that leverages the posterior-deterministic structure to ensure termination and correctness. The results encompass classic benchmarks like the Tiger POMDP and imply practical tractability for verification and synthesis tasks under partial observability in a wide domain of problems. Overall, the paper advances understanding of which POMDP subclasses admit effective value approximation for reachability objectives and provides a constructive algorithm with formal guarantees.

Abstract

Partially observable Markov decision processes (POMDPs) are a fundamental model for sequential decision-making under uncertainty. However, many verification and synthesis problems for POMDPs are undecidable or intractable. Most prominently, the seminal result of Madani et al. (2003) states that there is no algorithm that, given a POMDP and a set of target states, can compute the maximal probability of reaching the target states, or even approximate it up to a non-trivial constant. This is in stark contrast to fully observable Markov decision processes (MDPs), where the reachability value can be computed in polynomial time. In this work, we introduce posterior-deterministic POMDPs, a novel class of POMDPs. Our main technical contribution is to show that for posterior-deterministic POMDPs, the maximal probability of reaching a given set of states can be approximated up to arbitrary precision. A POMDP is posterior-deterministic if the next state can be uniquely determined by the current state, the action taken, and the observation received. While the actual state is generally uncertain in POMDPs, the posterior-deterministic property tells us that once the true state is known it remains known forever. This simple and natural definition includes all MDPs and captures classical non-trivial examples such as the Tiger POMDP (Kaelbling et al. 1998), making it one of the largest known classes of POMDPs for which the reachability value can be approximated.

Computing the Reachability Value of Posterior-Deterministic POMDPs

TL;DR

This work identifies posterior-deterministic POMDPs, where knowing the current state uniquely determines the next state after any action-observation pair, as a natural, broad class bridging fully observable MDPs and more complex POMDPs. The authors prove that within this class the reachability value can be approximated to arbitrary precision, overcoming the general undecidability barrier. The approach combines an extended end-component framework, martingale theory, and a belief-tree unfolding that leverages the posterior-deterministic structure to ensure termination and correctness. The results encompass classic benchmarks like the Tiger POMDP and imply practical tractability for verification and synthesis tasks under partial observability in a wide domain of problems. Overall, the paper advances understanding of which POMDP subclasses admit effective value approximation for reachability objectives and provides a constructive algorithm with formal guarantees.

Abstract

Partially observable Markov decision processes (POMDPs) are a fundamental model for sequential decision-making under uncertainty. However, many verification and synthesis problems for POMDPs are undecidable or intractable. Most prominently, the seminal result of Madani et al. (2003) states that there is no algorithm that, given a POMDP and a set of target states, can compute the maximal probability of reaching the target states, or even approximate it up to a non-trivial constant. This is in stark contrast to fully observable Markov decision processes (MDPs), where the reachability value can be computed in polynomial time. In this work, we introduce posterior-deterministic POMDPs, a novel class of POMDPs. Our main technical contribution is to show that for posterior-deterministic POMDPs, the maximal probability of reaching a given set of states can be approximated up to arbitrary precision. A POMDP is posterior-deterministic if the next state can be uniquely determined by the current state, the action taken, and the observation received. While the actual state is generally uncertain in POMDPs, the posterior-deterministic property tells us that once the true state is known it remains known forever. This simple and natural definition includes all MDPs and captures classical non-trivial examples such as the Tiger POMDP (Kaelbling et al. 1998), making it one of the largest known classes of POMDPs for which the reachability value can be approximated.
Paper Structure (15 sections, 1 theorem, 6 equations)

This paper contains 15 sections, 1 theorem, 6 equations.

Key Result

Lemma 0

Let $\mathcal{P}$ be a POMDP with target set $T$ and initial belief $\mathbf{b}$. We can, in time linear in the size of $\mathcal{P}$, build a POMDP $\mathcal{P}'$ with two new states $\top$ and $\bot$, and a belief $\mathbf{b}'$

Theorems & Definitions (1)

  • Lemma 0