What should be observed for optimal reward in POMDPs?

Alyzia-Maria Konsta; Alberto Lluch Lafuente; Christoph Matheja

TL;DR

This paper defines the optimal observability problem (OOP): turn an MDP into a POMDP using at most a fixed budget of observations such that the minimal expected reward stays below a given threshold. It proves that the problem is undecidable in general and then focuses on decidable variants restricted to positional strategies, establishing NP-completeness for positional deterministic strategies and decidability in PSPACE for positional randomized strategies. It introduces two solution approaches: an MDP-based method that uses optimal strategies of the underlying MDP to determine the minimal required observations, and an SMT-based method that encodes the problem as feasibility checking for typed parametric Markov chains (tpMCs) via ETR encodings. The authors provide correctness results and report SMT-based experiments on standard POMDP benchmarks, which indicate that small-to-medium instances can be solved in practice. Overall, the work lays theoretical foundations for sensor placement in uncertain environments and offers implementable methods for choosing observations at design time under budget constraints.

Abstract

Partially Observable Markov Decision Processes (POMDPs) are a standard model for agents making decisions in uncertain environments. Most work on POMDPs focuses on synthesizing strategies based on the available capabilities. However, system designers can often control an agent's observation capabilities, e.g., by placing or selecting sensors. This raises the question of how one should select an agent's sensors cost-effectively such that it achieves the desired goals. In this paper, we study the novel optimal observability problem (OOP): Given a POMDP M, how should one change M's observation capabilities within a fixed budget such that its (minimal) expected reward remains below a given threshold? We show that the problem is undecidable in general and decidable when considering positional strategies only. We present two algorithms for a decidable fragment of the OOP: one based on optimal strategies of M's underlying Markov decision process and one based on parameter synthesis with SMT. We report promising results for variants of typical examples from the POMDP literature.
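
To make the problem statement concrete, the following is a minimal sketch of a brute-force check of the OOP for positional deterministic strategies. It is not one of the paper's two algorithms. Everything in it is an illustrative assumption: the five-state line with goal state 2, the deterministic moves, the cost of 1 per step, the uniform start over non-goal states, and the names `step`, `expected_cost`, and `solve_oop`.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy model: a line of five states with an absorbing goal.
STATES = [0, 1, 2, 3, 4]
GOAL = 2
ACTIONS = ["l", "r"]
NON_GOAL = [s for s in STATES if s != GOAL]


def step(s, a):
    """Deterministic move on the line, clamped at both ends (assumption)."""
    return max(s - 1, 0) if a == "l" else min(s + 1, 4)


def expected_cost(obs, actions):
    """Average steps to GOAL from a uniform non-goal start.

    obs maps each non-goal state to an observation index; actions maps
    each observation index to an action (a positional deterministic,
    observation-based strategy). Returns None if some start state never
    reaches GOAL, i.e. the expected cost is infinite.
    """
    total = Fraction(0)
    for start in NON_GOAL:
        s, steps, seen = start, 0, set()
        while s != GOAL:
            if s in seen:  # revisited a state: the run cycles forever
                return None
            seen.add(s)
            s = step(s, actions[obs[s]])
            steps += 1
        total += steps
    return total / len(NON_GOAL)


def solve_oop(budget, threshold):
    """Enumerate every observation function with at most `budget`
    observations and every strategy over them; return the cheapest
    (cost, obs, actions) with cost <= threshold, or None."""
    best = None
    for labels in product(range(budget), repeat=len(NON_GOAL)):
        obs = dict(zip(NON_GOAL, labels))
        for actions in product(ACTIONS, repeat=budget):
            cost = expected_cost(obs, actions)
            if cost is None or cost > threshold:
                continue
            if best is None or cost < best[0]:
                best = (cost, obs, actions)
    return best


if __name__ == "__main__":
    print(solve_oop(1, 10))              # one observation is not enough
    print(solve_oop(2, Fraction(3, 2)))  # "left of goal" vs "right of goal"
```

In this toy instance a budget of one observation fails for any threshold, because a single action sends the states on one side of the goal away from it forever. A budget of two suffices: distinguishing "left of goal" from "right of goal" gives an expected cost of 3/2. The enumeration is exponential in the number of states, so it only illustrates the guess-and-check structure of the problem.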
