Tighter Value-Function Approximations for POMDPs

Merlijn Krale, Wietze Koops, Sebastian Junges, Thiago D. Simão, Nils Jansen

TL;DR

This work addresses the challenge of obtaining tight yet tractable upper bounds on POMDP value functions for use in epsilon-optimal solvers. It introduces three bounds, the Tighter Informed Bound (TIB), Optimized TIB (OTIB), and Entropy-based TIB (ETIB), each trading additional computation for a tighter bound: TIB uses a two-step observation delay, OTIB optimizes weights via a linear program, and ETIB uses a maximal-entropy heuristic to reuse a single weight across iterations. The authors prove that all three bounds are sound, establish their tightness relative to the Fast Informed Bound (FIB), and show that using them to initialize state-of-the-art solvers such as SARSOP yields faster convergence and smaller optimality gaps, especially at higher discount factors. Empirical results on standard and novel POMDP benchmarks indicate that TIB and ETIB balance tightness against compute time well, while OTIB gives the tightest bounds only at substantial computational cost. Overall, the paper provides a set of tighter upper bounds that improve both the speed and the solution quality of POMDP solvers.

Abstract

Solving partially observable Markov decision processes (POMDPs) typically requires reasoning about the values of exponentially many state beliefs. Towards practical performance, state-of-the-art solvers use value bounds to guide this reasoning. However, sound upper value bounds are often computationally expensive to compute, and there is a tradeoff between the tightness of such bounds and their computational cost. This paper introduces new and provably tighter upper value bounds than the commonly used fast informed bound. Our empirical evaluation shows that, despite their additional computational overhead, the new upper bounds accelerate state-of-the-art POMDP solvers on a wide range of benchmarks.
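For orientation, the fast informed bound (FIB) that serves as the baseline here can be sketched in a few lines. The code below is a minimal illustration of the standard FIB recursion, compared against the looser QMDP bound, and is not the authors' implementation. The array layouts (`T[a, s, s']`, `O[a, s', o]`, `R[s, a]`), the function names, and the small tiger-style example are all assumptions made for this sketch.

```python
import numpy as np

def qmdp(T, R, gamma, iters=500):
    """QMDP upper bound: value iteration on the fully observable MDP."""
    n_a, n_s, _ = T.shape
    Q = np.zeros((n_s, n_a))
    for _ in range(iters):
        V = Q.max(axis=1)                                  # V[s']
        Q = R + gamma * np.einsum("ast,t->sa", T, V)
    return Q

def fib(T, O, R, gamma, iters=500):
    """Fast informed bound: the max over next actions is taken per observation."""
    n_a, n_s, _ = T.shape
    Q = np.zeros((n_s, n_a))
    for _ in range(iters):
        # M[a, s, o, a'] = sum_{s'} T[a, s, s'] * O[a, s', o] * Q[s', a']
        M = np.einsum("ast,ato,tb->asob", T, O, Q)
        Q = R + gamma * M.max(axis=3).sum(axis=2).T
    return Q

def upper_bound(Q, b):
    """Value upper bound at belief b: max_a sum_s b(s) Q(s, a)."""
    return float((b @ Q).max())

# Hypothetical tiger-style POMDP (an assumption for this sketch).
# States: tiger-left, tiger-right. Actions: listen, open-left, open-right.
T = np.array([
    [[1.0, 0.0], [0.0, 1.0]],   # listen: state unchanged
    [[0.5, 0.5], [0.5, 0.5]],   # open-left: problem resets uniformly
    [[0.5, 0.5], [0.5, 0.5]],   # open-right: problem resets uniformly
])
O = np.array([
    [[0.85, 0.15], [0.15, 0.85]],  # listen: informative observation
    [[0.5, 0.5], [0.5, 0.5]],      # open-left: uninformative
    [[0.5, 0.5], [0.5, 0.5]],      # open-right: uninformative
])
R = np.array([
    [-1.0, -100.0, 10.0],   # tiger-left
    [-1.0, 10.0, -100.0],   # tiger-right
])
gamma = 0.95

Q_qmdp = qmdp(T, R, gamma)
Q_fib = fib(T, O, R, gamma)
b = np.array([0.5, 0.5])
print("QMDP bound at b:", upper_bound(Q_qmdp, b))
print("FIB  bound at b:", upper_bound(Q_fib, b))
```

Both functions return a table of per-state Q-values, and the bound at a belief is the best belief-weighted action value. On this example the FIB bound at the uniform belief is strictly below the QMDP bound, which reflects the general ordering FIB ≤ QMDP; the bounds introduced in the paper aim to tighten the FIB bound further.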

Paper Structure

This paper contains 43 sections, 23 theorems, 24 equations, 3 figures, 6 tables, 1 algorithm.

Key Result

theorem 1

Given a belief $b$, a point set $\mathcal{B}$, and a function $Q \colon \mathcal{B} \times \mathcal{A} \rightarrow \mathbb{R}$ which over-approximates the $Q_{\mathrm{POMDP}}$-values of all belief-action pairs $(b',a) \in \mathcal{B} \times \mathcal{A}$. Then, any weight function $w \in \mathcal{W}_

Figures (3)

  • Figure 1: Visualisation of the Guessing POMDP.
  • Figure 2: Upper and lower bounds on the initial value of the K-out-of-N (2) environments as computed by SARSOP in the first 60s, using different bounds as initialization. Solid lines show upper bounds, and dashed lines show lower bounds.
  • Figure 3: Computation times of SARSOP against the discount factor, using different bounds as initialization.

Theorems & Definitions (29)

  • definition 1
  • definition 2
  • definition 3
  • theorem 1: Point set bound
  • definition 4
  • theorem 2
  • definition 5
  • theorem 3
  • definition 6
  • theorem 4
  • ...and 19 more