Online Planning in POMDPs with State-Requests

Raphael Avalos; Eugenio Bargiacchi; Ann Nowé; Diederik M. Roijers; Frans A. Oliehoek

TL;DR

The paper addresses planning under partial observability when full state information can be obtained at a cost, introducing POMDP-SR and the online planner AEMS-SR. AEMS-SR leverages a rooted cyclic graph to mitigate exponential tree growth caused by state-requests and provides theoretical guarantees of completeness and $\varepsilon$-optimality. It formalizes the POMDP-SR framework, analyzes equivalent POMDP transformations, and adapts upper-bound strategies (including $Q$-MDP and FIB-SR) for SR scenarios with online bound refinement via corner beliefs. Empirical results on RobotDelivery and Tag show that AEMS-SR consistently outperforms POMCP and traditional AEMS, particularly when bounds can be improved online. The work demonstrates practical benefits for real-world domains where state queries are costly yet advantageous for decision quality, and outlines avenues for tailored policy design and broader applicability of AEMS-Loop.
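As a point of reference for the upper bounds mentioned above, the following is a minimal sketch of the standard QMDP bound for an ordinary POMDP: solve the underlying MDP, then take $U(b) = \max_a \sum_s b(s)\, Q_{\mathrm{MDP}}(s, a)$. It is not the paper's state-request adaptation, which this summary does not spell out. The names `qmdp_q_values` and `qmdp_upper_bound`, the array layouts `T[a][s][s2]` and `R[s][a]`, and the fixed iteration count are assumptions made for illustration.

```python
from typing import List, Sequence


def qmdp_q_values(
    T: Sequence[Sequence[Sequence[float]]],
    R: Sequence[Sequence[float]],
    gamma: float,
    iters: int = 200,
) -> List[List[float]]:
    """Value iteration on the fully observable MDP underlying a POMDP.

    T[a][s][s2] is the transition probability and R[s][a] the reward
    (hypothetical layout chosen for this sketch). Returns Q[s][a].
    """
    n_actions, n_states = len(T), len(T[0])
    V = [0.0] * n_states
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        for s in range(n_states):
            for a in range(n_actions):
                expected_next = sum(T[a][s][s2] * V[s2] for s2 in range(n_states))
                Q[s][a] = R[s][a] + gamma * expected_next
        # Synchronous update: V changes only after a full sweep.
        V = [max(row) for row in Q]
    return Q


def qmdp_upper_bound(belief: Sequence[float], Q: Sequence[Sequence[float]]) -> float:
    """QMDP bound: best action under belief-weighted MDP Q-values."""
    n_actions = len(Q[0])
    return max(
        sum(b * Q[s][a] for s, b in enumerate(belief)) for a in range(n_actions)
    )


# Toy problem (assumed): two absorbing states, reward 1 when the action index
# matches the state index, 0 otherwise.
T = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
R = [[1.0, 0.0], [0.0, 1.0]]
Q = qmdp_q_values(T, R, gamma=0.5)
uniform_bound = qmdp_upper_bound([0.5, 0.5], Q)
corner_bound = qmdp_upper_bound([1.0, 0.0], Q)
```

In this toy problem the bound is larger at a corner belief than at the uniform belief, because the underlying MDP assumes the state becomes known after one step.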

Abstract

In key real-world problems, full state information is sometimes available but only at a high cost, such as activating precise yet energy-intensive sensors or consulting humans, which compels the agent to operate under partial observability most of the time. For this scenario, we propose AEMS-SR (Anytime Error Minimization Search with State Requests), a principled online planning algorithm tailored for POMDPs with state requests. By representing the search space as a graph instead of a tree, AEMS-SR avoids the exponential growth of the search space originating from state requests. Theoretical analysis demonstrates AEMS-SR's $\varepsilon$-optimality, ensuring solution quality, while empirical evaluations illustrate its effectiveness compared with AEMS and POMCP, two state-of-the-art online planning algorithms. AEMS-SR enables efficient planning in domains characterized by partial observability and costly state requests, offering practical benefits across various applications.
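The benefit of a graph over a tree can be illustrated with a toy count. After a state request the agent knows the state exactly, so its belief is a corner belief (all mass on one state), and there are only as many distinct corner beliefs as there are states. The sketch below is not the AEMS-SR implementation. It assumes, purely for illustration, that every level of the search contains a state request that may reveal any state. `BeliefGraph`, `corner_belief`, and `count_request_nodes` are hypothetical names.

```python
from typing import Dict, Tuple

Belief = Tuple[float, ...]


def corner_belief(state: int, n_states: int) -> Belief:
    """Belief that puts all probability mass on a single state."""
    return tuple(1.0 if s == state else 0.0 for s in range(n_states))


class BeliefGraph:
    """Keeps one node per distinct belief so identical beliefs are shared."""

    def __init__(self) -> None:
        self.nodes: Dict[Belief, int] = {}

    def get_or_create(self, belief: Belief) -> int:
        # Reuse the existing node id when this belief was already created.
        if belief not in self.nodes:
            self.nodes[belief] = len(self.nodes)
        return self.nodes[belief]


def count_request_nodes(n_states: int, depth: int) -> Tuple[int, int]:
    """Count post-request nodes in a tree versus a graph.

    Toy assumption: at each of `depth` levels, a state request can reveal
    any of the `n_states` states.
    """
    graph = BeliefGraph()
    tree_nodes = 0
    frontier = 1  # tree nodes at the current level
    for _ in range(depth):
        frontier *= n_states   # a tree duplicates every outcome per parent
        tree_nodes += frontier
        for s in range(n_states):
            graph.get_or_create(corner_belief(s, n_states))  # shared in a graph
    return tree_nodes, len(graph.nodes)


tree_count, graph_count = count_request_nodes(n_states=3, depth=4)
```

With three states and four levels, the tree creates 3 + 9 + 27 + 81 = 120 post-request nodes, while the graph keeps only the 3 corner beliefs. Edges that lead back to an already-created corner belief are also what can make the graph cyclic.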

Paper Structure (17 sections, 8 theorems, 26 equations, 3 figures, 5 tables, 3 algorithms)

Introduction
Background
Framework
Online planning: AEMS-SR
AEMS-Loop
Algorithm
Bounds
Experiments
Related Work
Conclusion
Equivalent POMDP
Example of Optimal Action Divergence between MDP, POMDP and POMDP-SR
Notations
Proofs
Algorithm to compute $\bar{\Psi}$ and $\bar{\Psi}_{b_{0}}$
...and 2 more sections

Key Result

Theorem 1

In any rooted graph $\mathcal{G}$ with root $b_{0}$ where values are computed according to Eq. eq:lower_belief using a lower bound value function $L$ with error $e(b) = V^*(b) - L(b)$, the error on the root belief state is bounded by: $e_\mathcal{G}(b_{0}) = V^*(b_{0}) - L_\mathcal{G}(b_{0}) \leq \sum

Figures (3)

Figure 1: Tree and graph representations after three successive expansions (the expanded beliefs are in green). Beliefs before selecting a state request are depicted by upward triangles, beliefs before an environmental action by downward triangles, (not-)request state actions by diamonds, environmental actions by circles, and corner beliefs by rectangles. Some nodes are hidden for readability.
Figure 2: RobotDelivery ($n=3$), A is the agent, P the package, D (green) the delivery location, W (grey) the package waiting area, E (violet) the exit, and the blue tiles are possible package locations.
Figure 3: Tree representing an environment where the MDP and POMDP optimal actions are the same, but if the agent can request the state for a cost $1 \leq c \leq 4$, the optimal action changes. Circles represent states, actions are left $a_1$ and right $a_2$,

Theorems & Definitions (18)

Theorem 1
Theorem 2
Lemma 3
proof
Lemma 4
proof
Theorem 5
proof
Definition 6
Lemma 7
...and 8 more
