Anytime Incremental $ρ$POMDP Planning in Continuous Spaces

Ron Benchetrit, Idan Lev-Yehudi, Andrey Zhitnikov, Vadim Indelman

TL;DR

The paper tackles planning under uncertainty in continuous spaces using $\rho$POMDPs, which extend POMDPs with belief-dependent rewards. It introduces $\rho$POMCPOW, an online, anytime solver that progressively refines belief representations. The solver combines LVU-based backpropagation with incremental belief updates, enabling efficient handling of belief-dependent rewards such as entropy and information gain. The authors prove deterministic lower bounds on node visitation under consistent selection strategies, which ensures that beliefs are refined over time. They also present $O(1)$ and $O(N)$ incremental update schemes for the Shannon and Boers entropy estimators, respectively. Experimental results on continuous light-dark and active localization tasks show improved planning efficiency and solution quality over state-of-the-art solvers, highlighting practical gains for information-seeking robotics and autonomous systems.
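The $O(1)$ Shannon update can be illustrated with a short Python sketch. This summary does not give the estimator's exact form, so the sketch assumes it is the discrete entropy of the normalized particle weights, $H = -\sum_i \bar{w}_i \log \bar{w}_i$ with $\bar{w}_i = w_i / W$. Expanding the logarithm gives $H = \log W - S/W$, where $W = \sum_i w_i$ and $S = \sum_i w_i \log w_i$. Adding a particle therefore only updates two running sums. The class and function names are hypothetical.

```python
import math


class IncrementalWeightEntropy:
    """Discrete Shannon entropy of normalized particle weights, updated in O(1).

    Assumption: H = -sum_i wbar_i log wbar_i with wbar_i = w_i / W, which
    expands to H = log W - S / W with W = sum_i w_i and S = sum_i w_i log w_i.
    """

    def __init__(self):
        self.total = 0.0    # W: sum of unnormalized weights
        self.w_log_w = 0.0  # S: sum of w_i * log(w_i)

    def add(self, w):
        # O(1): a new particle only touches the two running sums
        if w <= 0.0:
            raise ValueError("particle weights must be positive")
        self.total += w
        self.w_log_w += w * math.log(w)

    def entropy(self):
        if self.total == 0.0:
            return 0.0
        return math.log(self.total) - self.w_log_w / self.total


def entropy_from_scratch(weights):
    """O(N) reference: normalize all weights, then sum -p log p."""
    total = sum(weights)
    return -sum((w / total) * math.log(w / total) for w in weights)
```

Keeping unnormalized sums is what makes the update constant-time. Normalizing the weights explicitly would touch all $N$ particles each time one is added.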

Abstract

Partially Observable Markov Decision Processes (POMDPs) provide a robust framework for decision-making under uncertainty in applications such as autonomous driving and robotic exploration. Their extension, $ρ$POMDPs, introduces belief-dependent rewards, enabling explicit reasoning about uncertainty. Existing online $ρ$POMDP solvers for continuous spaces rely on fixed belief representations. This limits adaptability and refinement, which are critical for tasks such as information gathering. We present $ρ$POMCPOW, an anytime solver that dynamically refines belief representations, with formal guarantees of improvement over time. To mitigate the high computational cost of updating belief-dependent rewards, we propose a novel incremental computation approach. We demonstrate its effectiveness for common entropy estimators, reducing computational cost by orders of magnitude. Experimental results show that $ρ$POMCPOW outperforms state-of-the-art solvers in both efficiency and solution quality.
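For the Boers estimator, the saving comes from avoiding the $O(N^2)$ double sum. The sketch below assumes the commonly used form of the estimator of Boers et al. It applies to a child belief obtained from a parent belief by action $a$ and observation $z$:

$\hat{H} = \log\big(\sum_i p_Z(z \mid x'_i)\, w_i\big) - \sum_i w'_i \log\big(p_Z(z \mid x'_i) \sum_j p_T(x'_i \mid x_j, a)\, w_j\big)$

Here $w_j$ are the normalized parent weights and $w'_i$ the normalized child weights. The sketch further assumes that each new child particle $x'_i$ is paired with a new parent particle $x_i$. Its names and the 1-D Gaussian model used in the check are hypothetical.

Define the unnormalized parent weights $u_j$, the cached sums $T_i = \sum_j p_T(x'_i \mid x_j, a)\, u_j$, the likelihoods $L_i = p_Z(z \mid x'_i)$ and $A = \sum_i L_i u_i$. With these, the estimate reduces to $\hat{H} = \log A - A^{-1} \sum_i L_i u_i \log(L_i T_i)$. A new particle then costs one $O(N)$ pass to extend the cached $T_i$ and one $O(N)$ sum for its own $T$. This illustrates the idea and is not the authors' implementation.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


class IncrementalBoersEntropy:
    """Boers-style entropy estimate with O(N) particle insertion (sketch).

    Assumes the action a and observation z are fixed for this node and that each
    child particle x'_i is propagated from a parent particle x_i with weight u_i.
    """

    def __init__(self, trans_pdf, obs_lik):
        self.trans_pdf = trans_pdf  # trans_pdf(x_next, x_prev) = p_T(x_next | x_prev, a)
        self.obs_lik = obs_lik      # obs_lik(x_next) = p_Z(z | x_next)
        self.prev, self.u = [], []  # parent particles x_j and unnormalized weights u_j
        self.nxt, self.lik, self.t = [], [], []  # x'_i, L_i, T_i = sum_j p_T(x'_i | x_j) u_j
        self.a_sum = 0.0            # A = sum_i L_i u_i

    def add(self, x_prev, weight, x_next):
        # O(N): the new parent particle adds one term to every cached T_i
        for i, xn in enumerate(self.nxt):
            self.t[i] += self.trans_pdf(xn, x_prev) * weight
        self.prev.append(x_prev)
        self.u.append(weight)
        # O(N): transition-density sum for the new child particle
        t_new = sum(self.trans_pdf(x_next, xp) * uj for xp, uj in zip(self.prev, self.u))
        lik = self.obs_lik(x_next)
        self.nxt.append(x_next)
        self.lik.append(lik)
        self.t.append(t_new)
        self.a_sum += lik * weight

    def entropy(self):
        # O(N): H = log A - (1/A) * sum_i L_i u_i log(L_i T_i); the log U terms cancel
        s = sum(l * uj * math.log(l * t) for l, uj, t in zip(self.lik, self.u, self.t))
        return math.log(self.a_sum) - s / self.a_sum


def boers_entropy_from_scratch(prev, u, nxt, trans_pdf, obs_lik):
    """O(N^2) reference evaluation of the same estimator with normalized weights."""
    total_u = sum(u)
    w = [uj / total_u for uj in u]
    lik = [obs_lik(xn) for xn in nxt]
    norm = sum(l * wi for l, wi in zip(lik, w))
    post = [l * wi / norm for l, wi in zip(lik, w)]
    second = sum(
        pw * math.log(l * sum(trans_pdf(xn, xp) * wj for xp, wj in zip(prev, w)))
        for pw, l, xn in zip(post, lik, nxt)
    )
    return math.log(norm) - second
```

Evaluating `entropy()` is itself an $O(N)$ sum. Both operations therefore stay linear in the number of particles, whereas re-evaluating the double sum from scratch after every insertion costs $O(N^2)$.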

Paper Structure

This paper contains 43 sections, 2 theorems, 27 equations, 4 figures, 4 tables, 3 algorithms.

Key Result

Theorem 1

Assume the action and observation selection strategies are consistent with functions $f, F$ and $g, G$, respectively. Then, for a belief tree path $h_{\tau}$, the visitation counts satisfy a deterministic lower bound, where $k(i_0, \dots, j_{\tau})$ ensures sufficient initial visitation counts. A more detailed version of this theorem, including the explicit closed-form expression for $k$ and its complete proof, is provided in the appendix.
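Theorem 1 bounds how often each node on a path is visited, and that bound depends on how quickly the selection strategies open new children. This summary does not reproduce the exact definition of consistency with $f, F$ and $g, G$. The Python sketch below therefore only illustrates one familiar mechanism of this kind: progressive widening as used by POMCPOW. Under progressive widening, a node with $N$ visits opens a new child while its number of children $C$ satisfies $C \le k N^{\alpha}$. The least-visited selection rule, the class name, and the values of $k$ and $\alpha$ are hypothetical choices for illustration, not the paper's strategy.

```python
class ProgressiveWideningNode:
    """Node that opens children by progressive widening (illustrative sketch).

    A new child is opened whenever len(children) <= k * N**alpha, where N is the
    node's visit count so far. The least-visited child is then chosen; this rule
    is a hypothetical stand-in for the solver's actual selection strategy.
    """

    def __init__(self, k=1.0, alpha=0.5):
        self.k = k
        self.alpha = alpha
        self.visits = 0
        self.child_visits = []  # visit count of each opened child

    def select(self):
        # progressive widening check, using the visit count before this call
        if len(self.child_visits) <= self.k * self.visits ** self.alpha:
            self.child_visits.append(0)  # open a new child
        # least-visited child; min() returns the first index on ties
        idx = min(range(len(self.child_visits)), key=self.child_visits.__getitem__)
        self.child_visits[idx] += 1
        self.visits += 1
        return idx
```

$C$ grows like $k N^{\alpha}$ with $\alpha < 1$, so the $N$ visits are shared among sublinearly many children. Under a balanced rule like this one, each child receives on average about $N^{1-\alpha}/k$ visits, a quantity that keeps growing with $N$. This is the intuition behind visitation lower bounds of the kind stated in Theorem 1, not a reproduction of its proof.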

Figures (4)

  • Figure 1: Illustration of belief tree construction by a state simulator (left) and a belief simulator (right). New particles and new nodes are marked in red. The state simulator updates beliefs by adding new particles along the trajectory, while the belief simulator maintains fixed beliefs once created.
  • Figure 2: Simulated trajectories in the Active Localization problem.
  • Figure 3: Planning time comparison for $\rho$POMCPOW with and without incremental reward computation as a function of iterations.
  • Figure 4: Planning time comparison for POMCPOW and $\rho$POMCPOW with incremental reward computation as a function of iterations.
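Figure 1's distinction between the two simulators can be made concrete with a hedged sketch of the state simulator, modeled on POMCPOW's observation-node update. Each simulated state is appended to the belief of the observation node it passes through, weighted by the observation likelihood $p_Z(o \mid s')$. The next state is then resampled from that node's weighted particles. The one-dimensional Gaussian transition and observation models, the fixed action, the fixed chain of observation nodes, and all names are hypothetical simplifications. A belief simulator, by contrast, would generate a node's particle set once when the node is created and never append to it.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


class ObservationNode:
    """Hypothetical belief node: a fixed observation plus a growing weighted particle set."""

    def __init__(self, obs):
        self.obs = obs
        self.particles = []
        self.weights = []


def state_simulation(root_particles, path, rng, step=1.0, trans_std=0.5, obs_std=0.3):
    """One state-simulator pass down a fixed chain of observation nodes.

    Every node on the path receives exactly one new weighted particle.
    """
    s = rng.choice(root_particles)  # sample a state from the (fixed) root belief
    for node in path:
        # hypothetical 1-D transition for a fixed action: move by `step` plus noise
        s_next = s + step + rng.gauss(0.0, trans_std)
        node.particles.append(s_next)
        node.weights.append(gaussian_pdf(node.obs, s_next, obs_std))
        # continue from a state resampled from the node's weighted belief
        s = rng.choices(node.particles, weights=node.weights, k=1)[0]


rng = random.Random(0)
path = [ObservationNode(o) for o in (1.0, 2.0, 3.0)]
for _ in range(100):
    state_simulation([0.0, 0.1, -0.1], path, rng)
print([len(node.particles) for node in path])  # [100, 100, 100]
```

In this sketch, each call adds one particle to every node on the path. A lower bound on how often a path is visited therefore translates directly into a lower bound on how many particles its beliefs hold.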

Theorems & Definitions (6)

  • Definition 1: Consistent Selection Strategy
  • Theorem 1: Node Visitation Lower Bound
  • Remark 1
  • Example 1
  • Corollary 1: Anytime Belief Refinement
  • Proof