Anytime Incremental $ρ$POMDP Planning in Continuous Spaces
Ron Benchetrit, Idan Lev-Yehudi, Andrey Zhitnikov, Vadim Indelman
TL;DR
The paper tackles planning under uncertainty in continuous spaces for $\rho$POMDPs, the extension of POMDPs to belief-dependent rewards, and introduces $\rho$POMCPOW, an online, anytime solver that progressively refines belief representations. It combines LVU-based backpropagation with incremental belief updates, enabling efficient handling of belief-dependent rewards such as entropy and information gain. The authors prove deterministic lower bounds on node visitation under consistent selection strategies, ensuring that beliefs improve over time, and present $O(1)$ and $O(N)$ incremental update schemes for the Shannon and Boers entropy estimators, respectively. Experiments on continuous light-dark and active localization tasks show improved planning efficiency and solution quality over state-of-the-art solvers, highlighting practical gains for information-seeking robotics and autonomous systems.
Abstract
Partially Observable Markov Decision Processes (POMDPs) provide a robust framework for decision-making under uncertainty in applications such as autonomous driving and robotic exploration. Their extension, $ρ$POMDPs, introduces belief-dependent rewards, enabling explicit reasoning about uncertainty. Existing online $ρ$POMDP solvers for continuous spaces rely on fixed belief representations, which limits adaptability and refinement, both critical for tasks such as information gathering. We present $ρ$POMCPOW, an anytime solver that dynamically refines belief representations, with formal guarantees of improvement over time. To mitigate the high computational cost of updating belief-dependent rewards, we propose a novel incremental computation approach. We demonstrate its effectiveness for common entropy estimators, reducing computational cost by orders of magnitude. Experimental results show that $ρ$POMCPOW outperforms state-of-the-art solvers in both efficiency and solution quality.
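To illustrate why an incremental reward update can be cheap, the following is a minimal sketch, not the paper's algorithm. It assumes a belief represented by weighted particles and takes the reward to be the Shannon entropy of the normalized weights. The class name `IncrementalShannonEntropy` and the helper `batch_entropy` are hypothetical and introduced only for this example. The sketch relies on the identity $H = \log W - S/W$, where $W = \sum_i w_i$ and $S = \sum_i w_i \log w_i$, so adding one particle only updates two running sums.

```python
import math


class IncrementalShannonEntropy:
    """Running Shannon entropy of a weighted particle set (hypothetical helper)."""

    def __init__(self):
        self.total_weight = 0.0      # W = sum_i w_i
        self.weighted_log_sum = 0.0  # S = sum_i w_i * log(w_i)

    def add(self, weight):
        """Add one particle with an unnormalized positive weight in O(1)."""
        if weight <= 0.0:
            raise ValueError("weight must be positive")
        self.total_weight += weight
        self.weighted_log_sum += weight * math.log(weight)

    def entropy(self):
        """Entropy of the normalized weights: H = log(W) - S / W."""
        if self.total_weight == 0.0:
            raise ValueError("no particles added")
        return math.log(self.total_weight) - self.weighted_log_sum / self.total_weight


def batch_entropy(weights):
    """Reference O(N) computation: renormalize and sum over all particles."""
    total = sum(weights)
    return -sum((w / total) * math.log(w / total) for w in weights)


if __name__ == "__main__":
    estimator = IncrementalShannonEntropy()
    weights = [0.5, 1.5, 2.0, 1.0]
    for w in weights:
        estimator.add(w)
    # The incremental value should match the batch recomputation.
    print(estimator.entropy(), batch_entropy(weights))
```

Under these assumptions, each added particle costs constant time, whereas recomputing from scratch costs time linear in the number of particles. Estimators that couple every pair of particles would not reduce to two running sums in this way.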
