Simplifying Complex Observation Models in Continuous POMDP Planning with Probabilistic Guarantees and Practice
Idan Lev-Yehudi, Moran Barenboim, Vadim Indelman
TL;DR
The paper addresses planning in continuous POMDPs with high-dimensional observations by replacing expensive observation models with a cheaper surrogate during planning while providing probabilistic guarantees on performance. The core idea is a state-dependent total variation bound, $\Delta_Z(x)$, that links the true value under $p_Z$ to the value under a simplified model $q_Z$, and an offline/online computation scheme that yields guaranteed bounds without online access to $p_Z$. It introduces a non-parametric local bound via $m_i$ and a cumulative bound $M_t^{\pi}$ (and the action-bound analog $\Phi_t^{\pi}$), together with an online estimator $\tilde{m}_i$ based on pre-sampled delta-states and importance sampling. Theoretical convergence results extend PB-MDP concentration bounds to general policies, and a detailed 2D beacons simulation demonstrates reduced planning time and meaningful policy differences induced by the bounds, implying practical utility for real-time planning with visual observations and potential for runtime pruning and certification.
Abstract
Solving partially observable Markov decision processes (POMDPs) with high-dimensional and continuous observations, such as camera images, is required for many real-life robotics and planning problems. Recent research has suggested machine-learned probabilistic models as observation models, but their use is currently too computationally expensive for online deployment. We deal with the question of what the implications would be of using simplified observation models for planning, while retaining formal guarantees on the quality of the solution. Our main contribution is a novel probabilistic bound based on a statistical total variation distance of the simplified model. We show that it bounds the theoretical POMDP value w.r.t. the original model, from the empirical planned value with the simplified model, by generalizing recent results of particle-belief MDP concentration bounds. Our calculations can be separated into offline and online parts, and we arrive at formal guarantees without having to access the costly model at all during planning, which is also a novel result. Finally, we demonstrate in simulation how to integrate the bound into the routine of an existing continuous online POMDP solver.
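
To make the total variation quantity concrete, the following is a minimal sketch, not the paper's implementation, of a Monte Carlo estimate of the total variation distance between an original model $p_Z(\cdot \mid x)$ and a simplified model $q_Z(\cdot \mid x)$ at a single state $x$. Everything here is an illustrative assumption: both models are taken to be 1D Gaussians, the function names are hypothetical, and the standard definition with the factor 1/2 is used, which may differ from the paper's normalization of $\Delta_Z(x)$.

```python
import math
import numpy as np


def gaussian_pdf(z, mean, std):
    """Density of a 1D Gaussian N(mean, std^2) evaluated at z."""
    return np.exp(-0.5 * ((z - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def estimate_tv(x, n_samples=200_000, seed=0):
    """Monte Carlo estimate of TV(p_Z(.|x), q_Z(.|x)).

    Uses the identity TV = 0.5 * E_{z ~ p}[ |1 - q(z)/p(z)| ].
    Assumed toy models (not from the paper):
      p_Z(z|x) = N(x, 1)        "original" model
      q_Z(z|x) = N(x + 0.5, 1)  "simplified" model with a biased mean
    """
    rng = np.random.default_rng(seed)
    p_mean, q_mean, std = x, x + 0.5, 1.0
    # Sample observations from the original model only.
    z = rng.normal(p_mean, std, size=n_samples)
    ratio = gaussian_pdf(z, q_mean, std) / gaussian_pdf(z, p_mean, std)
    return 0.5 * float(np.mean(np.abs(1.0 - ratio)))


def exact_tv_equal_std(mean_gap, std):
    """Closed-form TV between two Gaussians that share the same std."""
    return math.erf(abs(mean_gap) / (2.0 * math.sqrt(2.0) * std))


if __name__ == "__main__":
    print("estimate:", estimate_tv(0.0))
    print("exact:   ", exact_tv_equal_std(0.5, 1.0))
```

In this sketch the expensive model is sampled and evaluated once per state, which mirrors the idea of moving such computations offline; how the paper actually splits the offline and online parts is described in its method sections.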
