Rao-Blackwellized POMDP Planning
Jiho Lee, Nisar R. Ahmed, Kyle H. Wray, Zachary N. Sunberg
TL;DR
This paper introduces RB-POMDP, a framework that uses Rao-Blackwellized Particle Filters (RBPFs) to handle analytically tractable state components in closed form while sampling the rest, reducing particle requirements and variance. It couples RBPFs with a new online planner, RB-POMCPOW, which employs quadrature-based integration (e.g., Gauss-Hermite rules, Smolyak sparse grids) to compute expectations over the marginalized states, thereby cutting the number of Monte Carlo tree iterations. Empirical results in a GPS-denied localization task show that an RBPF with fewer particles can achieve a higher effective sample size (ESS) and comparable or better cumulative rewards, while RB-POMCPOW with moderate quadrature levels speeds up planning roughly sevenfold relative to standard POMCPOW under the same time budget. The findings suggest that RB-POMDPs offer scalable, efficient decision-making for high-dimensional POMDPs and complex planning problems.
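To make the quadrature idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes a single marginalized state component with a Gaussian belief and approximates an expected cost with a three-point Gauss-Hermite rule, next to a plain Monte Carlo estimate for comparison. The names `gauss_hermite_expectation`, `monte_carlo_expectation`, and `quadratic_cost`, as well as the chosen mean and standard deviation, are hypothetical and chosen only for illustration.

```python
import math
import random

# Three-point Gauss-Hermite rule written for a standard normal variable:
# nodes 0 and +/-sqrt(3), weights 2/3 and 1/6 (exact for polynomials up to degree 5).
GH3_NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
GH3_WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)


def gauss_hermite_expectation(f, mean, std):
    """Approximate E[f(X)] for X ~ N(mean, std^2) using three function evaluations."""
    return sum(w * f(mean + std * z) for z, w in zip(GH3_NODES, GH3_WEIGHTS))


def monte_carlo_expectation(f, mean, std, n_samples, seed=0):
    """Approximate the same expectation by averaging f over random samples."""
    rng = random.Random(seed)
    return sum(f(rng.gauss(mean, std)) for _ in range(n_samples)) / n_samples


def quadratic_cost(x):
    """Hypothetical cost: squared distance of the state from a target at 1.0."""
    return (x - 1.0) ** 2


# Illustrative Gaussian belief over the marginalized component (assumed values).
MEAN, STD = 0.5, 2.0

# Closed form for this cost: variance plus squared bias.
EXACT = STD ** 2 + (MEAN - 1.0) ** 2

gh_estimate = gauss_hermite_expectation(quadratic_cost, MEAN, STD)
mc_estimate = monte_carlo_expectation(quadratic_cost, MEAN, STD, n_samples=100)

print("exact:", EXACT)
print("Gauss-Hermite (3 evaluations):", gh_estimate)
print("Monte Carlo (100 samples):", mc_estimate)
```

For a polynomial cost of this degree the quadrature estimate matches the closed form up to floating-point error with only three evaluations, while the sampled estimate carries Monte Carlo noise. This is the general intuition behind replacing sampled expectations with deterministic rules when part of the belief is Gaussian.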
Abstract
Partially Observable Markov Decision Processes (POMDPs) provide a structured framework for decision-making under uncertainty, but their application requires efficient belief updates. Sequential Importance Resampling Particle Filters (SIRPF), also known as Bootstrap Particle Filters, are commonly used as belief updaters in large approximate POMDP solvers, but they face challenges such as particle deprivation and high computational costs as the system's state dimension grows. To address these issues, this study introduces Rao-Blackwellized POMDP (RB-POMDP) approximate solvers and outlines generic methods to apply Rao-Blackwellization in both belief updates and online planning. We compare the performance of SIRPF and Rao-Blackwellized Particle Filters (RBPF) in a simulated localization problem where an agent navigates toward a target in a GPS-denied environment using POMCPOW and RB-POMCPOW planners. Our results not only confirm that RBPFs maintain accurate belief approximations over time with fewer particles, but, more surprisingly, RBPFs combined with quadrature-based integration improve planning quality significantly compared to SIRPF-based planning under the same computational limits.
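Particle deprivation is commonly diagnosed with the effective sample size (ESS) of the importance weights. The sketch below uses the standard estimate, ESS = 1 / sum of squared normalized weights; it is a generic illustration rather than the authors' code, and the function names and example weights are hypothetical.

```python
import math


def normalize_log_weights(log_weights):
    """Turn unnormalized log-weights into weights that sum to one.

    Subtracting the maximum before exponentiating avoids overflow.
    """
    largest = max(log_weights)
    unnormalized = [math.exp(lw - largest) for lw in log_weights]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def effective_sample_size(weights):
    """Standard ESS estimate for normalized weights: 1 / sum(w_i^2)."""
    return 1.0 / sum(w * w for w in weights)


# Equal weights: every particle contributes, so ESS equals the particle count.
uniform = normalize_log_weights([0.0, 0.0, 0.0, 0.0])
ess_uniform = effective_sample_size(uniform)

# One dominant particle: the filter is close to deprived and ESS approaches 1.
degenerate = normalize_log_weights([0.0, -1000.0, -1000.0, -1000.0])
ess_degenerate = effective_sample_size(degenerate)

print("ESS with equal weights:", ess_uniform)
print("ESS with one dominant particle:", ess_degenerate)
```

A filter whose ESS stays close to its particle count is using its particles efficiently, which is why ESS is a natural metric for comparing an RBPF against an SIRPF at different particle budgets.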
