Safe POMDP Online Planning among Dynamic Agents via Adaptive Conformal Prediction
Shili Sheng, Pian Yu, David Parker, Marta Kwiatkowska, Lu Feng
TL;DR
The paper tackles safe online planning for POMDPs in dynamic, multi-agent environments by leveraging Adaptive Conformal Prediction (ACP) to quantify trajectory prediction uncertainty. It introduces ACP-induced safety shields that convert probabilistic safety requirements into almost-sure constraints and integrates these shields into POMCP via a belief-support transition system to prune unsafe branches. The approach is validated on gridworlds with real pedestrian data, showing improved probabilistic safety guarantees ($\phi^{\pi}(b_0) \ge 1-\delta$) while maintaining competitive expected returns, with only modest runtime overhead. This work enables high-reliability autonomous decision-making in crowded, uncertain settings and lays groundwork for extending safe planning to broader POMDP domains and continuous-state problems.
Abstract
Online planning for partially observable Markov decision processes (POMDPs) provides efficient techniques for robot decision-making under uncertainty. However, existing methods fall short of preventing safety violations in dynamic environments. This work presents a novel safe POMDP online planning approach that maximizes expected returns while providing probabilistic safety guarantees in environments populated by multiple dynamic agents. Our approach uses data-driven trajectory prediction models of the dynamic agents and applies Adaptive Conformal Prediction (ACP) to quantify the uncertainty in their predictions. Using the resulting ACP-based trajectory predictions, our approach constructs safety shields on-the-fly to prevent unsafe actions within POMDP online planning. Experimental evaluation in various dynamic environments with real-world pedestrian trajectory data shows that the proposed approach effectively maintains probabilistic safety guarantees while accommodating up to hundreds of dynamic agents.
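
To make the ACP step concrete, the following is a minimal sketch of the standard adaptive conformal update applied to one agent's trajectory prediction errors. It is not the paper's implementation. The function names (`conformal_radius`, `acp_step`, `run_acp`), the step size `gamma`, and the choice of nonconformity score (the distance between predicted and observed position) are assumptions made for illustration. At each step, the prediction-region radius is an empirical quantile of past errors, and the working miscoverage level is raised after a covered step and lowered after a miss.

```python
import math


def conformal_radius(scores, alpha_t):
    """Return the conformal (1 - alpha_t) quantile of past nonconformity scores.

    An infinite radius means there is not yet enough data to certify the level.
    """
    if alpha_t >= 1:
        return 0.0
    if alpha_t <= 0 or not scores:
        return math.inf
    ordered = sorted(scores)
    n = len(ordered)
    k = math.ceil((n + 1) * (1 - alpha_t))  # finite-sample corrected rank
    return math.inf if k > n else ordered[k - 1]


def acp_step(alpha_t, target, gamma, covered):
    """One adaptive update: alpha_{t+1} = alpha_t + gamma * (target - err_t)."""
    err = 0.0 if covered else 1.0  # err_t = 1 when the region missed the truth
    return alpha_t + gamma * (target - err)


def run_acp(errors, target=0.1, gamma=0.05):
    """Process prediction errors in order; return the radii used and the miss count.

    `errors` is a hypothetical stream of distances between predicted and
    observed agent positions, one per time step.
    """
    alpha_t, scores, radii, misses = target, [], [], 0
    for e in errors:
        r = conformal_radius(scores, alpha_t)
        covered = e <= r
        misses += 0 if covered else 1
        radii.append(r)
        alpha_t = acp_step(alpha_t, target, gamma, covered)
        scores.append(e)  # the new error becomes calibration data
    return radii, misses
```

In a shielding setting such as the one described above, a radius of this kind could bound the region around each predicted agent position that the planner treats as occupied. How the paper builds its shields from these regions is not specified in this summary.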
