Safe POMDP Online Planning via Shielding
Shili Sheng, David Parker, Lu Feng
TL;DR
The paper tackles the lack of safety guarantees in online POMDP planning by introducing shielding mechanisms that enforce almost-sure reach-avoid specifications. It presents centralized and factored shielding methods, with prior-pruning and on-the-fly backtracking to integrate shields into POMCP, and proves correctness for the factored approach. Empirical results show that shields guarantee safety with negligible online planning overhead and that factored shielding scales to POMDPs with millions of states, often improving return via safer simulations. This work enables safe, scalable online planning for safety-critical robotic tasks, such as autonomous navigation and manipulation, under uncertainty.
Abstract
Partially observable Markov decision processes (POMDPs) have been widely used in many robotic applications for sequential decision-making under uncertainty. POMDP online planning algorithms such as Partially Observable Monte-Carlo Planning (POMCP) can solve very large POMDPs with the goal of maximizing the expected return. However, the resulting policies cannot provide safety guarantees, which are imperative for real-world safety-critical tasks (e.g., autonomous driving). In this work, we consider safety requirements represented as almost-sure reach-avoid specifications (i.e., the probability of reaching a set of goal states is one and the probability of reaching a set of unsafe states is zero). We compute shields that restrict unsafe actions which would violate the almost-sure reach-avoid specifications. We then integrate these shields into the POMCP algorithm for safe POMDP online planning. We propose four distinct shielding methods, differing in how the shields are computed and integrated, including factored variants designed to improve scalability. Experimental results on a set of benchmark domains demonstrate that the proposed shielding methods successfully guarantee safety (unlike the baseline POMCP without shielding) on large POMDPs, with negligible impact on the runtime for online planning.
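To make the idea of a shield concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes that the agent's knowledge is summarized by a belief support (the set of states it might currently be in), and that a set of "winning" supports, from which the reach-avoid specification can still be satisfied almost surely, has already been computed offline. The names `allowed_actions`, `toy_successors`, and `WINNING` are hypothetical and exist only for this example.

```python
from typing import Callable, FrozenSet, Iterable, List, Set

State = str
Action = str
Support = FrozenSet[State]  # set of states the agent may currently be in


def allowed_actions(
    support: Support,
    actions: Iterable[Action],
    successors: Callable[[Support, Action], List[Support]],
    winning: Set[Support],
) -> List[Action]:
    """Keep only actions whose every possible successor support is winning.

    An action is blocked as soon as one possible observation could lead
    to a support outside the (assumed precomputed) winning set.
    """
    return [
        a for a in actions
        if all(nxt in winning for nxt in successors(support, a))
    ]


# --- Toy model (hypothetical): one start state, a goal, and a trap. ---
S0: Support = frozenset({"s0"})
GOAL: Support = frozenset({"goal"})
TRAP: Support = frozenset({"trap"})

# Supports assumed to have been identified as winning offline.
WINNING: Set[Support] = {S0, GOAL}


def toy_successors(support: Support, action: Action) -> List[Support]:
    """Possible next supports, one per observation that may occur."""
    if support == S0 and action == "safe":
        return [GOAL]
    if support == S0 and action == "risky":
        return [GOAL, TRAP]  # may end in the unsafe state
    return [support]  # goal and trap are absorbing in this toy model


if __name__ == "__main__":
    print(allowed_actions(S0, ["safe", "risky"], toy_successors, WINNING))
```

In an online planner such as POMCP, a filter of this kind would be consulted before expanding or selecting actions at a node, so that only shielded actions are simulated and executed. How and when the filter is applied is one of the design choices that distinguishes the shielding methods the abstract mentions.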
