Safe POMDP Online Planning among Dynamic Agents via Adaptive Conformal Prediction

Shili Sheng, Pian Yu, David Parker, Marta Kwiatkowska, Lu Feng

TL;DR

The paper tackles safe online planning for POMDPs in dynamic, multi-agent environments by leveraging Adaptive Conformal Prediction (ACP) to quantify trajectory prediction uncertainty. It introduces ACP-induced safety shields that convert probabilistic safety requirements into almost-sure constraints and integrates these shields into POMCP via a belief-support transition system to prune unsafe branches. The approach is validated on gridworlds with real pedestrian data, showing improved probabilistic safety guarantees ($\phi^{\pi}(b_0) \ge 1-\delta$) while maintaining competitive expected returns, with only modest runtime overhead. This work enables high-reliability autonomous decision-making in crowded, uncertain settings and lays groundwork for extending safe planning to broader POMDP domains and continuous-state problems.

Abstract

Online planning for partially observable Markov decision processes (POMDPs) provides efficient techniques for robot decision-making under uncertainty. However, existing methods fall short of preventing safety violations in dynamic environments. This work presents a novel safe POMDP online planning approach that maximizes expected returns while providing probabilistic safety guarantees amidst environments populated by multiple dynamic agents. Our approach utilizes data-driven trajectory prediction models of dynamic agents and applies Adaptive Conformal Prediction (ACP) to quantify the uncertainties in these predictions. Leveraging the obtained ACP-based trajectory predictions, our approach constructs safety shields on-the-fly to prevent unsafe actions within POMDP online planning. Through experimental evaluation in various dynamic environments using real-world pedestrian trajectory data, the proposed approach has been shown to effectively maintain probabilistic safety guarantees while accommodating up to hundreds of dynamic agents.
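
To make the ACP step more concrete, here is a minimal sketch of the standard adaptive conformal update (Gibbs and Candès), applied to scalar prediction errors. It is an illustration under stated assumptions, not the paper's implementation: the function names, the learning rate `gamma`, and the use of a single distance-based nonconformity score are all hypothetical.

```python
import math

def conformal_radius(scores, delta_t):
    """Empirical (1 - delta_t) quantile of past nonconformity scores.

    `scores` are past prediction errors (e.g. distance between predicted
    and observed pedestrian positions). Assumed form, not from the paper.
    """
    if delta_t <= 0.0:
        return math.inf          # region must cover every outcome
    if delta_t >= 1.0:
        return 0.0               # degenerate region (assumption)
    ordered = sorted(scores)
    k = math.ceil((len(ordered) + 1) * (1.0 - delta_t))
    if k > len(ordered):
        return math.inf          # too few samples for this coverage level
    return ordered[k - 1]

def acp_step(delta_t, target_delta, gamma, score, radius):
    """One ACP update of the adaptive failure probability.

    A miss (score outside the region) lowers delta_t, which widens the
    next prediction region; a hit raises it slightly.
    """
    err = 1.0 if score > radius else 0.0
    return delta_t + gamma * (target_delta - err)
```

For example, with past scores `[3.0, 1.0, 2.0]` and `delta_t = 0.5`, the radius is the second-smallest score; a subsequent miss with `gamma = 0.05` and target `0.1` moves `delta_t` from `0.1` down towards `0.055`.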

Paper Structure

This paper contains 13 sections, 4 theorems, 15 equations, 2 figures, 1 table, 2 algorithms.

Key Result

Lemma 1

The output of the winning-region algorithm (alg:wr), denoted by $\{W_t^\tau\}_{\tau=1}^H$, is a set of winning regions, where each $W_t^\tau$ is a winning region for an $(H-\tau)$-step horizon.
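
The indexing in Lemma 1 can be illustrated with a backward iteration over a finite belief-support transition system. The sketch below is an assumption about how such regions could be computed, not a reproduction of the paper's algorithm: `winning_regions`, the `successors` dictionary, and the `is_safe` predicate are hypothetical names, and belief supports are modelled as frozensets of states.

```python
def winning_regions(supports, actions, successors, is_safe, horizon):
    """Return {tau: W^tau} for tau = 1..horizon.

    W^horizon holds the safe belief supports (0 steps left). W^tau holds
    safe supports with some action whose successors all lie in W^(tau+1),
    so W^tau is winning for a (horizon - tau)-step horizon.
    `successors` maps (support, action) to a set of successor supports;
    a missing or empty entry means the action is not enabled (assumption).
    """
    safe = {b for b in supports if is_safe(b)}
    regions = {horizon: safe}
    for tau in range(horizon - 1, 0, -1):
        nxt = regions[tau + 1]
        regions[tau] = {
            b for b in safe
            if any(
                successors.get((b, a)) and successors[(b, a)] <= nxt
                for a in actions
            )
        }
    return regions

# Tiny hypothetical example: support B is safe now but cannot stay safe.
A = frozenset({"s0"})
B = frozenset({"s1"})
C = frozenset({"s1", "bad"})
succ = {
    (A, "stay"): {A}, (A, "go"): {B, C},
    (B, "go"): {C},
    (C, "stay"): {C}, (C, "go"): {C},
}
regions = winning_regions(
    [A, B, C], ["stay", "go"], succ, lambda b: "bad" not in b, horizon=3
)
```

Here `regions[3]` contains both `A` and `B`, while `regions[2]` and `regions[1]` contain only `A`, because every enabled action from `B` can reach a support containing the unsafe state.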

Figures (2)

  • Figure 1: Example gridworld with a robot navigating towards a flag while avoiding a pedestrian. The robot moves east, south, west, or north, reaching the adjacent grid cell with probability 0.1 or one cell further with probability 0.9. Gray shadow: robot's belief state $b_t$ including state $s_t$. Red circles: ACP prediction regions of uncertain predictions about pedestrian states. Yellow shadow: unsafe states per one-step prediction at timestep $t$.
  • Figure 2: Example scenes of real-world pedestrian trajectories from benchmark datasets (amirian2020opentraj).
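
The robot dynamics described in the Figure 1 caption, together with a one-step shield check, can be sketched as follows. Only the four actions and the 0.1/0.9 probabilities come from the caption; the coordinate convention (south increases `y`), clamping at the grid boundary, circular prediction regions, and all function names are assumptions made for illustration.

```python
MOVES = {"east": (1, 0), "south": (0, 1), "west": (-1, 0), "north": (0, -1)}

def step_distribution(pos, action, width, height):
    """Return {next_cell: probability} for one robot move.

    The robot reaches the adjacent cell with probability 0.1 and the cell
    one further with probability 0.9; moves are clamped to the grid.
    """
    dx, dy = MOVES[action]
    dist = {}
    for cells, prob in ((1, 0.1), (2, 0.9)):
        x = min(max(pos[0] + cells * dx, 0), width - 1)
        y = min(max(pos[1] + cells * dy, 0), height - 1)
        dist[(x, y)] = dist.get((x, y), 0.0) + prob
    return dist

def unsafe_cells(predictions, radius, width, height):
    """Cells within `radius` of any predicted pedestrian position."""
    return {
        (x, y)
        for x in range(width)
        for y in range(height)
        if any((x - px) ** 2 + (y - py) ** 2 <= radius ** 2
               for px, py in predictions)
    }

def safe_actions(belief_support, unsafe, width, height):
    """Actions that avoid `unsafe` from every state in the belief support.

    An action is kept only if no positive-probability successor is unsafe,
    mirroring an almost-sure (rather than probabilistic) constraint.
    """
    return {
        a for a in MOVES
        if all(cell not in unsafe
               for s in belief_support
               for cell in step_distribution(s, a, width, height))
    }
```

With a single predicted pedestrian at `(2.0, 2.0)`, radius `1.0`, and the robot known to be at `(0, 2)` on a 5x5 grid, moving east is pruned because both possible successors fall inside the prediction region, while the other three actions remain available.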

Theorems & Definitions (6)

  • Lemma 1
  • Theorem 1
  • Lemma 1
  • proof
  • Theorem 1
  • proof