Safety-critical Control Under Partial Observability: Reach-Avoid POMDP meets Belief Space Control

Matti Vahs, Joris Verhagen, Jana Tumova

TL;DR

This work proposes a layered, certificate-based control architecture that operates directly in belief space and decouples goal reaching, information gathering, and safety into modular components. It introduces Belief Control Lyapunov Functions (BCLFs), which formalize information gathering as a Lyapunov convergence problem in belief space, and shows how they can be learned via reinforcement learning.

Abstract

Partially Observable Markov Decision Processes (POMDPs) provide a principled framework for robot decision-making under uncertainty. Solving reach-avoid POMDPs, however, requires coordinating three distinct behaviors: goal reaching, safety, and active information gathering to reduce uncertainty. Existing online POMDP solvers attempt to address all three within a single belief tree search, but this unified approach struggles with the conflicting time scales inherent to these objectives. We propose a layered, certificate-based control architecture that operates directly in belief space, decoupling goal reaching, information gathering, and safety into modular components. We introduce Belief Control Lyapunov Functions (BCLFs) that formalize information gathering as a Lyapunov convergence problem in belief space, and show how they can be learned via reinforcement learning. For safety, we develop Belief Control Barrier Functions (BCBFs) that leverage conformal prediction to provide probabilistic safety guarantees over finite horizons. The resulting control synthesis reduces to lightweight quadratic programs solvable in real time, even for non-Gaussian belief representations with dimension $>10^4$. Experiments in simulation and on a space-robotics platform demonstrate real-time performance and improved safety and task success compared to state-of-the-art constrained POMDP solvers.
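
To make the "lightweight quadratic program" idea concrete, here is a minimal sketch of a minimally invasive safety filter. It assumes a single affine constraint `a . u >= b` on the control input, in which case the QP `min ||u - u_ref||^2` has a closed-form solution (projection onto a half-space). The function name `safety_filter` and the symbols `a`, `b`, and `u_ref` are illustrative assumptions; the paper's actual constraints come from its BCLF and BCBF conditions, which are not reproduced here.

```python
from typing import List, Sequence


def safety_filter(u_ref: Sequence[float], a: Sequence[float], b: float) -> List[float]:
    """Solve min ||u - u_ref||^2 subject to a . u >= b in closed form.

    With one affine constraint, the QP solution is the Euclidean
    projection of u_ref onto the half-space {u : a . u >= b}.
    (Hypothetical helper for illustration, not the paper's controller.)
    """
    a_dot_u = sum(ai * ui for ai, ui in zip(a, u_ref))
    if a_dot_u >= b:
        # The reference input already satisfies the constraint: keep it.
        return list(u_ref)
    a_norm_sq = sum(ai * ai for ai in a)
    if a_norm_sq == 0.0:
        # a = 0 and b > 0: no input can satisfy the constraint.
        raise ValueError("constraint is infeasible")
    # Smallest correction along a that makes the constraint active.
    scale = (b - a_dot_u) / a_norm_sq
    return [ui + scale * ai for ui, ai in zip(u_ref, a)]
```

For instance, `safety_filter([1.0, 0.0], [0.0, 1.0], 0.5)` leaves the first component untouched and raises the second to 0.5, the smallest change that satisfies the constraint. With several simultaneous constraints a general QP solver would be needed instead of this closed form.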

Paper Structure

This paper contains 33 sections, 10 theorems, 61 equations, 13 figures, 3 tables, 1 algorithm.

Key Result

Theorem 1

If $B_x$ is an RCBF for the continuous-time SDE and, at each time $t$, the control input $\bm{u}(t)$ satisfies the RCBF condition, then $\mathrm{Pr}\left[\bm{x}(t) \in \mathcal{C}_x,\ \forall t \geq t_0\right] = 1$, provided that $\bm{x}(t_0) \in \mathcal{C}_x$.

Figures (13)

  • Figure 1: Illustration of our hardware experiments on a space-robotics platform. The robot starts from an unknown initial position that is located somewhere in the initial belief $\bm{b}_0$ and has to navigate to the goal region $\mathcal{S}_g$ without entering the avoid region $\mathcal{S}_a$. The robot can localize itself in a map by detecting impacts with the walls. Two runs with different initial conditions are shown.
  • Figure 2: Illustration of our proposed control architecture for a free-floating robot platform that uses impact detections as measurements. The robot uses measurements $\bm{z}$ to update a particle filter belief $\bm{b}$ at time $t$. Based on the current belief, the reference controller computes an input from the mean state $\bm{\mu}$, the belief CLF serves as the information-gathering controller, and the belief CBF minimally corrects unsafe control inputs.
  • Figure 3: Illustration of an example with a one-dimensional state space. In this environment, the state can be accurately localized in a sensing region $\{ x \in \mathbb{R}\mid |x| \leq 0.1\}$ around the origin. The true state evolves according to a continuous-time SDE. The bottom plot shows the proposed uncertainty quantification as well as the empirical standard deviation of the belief over time.
  • Figure 4: Exemplary visualization of a belief CLF. The blue surface depicts the Lyapunov function, with the current belief shown in grey. The maximum information-gathering control input follows the steepest direction on the BCLF; the reference control input points in a direction where the BCLF value increases; and the optimized information-gathering control input moves in the direction of the reference while enforcing convergence to the set of localized beliefs.
  • Figure 5: Visualization of the network architecture used to learn a belief control Lyapunov function. The encoder and MLP are denoted by $e_{\psi}$ and $g_{\theta}$ with learnable parameters $\psi$ and $\theta$, respectively.
  • ...and 8 more figures
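
The one-dimensional example in Figure 3 can be mimicked with a small particle filter. The sketch below is written under stated assumptions rather than taken from the paper: the dynamics `dx = u dt + sigma dW`, the Gaussian measurement model inside the sensing region, the use of "no detection" as negative information, and all function names are hypothetical. Only the sensing region $|x| \leq 0.1$ comes from the caption.

```python
import math
import random
import statistics
from typing import List, Optional

SENSING_RADIUS = 0.1  # sensing region |x| <= 0.1, taken from the caption


def propagate(particles: List[float], u: float, dt: float,
              sigma: float, rng: random.Random) -> List[float]:
    """One Euler-Maruyama step of the assumed SDE dx = u dt + sigma dW."""
    step_std = sigma * math.sqrt(dt)
    return [x + u * dt + rng.gauss(0.0, step_std) for x in particles]


def update(particles: List[float], z: Optional[float],
           meas_std: float, rng: random.Random) -> List[float]:
    """Reweight and resample the particle belief.

    z is a noisy position reading when the true state is inside the
    sensing region, and None otherwise (assumed measurement model).
    """
    if z is None:
        # No detection: particles inside the sensing region are ruled out.
        weights = [0.0 if abs(x) <= SENSING_RADIUS else 1.0 for x in particles]
    else:
        # Detection: only particles inside the region can explain z.
        weights = [
            math.exp(-0.5 * ((z - x) / meas_std) ** 2)
            if abs(x) <= SENSING_RADIUS else 0.0
            for x in particles
        ]
    if sum(weights) <= 0.0:
        # Degenerate case: no particle explains the measurement; keep the prior.
        return list(particles)
    return rng.choices(particles, weights=weights, k=len(particles))


def belief_std(particles: List[float]) -> float:
    """Empirical standard deviation of the particle belief."""
    return statistics.pstdev(particles)
```

Starting from particles spread over $[-1, 1]$, a single detection inside the sensing region collapses `belief_std` to at most the sensing radius, which is the kind of drop in empirical standard deviation the caption describes.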

Theorems & Definitions (25)

  • Definition 1
  • Theorem 1: Thm. 2 in clark2021control
  • Definition 2
  • Theorem 2: Thm. 2.3 in haddad2022lyapunov
  • Definition 3
  • Theorem 3: Thm. 4.2 in lee2022finite
  • Lemma 1
  • Example 1
  • Definition 4
  • Theorem 4
  • ...and 15 more