Probabilistic Safety Guarantee for Stochastic Control Systems Using Average Reward MDPs

Saber Omidi, Marek Petrik, Se Young Yoon, Momotaz Begum

TL;DR

This paper addresses probabilistic safety in stochastic control by reframing the problem as an average-reward MDP (AVR). It proves that the probabilistic safety value function is captured by the optimal gain $g^\star(s)$, linking long-run safety guarantees with a linear-programming formulation that avoids discounting. The authors derive primal and dual LPs to compute $g$ and $h$ and extract safe policies, and they demonstrate significant computational advantages over discounted approaches. Numerical validation on a Double Integrator and an Inverted Pendulum shows accurate, high-confidence safe sets with faster convergence than minimum-discounted methods, highlighting practical impact for safety-critical robotics and control systems.

Abstract

Safety in stochastic control systems, which are subject to random noise with a known probability distribution, requires computing policies that satisfy predefined operational constraints with high confidence throughout the uncertain evolution of the state variables. This unpredictable evolution makes it difficult for standard control methods to guarantee that the constraints are met. To address this, we present a new algorithm that computes safe policies and determines the safety level of each state in a finite state set. The algorithm reduces the safety objective to the standard average-reward Markov Decision Process (MDP) objective. This reduction enables us to use standard techniques, such as linear programs, to compute and analyze safe policies. We validate the proposed method numerically on the Double Integrator and the Inverted Pendulum systems. Results indicate that the average-reward MDP solution is more comprehensive, converges faster, and is of higher quality than the minimum discounted-reward solution.
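To make the reduction concrete, here is a minimal sketch on a toy four-state MDP. Everything in it is an assumption for illustration: the transition numbers, the state labels, the helper names `optimal_gain` and `safe_set`, and the choice to define the $\alpha$-safe set as $\{s : g(s) \ge \alpha\}$. It also swaps the paper's linear programs for a plain fixed-point iteration on the gain equation $g(s) = \max_a \sum_{s'} P(s' \mid s, a)\, g(s')$, which is enough here because every transient state leaves the transient set with positive probability under every action.

```python
# Toy finite MDP (hypothetical numbers, not taken from the paper).
# State 0: unsafe, absorbing, reward 0  -> gain 0.
# State 3: safe kernel, absorbing, reward 1 -> gain 1.
# States 1 and 2: transient states inside the constraint set.
# P[s][a] is a list of (next_state, probability) pairs.
P = {
    0: {"stay": [(0, 1.0)]},
    1: {"left": [(0, 0.5), (2, 0.5)], "right": [(0, 0.2), (2, 0.8)]},
    2: {"left": [(1, 0.5), (3, 0.5)], "right": [(0, 0.1), (3, 0.9)]},
    3: {"stay": [(3, 1.0)]},
}


def optimal_gain(P, gain_fixed, tol=1e-12, max_iter=10_000):
    """Iterate g(s) <- max_a sum_s' P(s'|s,a) g(s') on the transient states.

    gain_fixed maps absorbing states to their known gain (0 or 1).
    Assumes every transient state is left with probability 1 eventually,
    so the iteration started from 0 converges to the optimal gain.
    """
    g = {s: gain_fixed.get(s, 0.0) for s in P}
    for _ in range(max_iter):
        delta = 0.0
        for s in P:
            if s in gain_fixed:
                continue
            best = max(
                sum(p * g[t] for t, p in outcomes) for outcomes in P[s].values()
            )
            delta = max(delta, abs(best - g[s]))
            g[s] = best
        if delta < tol:
            break
    return g


def greedy_policy(P, g):
    """Pick, in each state, an action that maximizes the expected gain."""
    return {
        s: max(P[s], key=lambda a: sum(p * g[t] for t, p in P[s][a]))
        for s in P
    }


def safe_set(g, alpha):
    """States whose gain is at least alpha (assumed level-set convention)."""
    return {s for s, value in g.items() if value >= alpha}


g = optimal_gain(P, {0: 0.0, 3: 1.0})
policy = greedy_policy(P, g)
print(g, policy, safe_set(g, 0.8))
```

In this toy model the gain of a state is the largest achievable probability of ending up in the safe kernel, so thresholding it at a confidence level gives a candidate safe set. For realistic discretizations, an LP solver as described in the abstract would replace the iteration.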

Paper Structure

This paper contains 7 sections, 5 theorems, 35 equations, 4 figures.

Key Result

theorem 1

For every state $s \in S$ and confidence level $\alpha \in [0,1]$:

Figures (4)

  • Figure 1: Safe sets computed by AVR and MDR for the Double Integrator (right) and Inverted Pendulum (left). The dashed gray line outlines the constraint set $\mathcal{C}$, the solid black line shows AVR's safe set, and the colored lines show MDR's safe set ($Z(\bm{x})=0$) for different $\lambda$.
  • Figure 2: Safe sets computed using AVR as level sets of $g(s)$ for (a) Double Integrator and (b) Inverted Pendulum systems. The set $\mathcal{K}$ is dark green.
  • Figure 3: The relative size of the probabilistically-safe set $\mathcal{K}_{\alpha}$ as a function of the safety confidence level $\alpha$. The y-axis plots the ratio of the size of $\mathcal{K}_{\alpha}$ to the 100% safe set, $\mathcal{K}$. Results are shown for the Double Integrator (dashed red) and Inverted Pendulum (solid blue).
  • Figure 4: Runtime of the AVR (LP) and the MDR (VI) method for the Inverted Pendulum (left) and the Double Integrator (right) as a function of the total number of discrete states.

Theorems & Definitions (11)

  • definition 1
  • definition 2
  • theorem 1
  • proof
  • lemma 1
  • proof
  • lemma 2
  • lemma 2
  • proof
  • lemma 3
  • ...and 1 more