Myopically Verifiable Probabilistic Certificates for Safe Control and Learning

Zhuoyuan Wang, Haoming Jing, Christian Kurniawan, Albert Chern, Yorie Nakahira

TL;DR

The paper tackles the challenge of ensuring long-term safety in stochastic, latency-critical control by introducing probabilistic invariance, a myopic condition on the long-term safety probability that yields a probabilistic safety certificate. This certificate leads to affine online safety constraints, enabling safe control via additive modifications or MPC and enabling safe reinforcement learning through policy gradient and Q-learning with a safety filter. The authors provide theoretical guarantees (e.g., non-decreasing expected long-term safety probability) and practical computation methods, including path-integral importance sampling and approximate dynamic programming, along with extensive numerical experiments in control and RL settings. Results show the proposed approach maintains high long-term safety probabilities while preserving performance and providing a practical path to safe learning in stochastic environments. The work thus advances real-time safe control and safe learning for stochastic systems with unbounded disturbances and broad applicability to autonomous systems and robotics.
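To make the "affine online safety constraints" and "additive modifications" concrete, the sketch below filters a nominal control against a single constraint that is affine in the control, written here as a · u ≥ b. The names `filter_control`, `u_nom`, `a`, and `b` are hypothetical and introduced only for illustration; in the paper the constraint coefficients would come from the probabilistic certificate, which this sketch does not compute. It shows only the generic closed-form projection onto a half-space, not the authors' controller.

```python
from typing import List, Sequence


def filter_control(u_nom: Sequence[float], a: Sequence[float], b: float) -> List[float]:
    """Return the control closest to u_nom (Euclidean norm) that satisfies a . u >= b.

    a and b are assumed to be given; they stand in for an affine safety constraint.
    """
    a_dot_u = sum(ai * ui for ai, ui in zip(a, u_nom))
    if a_dot_u >= b:
        # The nominal control already satisfies the constraint: leave it unchanged.
        return list(u_nom)
    a_norm_sq = sum(ai * ai for ai in a)
    if a_norm_sq == 0.0:
        # a is the zero vector and b > 0, so no control can satisfy the constraint.
        raise ValueError("constraint is infeasible: a is zero and b > 0")
    # Additive correction along a that lands exactly on the constraint boundary.
    scale = (b - a_dot_u) / a_norm_sq
    return [ui + scale * ai for ui, ai in zip(u_nom, a)]


if __name__ == "__main__":
    print(filter_control([1.0, 0.0], [0.0, 1.0], 2.0))
```

Because the constraint is affine in the control, the correction is a single additive term and needs no iterative solver, which is what makes this kind of filter attractive for latency-critical loops.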

Abstract

This paper addresses the design of safety certificates for stochastic systems, with a focus on ensuring long-term safety through fast real-time control. In stochastic environments, set invariance-based methods that restrict the probability of risk events in infinitesimal time intervals may exhibit significant long-term risks due to cumulative uncertainties/risks. On the other hand, reachability-based approaches that account for the long-term future may require prohibitive computation in real-time decision making. To overcome this challenge involving stringent long-term safety vs. computation tradeoffs, we first introduce a novel technique termed 'probabilistic invariance'. This technique characterizes the invariance conditions of the probability of interest. When the target probability is defined using long-term trajectories, this technique can be used to design myopic conditions/controllers with assured long-term safe probability. Then, we integrate this technique into safe control and learning. The proposed control methods efficiently assure long-term safety using neural networks or model predictive controllers with short outlook horizons. The proposed learning methods can be used to guarantee long-term safety during and after training. Finally, we demonstrate the performance of the proposed techniques in numerical simulations.
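The point about cumulative risk can be illustrated with a small simulation. The sketch below estimates, by plain Monte Carlo, the probability that a one-dimensional stochastic system stays in the safe set {x ≥ 0} up to each time step. The dynamics, the safe set, and all names (`safe_probability_curve`, `drift`, `sigma`, `dt`) are assumptions chosen for illustration and are not taken from the paper, which uses path-integral importance sampling and approximate dynamic programming rather than this naive estimator.

```python
import math
import random
from typing import List


def safe_probability_curve(x0: float, drift: float, sigma: float, dt: float,
                           n_steps: int, n_paths: int, seed: int = 0) -> List[float]:
    """Estimate P(x_j >= 0 for all j <= k) for k = 0..n_steps.

    Assumed dynamics (Euler-Maruyama): x_{k+1} = x_k + drift*dt + sigma*sqrt(dt)*w_k,
    with w_k standard normal.
    """
    rng = random.Random(seed)
    safe_counts = [0] * (n_steps + 1)
    for _ in range(n_paths):
        x = x0
        safe = x >= 0.0
        safe_counts[0] += int(safe)
        for k in range(1, n_steps + 1):
            if safe:
                # Only propagate paths that have not yet left the safe set.
                x += drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
                safe = x >= 0.0
            safe_counts[k] += int(safe)
    return [count / n_paths for count in safe_counts]


if __name__ == "__main__":
    curve = safe_probability_curve(1.0, 0.0, 1.0, 0.01, 200, 500, seed=1)
    print(curve[0], curve[50], curve[200])
```

By construction the estimated curve cannot increase with the horizon: a path that has left the safe set is never counted as safe again. A condition that only bounds the risk of the next step therefore does not by itself bound the risk accumulated over a long horizon.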

Paper Structure

This paper contains 33 sections, 3 theorems, 83 equations, 10 figures, 1 table, 5 algorithms.

Key Result

Theorem 1

Consider the closed-loop system of \eqref{eq:x_trajectory} and \eqref{eq:generic_controller}. If system \eqref{eq:x_trajectory} originates at $X_0=x$ with $\mathbf{F}(z)>1-\epsilon$, and the control action satisfies \eqref{eq:safety_condition_each_Zt} at all times, then the following condition holds for all time $t \in \mathbb{R}_+$

Figures (10)

  • Figure 1: Results in the worst-case setting. (a) the average system state over 100 trajectories. Red dotted line indicates the boundary of the safe set. (b) the expected safe probability.
  • Figure 2: Results in the switching control setting. (a) the averaged system state of 100 trajectories with its standard deviation. Red dotted line indicates the boundary of the safe set. (b) the empirical safe probability.
  • Figure 3: Results in the worst-case setting with nonlinear dynamics \eqref{eq:nonlinear_dynamics_experiment}. (a) the average system state over 50 trajectories. Red dotted line indicates the boundary of the safe set. Black dotted line indicates the boundary of the nonlinear trap. (b) the expected safe probability.
  • Figure 4: Results in the switching control setting with nonlinear dynamics \eqref{eq:nonlinear_dynamics_experiment}. (a) the averaged system state of 50 trajectories with its standard deviation. Red dotted line indicates the boundary of the safe set. Black dotted line indicates the boundary of the nonlinear trap. (b) the empirical safe probability.
  • Figure 5: Policy gradient with and without the proposed safety filter. (a) the averaged rewards for 1500 iterations. (b) sample state trajectories with the learned policy.
  • ...and 5 more figures

Theorems & Definitions (10)

  • Remark 1
  • Theorem 1
  • Proof (\ref{lm:main_lemma})
  • Lemma 1
  • Proof (\ref{lm:lm2})
  • Remark 2
  • Remark 3
  • Remark 4
  • Lemma 2
  • Proof (\ref{lem:policy_gradient_safety_filter})