Myopically Verifiable Probabilistic Certificates for Safe Control and Learning
Zhuoyuan Wang, Haoming Jing, Christian Kurniawan, Albert Chern, Yorie Nakahira
TL;DR
The paper addresses long-term safety in stochastic, latency-critical control by introducing probabilistic invariance: a myopic condition on the long-term safety probability that serves as a probabilistic safety certificate. The certificate yields affine online safety constraints, which enable safe control through additive modifications or MPC, and safe reinforcement learning through policy gradient and Q-learning combined with a safety filter. The authors provide theoretical guarantees (e.g., a non-decreasing expected long-term safety probability) and practical computation methods, including path-integral importance sampling and approximate dynamic programming, together with numerical experiments in control and RL settings. In these experiments, the proposed approach maintains high long-term safety probabilities while preserving performance. The work targets real-time safe control and safe learning for stochastic systems with unbounded disturbances, with applications to autonomous systems and robotics.
Abstract
This paper addresses the design of safety certificates for stochastic systems, with a focus on ensuring long-term safety through fast real-time control. In stochastic environments, set invariance-based methods that restrict the probability of risk events in infinitesimal time intervals may exhibit significant long-term risk because uncertainty accumulates over time. On the other hand, reachability-based approaches that account for the long-term future may require prohibitive computation in real-time decision making. To overcome this tradeoff between stringent long-term safety and computation, we first introduce a novel technique termed 'probabilistic invariance'. This technique characterizes the invariance conditions of the probability of interest. When the target probability is defined using long-term trajectories, this technique can be used to design myopic conditions and controllers with an assured long-term safety probability. Then, we integrate this technique into safe control and learning. The proposed control methods efficiently assure long-term safety using neural networks or model predictive controllers with short outlook horizons. The proposed learning methods can be used to guarantee long-term safety during and after training. Finally, we demonstrate the performance of the proposed techniques in numerical simulations.
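To make the idea of an affine online safety constraint with an additive modification concrete, the following is a minimal sketch, not the authors' implementation. It assumes the safety condition at the current time step has already been reduced to a single affine inequality a·u >= b in the control u; the names `safety_filter`, `u_nom`, `a`, and `b` are hypothetical, and how a and b are obtained from the long-term safety probability is not shown. Under these assumptions, the closest safe control in the Euclidean norm has a closed form, so the filter needs no numerical solver.

```python
from typing import List, Sequence


def safety_filter(u_nom: Sequence[float], a: Sequence[float], b: float) -> List[float]:
    """Return the control closest to u_nom (Euclidean norm) that satisfies a . u >= b.

    Hypothetical illustration: a and b stand for an affine safety constraint
    evaluated at the current state; they are assumptions, not the paper's API.
    """
    # Slack of the nominal control with respect to the constraint a . u >= b.
    slack = sum(ai * ui for ai, ui in zip(a, u_nom)) - b
    if slack >= 0.0:
        # The nominal control already satisfies the constraint: no modification.
        return list(u_nom)

    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        # a = 0 and b > 0: no control can satisfy the constraint.
        raise ValueError("affine constraint is infeasible")

    # Additive modification along a that moves u_nom onto the constraint boundary.
    scale = -slack / norm_sq
    return [ui + scale * ai for ui, ai in zip(u_nom, a)]


if __name__ == "__main__":
    # The nominal control violates u[1] >= 0.5, so only its second component changes.
    print(safety_filter([1.0, 0.0], [0.0, 1.0], 0.5))
```

Because the constraint is affine in the control, the correction is a single projection step, which is consistent with the emphasis on fast real-time computation; with several simultaneous constraints or input bounds, a small quadratic program would be needed instead.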
