Certifiable Reachability Learning Using a New Lipschitz Continuous Value Function
Jingqi Li, Donggun Lee, Jaewon Lee, Kris Shengjun Dong, Somayeh Sojoudi, Claire Tomlin
TL;DR
This work addresses deterministic safety guarantees for reach-avoid (RA) sets in high-dimensional nonlinear systems. It introduces a time-discounted reach-avoid value function $V_\gamma(x)$ whose Bellman operator is a contraction and which is Lipschitz continuous when $\gamma L_f < 1$ (with $L_f$ the Lipschitz constant of the dynamics), enabling efficient learning via max-min DDPG. This offline learning is paired with two post-learning certification methods, one based on Lipschitz constants and one based on second-order cone programming (SOCP), which provide online real-time and offline guarantees that a neighborhood of states can safely reach a target set despite disturbances. The approach is validated on a 12D drone racing hardware experiment and a 10D highway take-over simulation, showing higher success rates than state-of-the-art constrained RL methods and real-time certification (e.g., at 10 Hz). The results demonstrate deterministic RA guarantees for complex robotic systems with disturbances, offering a practical pathway to trustworthy autonomous operation in safety-critical settings.
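To make "contractive Bellman operator" and "max-min" concrete, the sketch below runs value iteration on a toy 1D grid with a control that maximizes and a disturbance that minimizes. It is a minimal sketch under stated assumptions and is not the paper's $V_\gamma$: it uses a generic discounted reach-avoid backup of the form $V(x) = (1-\gamma)\min\{r(x), c(x)\} + \gamma \min\{c(x), \max\{r(x), \max_u \min_d V(f(x,u,d))\}\}$, where $r > 0$ marks the target and $c > 0$ marks constraint satisfaction. The grid, margins, and names (`bellman`, `value_iteration`, `NEXT`) are hypothetical. Because min, max, and pointwise max-min are non-expansive, this backup is a $\gamma$-contraction in the sup norm, so value iteration converges to a unique fixed point.

```python
import numpy as np

# Hypothetical toy problem (not the paper's 12D/10D systems).
GAMMA = 0.95
N = 21                                   # grid cells 0..20
xs = np.arange(N)
U = np.array([-2, -1, 0, 1, 2])          # control moves (maximizing player)
D = np.array([-1, 0, 1])                 # disturbance moves (minimizing player)

# Assumed margin conventions: r > 0 inside the target (cells 14-16),
# c < 0 at the obstacle (cell 8), c > 0 elsewhere.
r = 2.0 - np.abs(xs - 15)
c = np.where(xs == 8, -1.0, 1.0)

# NEXT[x, u, d] = clip(x + u + d) to stay on the grid.
NEXT = np.clip(xs[:, None, None] + U[None, :, None] + D[None, None, :], 0, N - 1)

def bellman(V):
    """One generic discounted reach-avoid backup (a stand-in, not V_gamma)."""
    # Best control against the worst-case disturbance at the next state.
    v_next = V[NEXT].min(axis=2).max(axis=1)
    return (1 - GAMMA) * np.minimum(r, c) + GAMMA * np.minimum(c, np.maximum(r, v_next))

def value_iteration(tol=1e-10, max_iters=10_000):
    """Iterate the contraction until successive iterates agree to tol."""
    V = np.zeros(N)
    for _ in range(max_iters):
        V_new = bellman(V)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

if __name__ == "__main__":
    V = value_iteration()
    print(np.round(V, 3))
    # Empirical contraction check: ||T A - T B|| <= gamma ||A - B||.
    rng = np.random.default_rng(0)
    A, B = rng.normal(size=N), rng.normal(size=N)
    print(np.max(np.abs(bellman(A) - bellman(B))) <= GAMMA * np.max(np.abs(A - B)) + 1e-12)
```

In this toy, the cells with $V > 0$ are exactly those to the right of the obstacle, from which the controller can make progress toward the target under every disturbance. The $(1-\gamma)$ term is what makes the backup a contraction; the paper's own value function and its Lipschitz condition differ in detail.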
Abstract
We propose a new reachability learning framework for high-dimensional nonlinear systems, focusing on reach-avoid problems. These problems require computing the reach-avoid set: the set of states from which the system is guaranteed to safely reach a target set despite disturbances within pre-specified bounds. Our framework has two main parts: offline learning of a newly designed reach-avoid value function, and post-learning certification. Compared to prior work, our new value function is Lipschitz continuous and its associated Bellman operator is a contraction mapping, both of which improve learning performance. To provide deterministic guarantees for the learned reach-avoid set, we introduce two efficient post-learning certification methods. Both methods can be used online for real-time local certification or offline for comprehensive certification. We validate our framework in a 12-dimensional Crazyflie drone racing hardware experiment and a simulated 10-dimensional highway take-over example.
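The certification idea, that a Lipschitz constant can turn one verified rollout into a guarantee for a whole neighborhood, can be sketched as follows. This is a simplified illustration under assumptions not taken from the paper: no disturbances, deterministic closed-loop dynamics $F$ (the system under some fixed policy) with a known Lipschitz constant $L_F$, and Lipschitz margin functions $r$ (target, $r > 0$ inside) and $c$ (constraint, $c > 0$ when satisfied). Since neighboring trajectories satisfy $\|\xi_x(t) - \xi_{x_0}(t)\| \le L_F^t \|x - x_0\|$, every $x$ closer to $x_0$ than $\delta = \min\{ r(\xi_{x_0}(T)) / (L_r L_F^T),\ \min_{t \le T} c(\xi_{x_0}(t)) / (L_c L_F^t) \}$ also reaches the target at step $T$ without violating the constraint. The helper names (`rollout`, `certified_radius`) and the toy system are hypothetical; the paper's methods also account for disturbances.

```python
import numpy as np

def rollout(F, x0, horizon):
    """Nominal closed-loop trajectory x_0, ..., x_horizon under F."""
    traj = [np.asarray(x0, dtype=float)]
    for _ in range(horizon):
        traj.append(F(traj[-1]))
    return traj

def certified_radius(F, L_F, r, L_r, c, L_c, x0, horizon):
    """Radius delta such that every x with ||x - x0|| < delta reaches {r > 0}
    while keeping c > 0, assuming disturbance-free dynamics F that are
    L_F-Lipschitz and margins r, c that are L_r- and L_c-Lipschitz (all > 0).
    Returns 0.0 if the nominal rollout does not certify x0 itself."""
    bound = np.inf
    for t, x in enumerate(rollout(F, x0, horizon)):
        growth = L_F ** t                # worst-case deviation factor at step t
        if c(x) <= 0:
            return 0.0                   # nominal trajectory violates the constraint
        bound = min(bound, c(x) / (L_c * growth))
        if r(x) > 0:                     # first step the nominal rollout is in the target
            return min(bound, r(x) / (L_r * growth))
    return 0.0                           # target not reached within the horizon

# Toy usage: contracting closed loop, target ball of radius 0.5,
# constraint "stay inside radius 3".
F = lambda x: 0.5 * x                        # L_F = 0.5
r = lambda x: 0.5 - np.linalg.norm(x)        # L_r = 1
c = lambda x: 3.0 - np.linalg.norm(x)        # L_c = 1
print(certified_radius(F, 0.5, r, 1.0, c, 1.0, x0=[2.0, 0.0], horizon=10))  # prints 1.0
```

Here the binding term is the constraint margin at $t = 0$; the target margin at the reach step $T = 3$ alone would allow a radius of 2. A larger $L_F$ shrinks the certified radius geometrically in $T$, which is one reason tighter, optimization-based bounds (such as the SOCP-based method mentioned above) can be useful.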
