Certifiable Reachability Learning Using a New Lipschitz Continuous Value Function

Jingqi Li; Donggun Lee; Jaewon Lee; Kris Shengjun Dong; Somayeh Sojoudi; Claire Tomlin

TL;DR

This work provides deterministic safety guarantees for reach-avoid (RA) sets in high-dimensional nonlinear systems. It introduces a time-discounted reach-avoid value function $V_\gamma(x)$ whose Bellman operator is a contraction and which is Lipschitz continuous when $\gamma L_f<1$, enabling efficient learning via max-min DDPG. It couples this offline learning with two post-learning certification methods (one based on Lipschitz constants, the other on second-order cone programming, SOCP) that certify, either online in real time or offline, that a neighborhood of states can reach a target set despite disturbances. The approach is validated on a 12D drone racing hardware experiment and a 10D highway take-over simulation, showing higher success rates than state-of-the-art constrained RL methods and real-time certification (e.g., 10 Hz). The results demonstrate deterministic RA guarantees for complex robotic systems with disturbances, offering a practical pathway to trustworthy autonomous operation in safety-critical settings.

Abstract

We propose a new reachability learning framework for high-dimensional nonlinear systems, focusing on reach-avoid problems. These problems require computing the reach-avoid set, which ensures that all its elements can safely reach a target set despite disturbances within pre-specified bounds. Our framework has two main parts: offline learning of a newly designed reach-avoid value function, and post-learning certification. Compared to prior work, our new value function is Lipschitz continuous and its associated Bellman operator is a contraction mapping, both of which improve the learning performance. To ensure deterministic guarantees of our learned reach-avoid set, we introduce two efficient post-learning certification methods. Both methods can be used online for real-time local certification or offline for comprehensive certification. We validate our framework in a 12-dimensional Crazyflie drone racing hardware experiment and a simulated 10-dimensional highway take-over example.

Paper Structure (15 sections, 3 theorems, 34 equations, 8 figures, 1 table, 1 algorithm)

Introduction
Related works
Problem Formulation
A New Reach-Avoid Value Function
Learning the New RA Value Function
Certifying RA Sets with Guarantees
Certification using Lipschitz constants
Certification using second-order cone programming
Combining reachability learning and certification
Experiments
Hypothesis 1: Our learned policy has a higher success rate than state-of-the-art constrained RL methods
Hypothesis 2: Our online RA set certification methods can be computed in real-time
Hypothesis 3: The Lipschitz continuity of our new value function appears to accelerate learning
The trade-off of selecting a time-discount factor
Conclusion and Future Work

Key Result

Theorem 1

Let $\gamma\in(0,1)$ and $V:\mathbb{R}^n\to\mathbb{R}$ be an arbitrary bounded function. Consider the Bellman operator ${B_\gamma}[V]$ defined as, Then, we have $\|{B_\gamma}[{V_\gamma^1}] - {B_\gamma}[{V_\gamma^2}]\|_\infty \le \gamma \| {V_\gamma^1}-{V_\gamma^2} \|_\infty$, for all bounded functions ${V_\gamma^1}$ and ${V_\gamma^2}$, and ${V_\gamma(x)}$ in eq:inf_horizon_reach_avoid_problem is
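As a hedged aside on why the contraction property helps learning: the inequality stated in Theorem 1, combined with the standard Banach fixed-point argument, gives geometric convergence of repeated Bellman updates. The sketch below assumes that ${B_\gamma}$ maps bounded functions to bounded functions. It writes $V^\star$ for a fixed point of ${B_\gamma}$ and $V^0$ for an arbitrary bounded initial guess; both symbols are notation introduced here, not taken from the paper.

$$
\|B_\gamma^k[V^0] - V^\star\|_\infty
= \|B_\gamma[B_\gamma^{k-1}[V^0]] - B_\gamma[V^\star]\|_\infty
\le \gamma\,\|B_\gamma^{k-1}[V^0] - V^\star\|_\infty
\le \cdots
\le \gamma^k\,\|V^0 - V^\star\|_\infty .
$$

For instance, with $\gamma = 0.95$ the bound shrinks by a factor of $0.95^{100} \approx 0.006$ after 100 updates. The same inequality applied to two fixed points $V^1, V^2$ gives $\|V^1 - V^2\|_\infty \le \gamma\,\|V^1 - V^2\|_\infty$, which forces $V^1 = V^2$ because $\gamma < 1$.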

Figures (8)

Figure 1: Applying our reachability analysis framework to drone racing. In (a), hardware experiments demonstrate that our learned control policy enables an ego drone to safely overtake another drone, despite unpredictable disturbances in the other drone's acceleration. In (b), we illustrate the concept of the propeller-induced airflow [flem2024experimental], which can affect other drones' flight. In (c), we apply our learned control policy in a simulation with randomly sampled disturbances. In (d), we project the learned reach-avoid value function onto the $(x,y)$ position of the ego drone. The super-zero level set, outlined by dashed curves, indicates our learned reach-avoid (RA) set. In (e), we plot the certified RA sets using Lipschitz and second-order cone programming certification.
Figure 2: Comparing ${V_\gamma(x)}$ with the classical RA value function ${\bar{V}}(x)$ and with $V(x)$, a constructed solution to the Bellman equation of ${\bar{V}}(x)$ in prior works [fisac2019bridging, hsusafety, hsu2023safety, hsu2023sim, hsu2023isaacs, nguyen2024gameplay, li2023learning]. Consider a 1-dimensional dynamics: $x_{t+1} = 1.01x_t + 0.01(u_t + d_t)$, with $|u_t| \le 1$ and $|d_t| \le 0.5$. We associate ${\mathcal{T}} = \{x : x < -1\}$ and ${\mathcal{C}} = \{x : x > -2\}$ with bounded, Lipschitz continuous functions $r(x) = \max(\min( -(x + 1), 10), -10)$ and $c(x) = \max(\min(x + 2, 10), -10)$, respectively. For all $\gamma\in(0,1)$, our super-zero level set $\{x:{V_\gamma(x)}>0\}$ equals the RA set ${\mathcal{R}}=\{x:-2<x<0.5\}$. By Theorem 2, ${V_\gamma(x)}$ is Lipschitz continuous if $\gamma\in(0, 0.99009)$. The super-zero level set of ${\bar{V}}(x)$ also recovers ${\mathcal{R}}$, but ${\bar{V}}(x)$ is discontinuous at $x=0.5$ because the control fails to drive the state to ${\mathcal{T}}$ under the worst-case disturbance when $x_t\ge0.5$. Finally, in the third subfigure, we show that the Bellman equation in prior works [fisac2019bridging, hsusafety, hsu2023safety, hsu2023sim, hsu2023isaacs, nguyen2024gameplay, li2023learning] has non-unique solutions, whose super-zero level set may not equal ${\mathcal{R}}$.
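The numbers quoted in the Figure 2 caption can be sanity-checked with a short script. This is a minimal sketch, not the authors' code, and it rests on three assumptions: that $L_f$ is the state coefficient $1.01$ of the dynamics, that the best control for reaching ${\mathcal{T}}$ is $u_t=-1$, and that the worst-case disturbance is $d_t=+0.5$. The function names and the step limit `max_steps` are hypothetical.

```python
# Sketch of the 1-D example in the Figure 2 caption (not the authors' code).
A, B = 1.01, 0.01          # dynamics: x_{t+1} = A * x_t + B * (u_t + d_t)
U_MAX, D_MAX = 1.0, 0.5    # bounds: |u_t| <= 1, |d_t| <= 0.5


def lipschitz_gamma_bound(lipschitz_f: float = A) -> float:
    """Supremum of the discount factors satisfying gamma * L_f < 1.

    Assumption: L_f equals the state coefficient A of the dynamics.
    """
    return 1.0 / lipschitz_f


def worst_case_fixed_point() -> float:
    """Fixed point of x -> A * x + B * (-U_MAX + D_MAX).

    Assumption: the control pushes left as hard as possible (u = -1)
    while the disturbance pushes right as hard as possible (d = +0.5).
    """
    return B * (U_MAX - D_MAX) / (A - 1.0)


def reaches_target(x0: float, max_steps: int = 2000) -> bool:
    """Simulate the assumed worst case from x0 for at most max_steps steps."""
    x = x0
    for _ in range(max_steps):
        if x <= -2.0:      # left the constraint set C = {x : x > -2}
            return False
        if x < -1.0:       # entered the target set T = {x : x < -1}
            return True
        x = A * x + B * (-U_MAX + D_MAX)
    return False           # target not reached within the step limit


if __name__ == "__main__":
    print("gamma bound:", lipschitz_gamma_bound())
    print("worst-case fixed point:", worst_case_fixed_point())
    for x0 in (-1.5, 0.49, 0.51):
        print(x0, reaches_target(x0))
```

Under these assumptions the discount bound is $1/1.01$, matching the caption's interval $(0, 0.99009)$, and the fixed point sits at $x=0.5$, the right edge of ${\mathcal{R}}$. States just left of it drift toward ${\mathcal{T}}$ and states just right of it drift away, which is consistent with the discontinuity of ${\bar{V}}(x)$ described in the caption.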
Figure 3: We sampled 50 initial states from the SOCP certified set shown in Figure 1. A few crashes occurred due to insufficient battery charge or Vicon sensor failures caused by natural light. These instances were excluded as outliers. With a fully charged battery and no Vicon system failures, the ego drone successfully overtook the other drone from each of the 50 initial states, despite the latter’s uncertain acceleration. We visualize two hardware experiments in the above subfigures. The remaining 9-dimensional initial state includes $[v_{x,t}^1,v_{y,t}^1,v_{z,t}^1, p_{x,t}^2,p_{y,t}^2,p_{z,t}^2, v_{x,t}^2, v_{y,t}^2, v_{z,t}^2]=[0,0.7, 0,0.4,-2.2,0,0,0.3,0]$.
Figure 4: Highway reachability analysis: In (a), we simulate the nonlinear dynamics with the learned policy ${\pi_{\theta_u}}$ and randomly sampled disturbances on other vehicles' acceleration. The 10-dimensional state space includes $[p_{x,t}^1, p_{y,t}^1,v_{t}^1,\theta_t^1, p_{x,t}^2,p_{y,t}^2, v_{y,t}^2, p_{x,t}^3,p_{y,t}^3, v_{y,t}^3]$. The $p_y$-axis movement of the red and green agents is modeled using double integrator dynamics, while their initial $p_x$ positions are sampled randomly and remain stationary during simulation. In (b), we project our learned value function, with $\gamma=0.95$, onto the $(x,y)$ position of the ego vehicle. In (c), we plot the RA set learned using the state-of-the-art method [li2023learning, nguyen2024gameplay] with $\gamma=0.95$. As suggested in [hsusafety], annealing $\gamma\to 1$ is necessary for prior works; otherwise, the learned RA sets in prior works are conservative. In (d), we plot our certified RA sets.
Figure 5: Histogram of the time required for computing ${\check{V}^L_{\gamma}}(x,T)$ and ${\check{V}^{S}_{\gamma}}(x,T)$ for each of the 10,000 randomly sampled states $x$. The certification horizons for drone racing and highway are $T=15$ and $T=30$, respectively.
...and 3 more figures

Theorems & Definitions (8)

Theorem 1: Contraction mapping
proof
Theorem 2: Lipschitz continuity
proof
Theorem 3: Fast reaching
proof
Remark 1
Remark 2
