Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions

Fernando Castañeda; Jason J. Choi; Wonsuhk Jung; Bike Zhang; Claire J. Tomlin; Koushil Sreenath

TL;DR

This paper presents a framework that combines model-based safety methods with data-driven techniques to guarantee safety recursively for systems with uncertain dynamics. It establishes forward invariance of the safe set with high probability, even when that set contains previously unexplored regions.

Abstract

Learning-based control has recently shown great efficacy in performing complex tasks for various applications. However, to deploy it in real systems, it is of vital importance to guarantee the system will stay safe. Control Barrier Functions (CBFs) offer mathematical tools for designing safety-preserving controllers for systems with known dynamics. In this article, we first introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers using Gaussian Process (GP) regression to close the gap between an approximate mathematical model and the real system, which results in a second-order cone program (SOCP)-based control design. We then present the pointwise feasibility conditions of the resulting safety controller, highlighting the level of richness that the available system information must meet to ensure safety. We use these conditions to devise an event-triggered online data collection strategy that ensures the recursive feasibility of the learned safety controller. Our method works by constantly reasoning about whether the current information is sufficient to ensure safety or if new measurements under active safe exploration are required to reduce the uncertainty. As a result, our proposed framework can guarantee the forward invariance of the safe set defined by the CBF with high probability, even if it contains a priori unexplored regions. We validate the proposed framework in two numerical simulation experiments.
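To make the known-dynamics starting point of the abstract concrete, the following is a minimal sketch of a CBF-based safety filter. It covers only the case with perfectly known dynamics, a single affine constraint, and no input bounds, where the filter reduces to a projection onto a half-space. The toy system $\dot{x} = u$, the barrier $B(x) = 1 - x^2$, the linear term $\gamma B(x)$, and the names `cbf_filter` and `simulate` are all assumptions chosen for illustration. They are not taken from the paper, and the paper's GP-based SOCP is not reproduced here.

```python
import numpy as np


def cbf_filter(u_ref, a, b):
    """Closest input to u_ref (Euclidean norm) satisfying a + b @ u >= 0.

    Closed-form solution of a QP with one affine constraint and no input
    bounds, i.e. a projection of u_ref onto a half-space.
    """
    u_ref = np.atleast_1d(np.asarray(u_ref, dtype=float))
    b = np.atleast_1d(np.asarray(b, dtype=float))
    slack = a + b @ u_ref
    if slack >= 0.0:
        return u_ref  # the reference input already satisfies the constraint
    bb = b @ b
    if bb == 0.0:
        raise ValueError("constraint is infeasible: b = 0 and a < 0")
    return u_ref - (slack / bb) * b


def simulate(x0=0.0, gamma=1.0, dt=0.01, steps=2000):
    """Toy system x' = u with barrier B(x) = 1 - x^2 (assumed example)."""
    x = x0
    b_values = []
    for _ in range(steps):
        barrier = 1.0 - x**2
        b_values.append(barrier)
        # Condition dB/dt = -2*x*u >= -gamma*B, rewritten as
        # gamma*B + (-2*x)*u >= 0, so a = gamma*B and b = [-2*x].
        u = cbf_filter(u_ref=1.0, a=gamma * barrier, b=[-2.0 * x])
        x += dt * u[0]  # forward-Euler step
    return x, np.array(b_values)


if __name__ == "__main__":
    x_final, b_values = simulate()
    print(f"final x = {x_final:.4f}, min B = {b_values.min():.6f}")
```

The reference input `u_ref = 1.0` pushes the state toward the boundary $x = 1$. The filter leaves it unchanged while the constraint holds and otherwise applies the smallest correction that restores it, so in this toy run $B$ stays positive. When the dynamics are uncertain, the constraint coefficients `a` and `b` are no longer known exactly, which is the gap the abstract says the GP-based reformulation addresses.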

Paper Structure (27 sections, 13 theorems, 52 equations, 5 figures, 1 algorithm)

Introduction
Motivation & Key Idea
Related Work
Contributions
Notations
Problem Statement
Safety with Perfectly Known Dynamics
Safety under Model Uncertainty
Gaussian Process Regression-based Probabilistic Safety Filter
Gaussian Process Regression for Learning $\Delta_B(x, u)$
Probabilistic Safety Filter (GP-CBF-SOCP)
Analysis of Pointwise Feasibility
Probabilistic Safe Online Learning and Recursive Feasibility
Proposed Safe Online Learning Strategy
Theoretical Analysis
...and 12 more sections

Key Result

Lemma 1

[ames2017cbf] Let system (eq:system) admit a CBF $B: \mathcal{X} \to \mathbb{R}$. Let $\mathcal{X}_{\text{safe}}= \{ x \in \mathcal{X}: B(x) \geq 0 \}$ be its associated safe set, with boundary $\partial \mathcal{X}_{\text{safe}}= \{ x \in \mathcal{X}: B(x) = 0 \}$. If for all $x \in \partial \mathcal{X}_{\text{safe}}$ [...] renders the set $\mathcal{X}_{\text{safe}}$ forward invariant.
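For orientation, the CBF condition that results of this kind build on is commonly written in the CBF literature as below. This is a sketch under assumptions, not a restatement of the paper's Definition 2: it assumes control-affine dynamics $\dot{x} = f(x) + g(x)u$ with input set $\mathcal{U}$ and an extended class $\mathcal{K}_\infty$ function $\gamma$, none of which are defined in this excerpt.

$$
\sup_{u \in \mathcal{U}} \left[ L_f B(x) + L_g B(x)\, u \right] + \gamma\big(B(x)\big) \geq 0 \quad \text{for all } x \in \mathcal{X},
$$

where $L_f B(x) = \frac{\partial B}{\partial x}(x) f(x)$ and $L_g B(x) = \frac{\partial B}{\partial x}(x) g(x)$ are Lie derivatives. Under this reading, a controller is safe if at every state it selects an input that satisfies the bracketed inequality.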

Figures (5)

Figure 1: Color map of $\lambda_\dagger$ in the state space of the adaptive cruise control system, $x = [v, z]^T$, when running Algorithm 1 with no prior data. The region in which $\lambda_\dagger<0$ expands online as Algorithm 1 collects new measurements. Top: snapshot at the moment $\lambda_\dagger$ hits the threshold $-\varepsilon$; Algorithm 1 then collects a measurement along $u_{\text{safe}}$, which expands the region where $\lambda_\dagger<0$ (in blue). Bottom: result at the end of the trajectory.
Figure 2: Simulation results of an adaptive cruise control system under model uncertainty, when controlled using different strategies: Algorithm 1 with no prior data (green); Algorithm 1 with a prior dataset (yellow); the GP-CBF-SOCP with no prior data using time-triggered updates online (orange); the CBF-QP using the uncertain dynamics (pink); and the oracle true-plant-based CBF-QP (black). Even when no prior data is available, Algorithm 1 keeps the system safe ($B>0$) by collecting measurements in the safety direction (negative $u$) when $\lambda_\dagger$ approaches $0$. This results in the ego car checking the brakes to reduce the uncertainty (negative spikes in the top plot). Using either the GP-CBF-SOCP with just time-triggered data collection, or the nominal model-based CBF-QP, the system becomes unsafe, as shown in the $B$ plot.
Figure 3: Illustration of the zero-level set of the CBF for the kinematic vehicle example. $D_m$ is the safety distance, computed as the sum of $d_{\text{steer}}$, the minimum distance the vehicle needs to steer at maximal yaw rate without colliding with the obstacle, and a velocity-dependent distance margin $\tau (v - \underline{v})$.
Figure 4: Snapshots showing the chronological evolution of a 4-dimensional kinematic vehicle system under model uncertainty, when controlled using different methods: Algorithm 1 with no prior data (top row, blue); the CBF-QP based on the nominal model (bottom row, pink). Starting at the initial state $x_0$ (orange diamond), the vehicle pursues the target (yellow star) while avoiding collision with the obstacle (grey circle). The curved line indicates the trajectory of the vehicle's position, ending at its position when the snapshot is taken (green or red circle). The circle is colored red when the vehicle violates the safety constraint (i.e., $B(x) < 0$). The blue squares along the trajectory mark the time stamps at which Algorithm 1 collects data in an event-triggered manner. Finally, the filled circle with a dotted border represents the zero-sublevel set of the CBF. A full video of the vehicle running under each control algorithm is available at https://www.youtube.com/watch?v=HM_VB_mGgeA.
Figure 5: Simulation results of a 4-dimensional kinematic vehicle system under model uncertainty, for the two strategies introduced in Figure 4, using the same color notation. The four plots show, respectively, the yaw rate control input, the acceleration control input, the CBF value, and $\lambda_\dagger$ over time. The dotted lines denote the input bounds, the zero level of the CBF ($B(x)=0$), and the threshold $-\epsilon$ in Algorithm 1. The red bars in the third plot mark the time stamps at which the nominal CBF-QP violates safety. In contrast, Algorithm 1 ensures $B(x) > 0$ at all times. The red crosses in the last plot indicate the time stamps at which $\lambda_\dagger$ hits $-\epsilon$ and safe exploration is executed according to Algorithm 1.

Theorems & Definitions (34)

Definition 1: Safety as forward invariance
Remark 1
Definition 2: Control Barrier Function [ames2017cbf]
Lemma 1
Remark 2
Definition 3: Affine Dot Product compound kernel [GPCLFSOCP]
Lemma 2
Theorem 1
Lemma 3
Proof
...and 24 more
