From Learning to Safety: A Direct Data-Driven Framework for Constrained Control

Kanghui He, Shengling Shi, Ton van den Boom, Bart De Schutter

TL;DR

The paper addresses safety in learning-based control of systems with unknown dynamics. It introduces a direct data-driven framework built on state-action control barrier functions (SACBFs) and a direct data-driven safety filter (3DSF), both of which operate without system identification. Three learning strategies are developed to synthesize SACBFs: reinforcement learning (RL), expert-guided learning, and supervised learning from CBFs. An error-to-state safety (ESSf) analysis links the learning error to the state constraint tightening and SACBF relaxation that it makes necessary. A refinement step via constrained FQI enables near-optimal performance while preserving safety, and a vehicle case study shows improved constraint satisfaction and task achievement over model-based safety filters and reward shaping. The result is an online-capable way to combine model-free learning with formal safety guarantees, with acknowledged limitations in sample complexity and RL-specific safety guarantees. Extensions to continuous-time systems and reduced-sample settings are identified as future work.

Abstract

Ensuring safety in the sense of constraint satisfaction for learning-based control is a critical challenge, especially in the model-free case. While safety filters address this challenge in the model-based setting by modifying unsafe control inputs, they typically rely on predictive models derived from physics or data. This reliance limits their applicability for advanced model-free learning control methods. To address this gap, we propose a new optimization-based control framework that determines safe control inputs directly from data. The benefit of the framework is that it can be updated through arbitrary model-free learning algorithms to pursue optimal performance. As a key component, the concept of direct data-driven safety filters (3DSF) is first proposed. The framework employs a novel safety certificate, called the state-action control barrier function (SACBF). We present three different schemes to learn the SACBF. Furthermore, based on input-to-state safety analysis, we present the error-to-state safety analysis framework, which provides formal guarantees on safety and recursive feasibility even in the presence of learning inaccuracies. The proposed control framework bridges the gap between model-free learning-based control and constrained control, by decoupling performance optimization from safety enforcement. Simulations on vehicle control illustrate the superior performance regarding constraint satisfaction and task achievement compared to model-based methods and reward shaping.
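The filtering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation: the 3DSF in the paper is optimization-based, whereas this sketch searches a finite grid of candidate inputs. It also assumes the sign convention suggested by the figure captions (values at or below zero are safe), and the names `safety_filter`, `toy_q_b`, and `grid` are hypothetical. The toy certificate is hand-written from a known one-dimensional model purely so the example runs; in the paper the SACBF is learned from data.

```python
import numpy as np


def safety_filter(q_b, x, u_nominal, candidates):
    """Return the candidate input closest to u_nominal with q_b(x, u) <= 0.

    q_b        : callable (x, u) -> float, a stand-in for a learned SACBF
                 (assumed convention: q_b <= 0 means the pair is safe)
    candidates : array of shape (n_candidates, input_dim)
    """
    values = np.array([q_b(x, u) for u in candidates])
    safe = values <= 0.0
    if not safe.any():
        # No candidate is certified safe: fall back to the least-violating one.
        return candidates[int(np.argmin(values))]
    dists = np.linalg.norm(candidates - u_nominal, axis=1)
    dists[~safe] = np.inf  # exclude uncertified inputs from the search
    return candidates[int(np.argmin(dists))]


def toy_q_b(x, u):
    """Hypothetical certificate for x_next = x + u with constraint |x| <= 1."""
    return float(np.abs(x + u)[0] - 1.0)


# Candidate inputs on a grid in [-1, 1] with step 0.25.
grid = np.linspace(-1.0, 1.0, 9).reshape(-1, 1)

# The nominal input 1.0 would push x = 0.5 outside the constraint.
u_safe = safety_filter(toy_q_b, np.array([0.5]), np.array([1.0]), grid)
print(u_safe)
```

Here the nominal input 1.0 is replaced by 0.5, the closest grid point that keeps the next state inside the constraint. The filter itself only queries `q_b` and never calls a dynamics model, which mirrors the decoupling of performance optimization from safety enforcement that the abstract emphasizes.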

Paper Structure

This paper contains 22 sections, 6 theorems, 38 equations, 6 figures, 1 table, and 1 algorithm.

Key Result

Lemma 1

Consider the policy $\pi$ given by the direct data-driven safety filter. If $Q^B$ is an SACBF with the safe set $\mathcal{S}_Q$, then $\pi$ renders $\mathcal{S}_Q$ positively invariant for the system. As a result, the trajectories of the system, starting from $x_0 \in \mathcal{S}_Q$ and controlled by $\pi$, satisfy $h(x_t) \leq 0$ and $u_t \i

Figures (6)

  • Figure 1: The comparison between the proposed 3DSF and the existing indirect data-driven counterparts.
  • Figure 2: The SACBFs and their corresponding safe sets in the $p_x$-$p_y$ plane with $v = 0.5$ and $\Psi =0$. The green area represents the obstacles and walls. In the contour figures, values below zero mean that the positions are feasible.
  • Figure 3: Illustration of the safe set learned from the expert controller under different $\beta$.
  • Figure 4: Comparison of closed-loop vehicle trajectories under different controllers and different safety filters. Black dots represent initial states. Green curve: the boundary of the safe set learned from the expert controller.
  • Figure 5: Closed-loop trajectories of MPC.
  • ...and 1 more figure

Theorems & Definitions (21)

  • Definition 1: Discrete-time control barrier function [agrawal2017discrete]
  • Definition 2: State-action control barrier function (SACBF)
  • Lemma 1: Safety of SACBFs
  • Remark 1
  • Proposition 1
  • Proof
  • Proposition 2
  • Proof
  • Remark 2
  • Remark 3
  • ...and 11 more