From learning to safety: A Direct Data-Driven Framework for Constrained Control
Kanghui He, Shengling Shi, Ton van den Boom, Bart De Schutter
TL;DR
The paper addresses safety in learning-based control of systems with unknown dynamics. It introduces a direct data-driven framework built on state-action control barrier functions (SACBFs) and a direct data-driven safety filter (3DSF), both of which operate without system identification. It develops three learning strategies to synthesize SACBFs (RL, expert-guided learning, and supervised learning from CBFs) and provides an error-to-state safety (ESSf) analysis that relates the learning error to the state constraint tightening and SACBF relaxation required to keep the guarantees. A refinement step via constrained FQI enables near-optimal performance while preserving safety, and a vehicle case study shows improved constraint satisfaction and task achievement over model-based safety filters and reward shaping. The result is a practical, online-capable way to combine model-free learning with formal safety guarantees, with acknowledged limitations in sample complexity and RL-specific safety guarantees. Avenues for future work include extension to continuous-time systems and reduced-sample settings.
Abstract
Ensuring safety in the sense of constraint satisfaction for learning-based control is a critical challenge, especially in the model-free case. While safety filters address this challenge in the model-based setting by modifying unsafe control inputs, they typically rely on predictive models derived from physics or data. This reliance limits their applicability for advanced model-free learning control methods. To address this gap, we propose a new optimization-based control framework that determines safe control inputs directly from data. The benefit of the framework is that it can be updated through arbitrary model-free learning algorithms to pursue optimal performance. As a key component, the concept of direct data-driven safety filters (3DSF) is first proposed. The framework employs a novel safety certificate, called the state-action control barrier function (SACBF). We present three different schemes to learn the SACBF. Furthermore, based on input-to-state safety analysis, we present the error-to-state safety analysis framework, which provides formal guarantees on safety and recursive feasibility even in the presence of learning inaccuracies. The proposed control framework bridges the gap between model-free learning-based control and constrained control, by decoupling performance optimization from safety enforcement. Simulations on vehicle control illustrate the superior performance regarding constraint satisfaction and task achievement compared to model-based methods and reward shaping.
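To make the idea of a safety filter driven by a state-action certificate concrete, the following is a minimal sketch and not the authors' formulation. It assumes a scalar action, a finite set of candidate actions, and a certificate `barrier(x, u)` whose nonnegative values mark a state-action pair as safe; the names `safety_filter` and `toy_barrier`, the nonnegativity convention, and the fallback rule are all assumptions introduced for illustration. In the framework described above the certificate would be learned from data, whereas here a hand-written stand-in is used so the example runs on its own.

```python
from typing import Callable, Sequence


def safety_filter(
    x: float,
    u_nominal: float,
    candidates: Sequence[float],
    barrier: Callable[[float, float], float],
) -> float:
    """Return the candidate action closest to u_nominal with barrier(x, u) >= 0.

    Assumption: barrier(x, u) >= 0 certifies the pair (x, u) as safe.
    No model of the dynamics is queried; only the certificate is evaluated.
    """
    safe = [u for u in candidates if barrier(x, u) >= 0.0]
    if not safe:
        # Hypothetical fallback: no certified action exists at this state,
        # so pick the action that violates the certificate the least.
        return max(candidates, key=lambda u: barrier(x, u))
    return min(safe, key=lambda u: abs(u - u_nominal))


def toy_barrier(x: float, u: float) -> float:
    """Hand-written stand-in for a learned certificate (illustration only).

    It encodes |x + u| <= 1 for a toy integrator; a learned certificate
    would be fitted from data rather than written from a model.
    """
    return 1.0 - abs(x + u)


if __name__ == "__main__":
    actions = [-1.0, -0.5, 0.0, 0.5, 1.0]
    # The nominal (performance-oriented) input pushes toward the constraint;
    # the filter replaces it with the nearest certified action.
    print(safety_filter(0.8, 1.0, actions, toy_barrier))
```

The sketch mirrors the decoupling described in the abstract: the nominal input can come from any learning algorithm, while safety is enforced separately by evaluating the certificate. A continuous action space would replace the enumeration with an optimization problem over `u`.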
