Neural Control and Certificate Repair via Runtime Monitoring

Emily Yu, Đorđe Žikelić, Thomas A. Henzinger

TL;DR

This paper addresses safety and stability of neural network control policies in environments whose dynamics are unknown. A policy is learned jointly with a certificate (e.g., a barrier or Lyapunov function), and runtime monitoring is then used to check the reliability of both. The paper introduces two monitors, CertPM and PredPM, that observe the policy and the certificate in a black-box setting, flag violations, and collect counterexamples, which are used to repair the policy and certificate in a data-driven loop. Empirical results on DroneEnv and ShipEnv show that monitoring-assisted repair significantly improves safety metrics and certificate validity, with PredPM offering predictive warnings and CertPM excelling in policy repair. The work provides a practical, data-efficient path to safer deployment of learning-based controllers when dynamics are not fully known, and outlines extensions to Lyapunov-based monitoring and future work in stochastic and multi-agent settings.

Abstract

Learning-based methods provide a promising approach to solving highly non-linear control tasks that are often challenging for classical control methods. To ensure the satisfaction of a safety property, learning-based methods jointly learn a control policy together with a certificate function for the property. Popular examples include barrier functions for safety and Lyapunov functions for asymptotic stability. While there has been significant progress on learning-based control with certificate functions in the white-box setting, where the correctness of the certificate function can be formally verified, there has been little work on ensuring their reliability in the black-box setting, where the system dynamics are unknown. In this work, we consider the problems of certifying and repairing neural network control policies and certificate functions in the black-box setting. We propose a novel framework that utilizes runtime monitoring to detect system behaviors that violate the property of interest under some initially trained neural network policy and certificate. These violating behaviors are used to extract new training data, which is then used to re-train the neural network policy and the certificate function and ultimately to repair them. We demonstrate the effectiveness of our approach empirically by using it to repair and to boost the safety rate of neural network policies learned by a state-of-the-art method for learning-based control on two autonomous system control tasks.
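To make the monitoring idea concrete, the following is a minimal sketch of a runtime monitor that watches observed executions of a black-box system and records states where a candidate barrier function appears to be violated. It is not the paper's CertPM or PredPM. The class name `BarrierMonitor`, the parameters `dt` and `alpha`, and the finite-difference form of the non-decreasing check are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]


@dataclass
class BarrierMonitor:
    """Hypothetical monitor for illustration; not the paper's CertPM/PredPM."""

    barrier: Callable[[State], float]   # candidate certificate B(x)
    is_unsafe: Callable[[State], bool]  # membership test for the unsafe set
    dt: float = 0.1                     # assumed sampling period
    alpha: float = 1.0                  # assumed linear class-K gain
    counterexamples: List[Tuple[str, State]] = field(default_factory=list)

    def observe(self, trajectory: Sequence[State]) -> int:
        """Check one observed execution; return the number of new violations."""
        before = len(self.counterexamples)
        for i, x in enumerate(trajectory):
            b = self.barrier(x)
            # Safety check: B should be negative on every unsafe state.
            if self.is_unsafe(x) and b >= 0.0:
                self.counterexamples.append(("safety", x))
            # Non-decreasing check, approximated by a finite difference
            # because the dynamics themselves are not available.
            if i + 1 < len(trajectory):
                b_next = self.barrier(trajectory[i + 1])
                if (b_next - b) / self.dt + self.alpha * b < 0.0:
                    self.counterexamples.append(("non_decreasing", x))
        return len(self.counterexamples) - before


# Toy 1-D usage: unsafe when |x| > 1, candidate barrier B(x) = 1 - x^2.
monitor = BarrierMonitor(
    barrier=lambda s: 1.0 - s[0] ** 2,
    is_unsafe=lambda s: abs(s[0]) > 1.0,
)
new_violations = monitor.observe([(0.0,), (0.9,), (1.5,)])
print(new_violations, monitor.counterexamples)
```

In a repair loop of the kind the abstract describes, the collected `counterexamples` would serve as additional training data for re-training the policy and the certificate. That re-training step is omitted here.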

Paper Structure

This paper contains 10 sections, 2 theorems, 8 equations, 8 figures, 2 tables, 2 algorithms.

Key Result

Proposition 1

Suppose that there exists a continuously differentiable function ${\mathcal{B}}: \mathcal{X}\rightarrow \mathbb{R}$ for the dynamical system $\Sigma$ under a policy $\pi$ with respect to the unsafe set $\mathcal{X}_u$ that satisfies the barrier conditions given in the paper. Then $\Sigma$ satisfies the safety property under $\pi$ with respect to $\mathcal{X}_u$, and we call ${\mathcal{B}}$ a barrier function.
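For orientation, one common textbook formulation of barrier conditions for a continuous-time system is sketched below. The initial set $\mathcal{X}_0$, the dynamics $f$, and the class-$\mathcal{K}$ function $\alpha$ are assumptions introduced here for illustration. The paper's exact conditions may differ.

```latex
% Assumed setting: \dot{x} = f(x, \pi(x)), initial set \mathcal{X}_0,
% unsafe set \mathcal{X}_u, and a class-\mathcal{K} function \alpha.
\begin{align*}
  \text{(Initial)}        &\quad \mathcal{B}(x) \ge 0
      && \text{for all } x \in \mathcal{X}_0, \\
  \text{(Safety)}         &\quad \mathcal{B}(x) < 0
      && \text{for all } x \in \mathcal{X}_u, \\
  \text{(Non-decreasing)} &\quad \nabla \mathcal{B}(x)^{\top} f(x, \pi(x))
      + \alpha(\mathcal{B}(x)) \ge 0
      && \text{for all } x \text{ with } \mathcal{B}(x) \ge 0.
\end{align*}
```

Under conditions of this shape, a trajectory that starts in $\mathcal{X}_0$ keeps $\mathcal{B} \ge 0$ and therefore never reaches a state where $\mathcal{B} < 0$, which includes every unsafe state.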

Figures (8)

  • Figure 1: The monitor-learner framework.
  • Figure 1: Monitoring-based neural network policy and certificate repair for safety properties.
  • Figure 2: The change in the number of certificate violations for the Safety condition and the Non-decreasing conditions of barrier functions, for the ship benchmark after one round of repair and after a second round of repair. The first round monitors $D = 15000$ system executions. The second round monitors an additional $D=20000$ executions. For better readability, we plot only the executions (out of 50) for which at least one certificate violation is detected.
  • Figure 2: Monitoring-based neural network policy and Lyapunov function repair for stability properties.
  • Figure 3: Estimates $v_U$, $v_S$, and $v_N$ for two different system executions computed by PredPM for the drone and the ship benchmarks.
  • ...and 3 more figures

Theorems & Definitions (2)

  • Proposition 1: Barrier functions
  • Proposition 2: Lyapunov functions