Synthesizing Efficient and Permissive Programmatic Runtime Shields for Neural Policies

Jieke Shi; Junda He; Zhou Yang; Đorđe Žikelić; David Lo

Synthesizing Efficient and Permissive Programmatic Runtime Shields for Neural Policies

Jieke Shi, Junda He, Zhou Yang, Đorđe Žikelić, David Lo

TL;DR

Safety of neural policies in control systems is critical but challenging due to unsafe outputs. Aegis addresses this by formulating shield synthesis as sketch-based program synthesis, combining CEGIS and Bayesian optimization to produce lightweight linear policies and switching thresholds that guarantee safety. Across eight representative control systems, Aegis achieves comparable safety to VRL but with 2.2× less time overhead, 3.9× less memory, and 1.5× fewer interventions on average, demonstrating superior efficiency and permissiveness. The approach scales to larger policies, relies on inferred linearized environment dynamics, and provides a practical replication package for reproducibility and further research.

Abstract

With the increasing use of neural policies in control systems, ensuring their safety and reliability has become a critical software engineering task. One prevalent approach to ensuring the safety of neural policies is to deploy programmatic runtime shields alongside them to correct their unsafe commands. However, the programmatic runtime shields synthesized by existing methods are either computationally expensive or insufficiently permissive, resulting in high overhead and unnecessary interventions on the system. To address these challenges, we propose Aegis, a novel framework that synthesizes lightweight and permissive programmatic runtime shields for neural policies. Aegis achieves this by formulating the seeking of a runtime shield as a sketch-based program synthesis problem and proposing a novel method that leverages counterexample-guided inductive synthesis and Bayesian optimization to solve it. To evaluate Aegis and its synthesized shields, we use eight representative control systems and compare Aegis with the current state-of-the-art. Our results show that the programmatic runtime shields synthesized by Aegis can correct all unsafe commands from neural policies, ensuring that the systems do not violate any desired safety properties at all times. Compared to the current state-of-the-art, Aegis's shields exhibit a 2.2$\times$ reduction in time overhead and a 3.9$\times$ reduction in memory usage, suggesting that they are much more lightweight. Moreover, Aegis's shields incur an average of 1.5$\times$ fewer interventions than other shields, showing better permissiveness.

Synthesizing Efficient and Permissive Programmatic Runtime Shields for Neural Policies

TL;DR

Abstract

reduction in time overhead and a 3.9

reduction in memory usage, suggesting that they are much more lightweight. Moreover, Aegis's shields incur an average of 1.5

fewer interventions than other shields, showing better permissiveness.

Paper Structure (42 sections, 30 equations, 16 figures, 7 tables, 3 algorithms)

This paper contains 42 sections, 30 equations, 16 figures, 7 tables, 3 algorithms.

Introduction
Programmatic Runtime Shields
State-of-the-art and Its Limitations
Our Solution: Aegis
Evaluating Aegis
Novelty and Contributions
Paper Structure
Preliminary
Neural Policy
Programmatic Runtime Shields and VRL
Aegis
Overview
An Illustrative Example
Program Sketch
Inferring Environment Dynamics
...and 27 more sections

Figures (16)

Figure 1: Overview of how a programmatic runtime shield works. The green line detects potentially unsafe commands from the neural policy, while the red lines correct them by returning safe ones instead.
Figure 2: Main control loop in a quadcopter (simplified for readability). The green and red colors highlight the neural policy and runtime shield, respectively.
Figure 3: A runtime shield synthesized by VRL for the neural policy in a quadcopter system. Line 3 (marked in green) is a barrier certificate that identifies unsafe states, while a linear policy in line 4 (marked in red) outputs safe commands.
Figure 4: A runtime shield synthesized by Aegis for the neural policy in a quadcopter system. Line 2 (marked in red) is a linear policy that outputs safe commands, while line 3 (marked in green) identifies potentially unsafe commands.
Figure 5: The workflow of Aegis.
...and 11 more figures

Synthesizing Efficient and Permissive Programmatic Runtime Shields for Neural Policies

TL;DR

Abstract

Synthesizing Efficient and Permissive Programmatic Runtime Shields for Neural Policies

Authors

TL;DR

Abstract

Table of Contents

Figures (16)