GUARDIAN: Safety Filtering for Systems with Perception Models Subject to Adversarial Attacks

Nicholas Rober; Alex Rose; Jonathan P. How

TL;DR

GUARDIAN addresses safety for systems that rely on NN-based perception under adversarial observation perturbations by combining NN verification (NNV) bounds with a modified Hamilton-Jacobi (HJ) reachability safety filter. Using NN verification, it computes a set $\bar{\mathcal{X}}_t$ that bounds the true state given attack strength $\epsilon$ and uncertainty $e_{\hat{x}}$, then computes a safe input by evaluating $\Phi(\bar{\mathcal{X}}_t,u)$ against the HJ value function $V(\cdot)$. Theoretical results establish, under stated assumptions, invariance of the safe set $\Omega^*$ under the GUARDIAN update. A scalability analysis contrasts polynomial-time NNV bounds with the exponential cost of HJ reachability; future work includes latent-space reachability to improve scalability. Numerical experiments cover a Taxinet-like runway taxiing task, state-dependent vulnerability to adversarial attacks, and comparisons with MR-CBFs, R-CBFs, and R-CBF-QPs, in which GUARDIAN outperforms these baselines in defending against adversarial perturbations to the perception module.

Abstract

Safety filtering is an effective method for enforcing constraints in safety-critical systems, but existing methods typically assume perfect state information. This limitation is especially problematic for systems that rely on neural network (NN)-based state estimators, which can be highly sensitive to noise and adversarial input perturbations. We address these problems by introducing GUARDIAN: Guaranteed Uncertainty-Aware Reachability Defense against Adversarial INterference, a safety filtering framework that provides formal safety guarantees for systems with NN-based state estimators. At runtime, GUARDIAN uses neural network verification tools to provide guaranteed bounds on the system's state estimate given possible perturbations to its observation. It then uses a modified Hamilton-Jacobi reachability formulation to construct a safety filter that adjusts the nominal control input based on the verified state bounds and safety constraints. The result is an uncertainty-aware filter that ensures safety despite the system's reliance on an NN estimator with noisy, possibly adversarial, input observations. Theoretical analysis and numerical experiments demonstrate that GUARDIAN effectively defends systems against adversarial attacks that would otherwise lead to a violation of safety constraints.
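
The filtering step described above can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it assumes a hypothetical 1-D system $x^+ = x + \Delta t \, u$, a hypothetical stand-in value function $V(x) = 1 - |x|$ (non-negative on the safe set $|x| \leq 1$), a state bound given as an interval, and a finite list of candidate inputs. All function names and parameters are illustrative.

```python
def successor_interval(x_lo, x_hi, u, dt=0.5):
    """Image of the state interval under the assumed dynamics x+ = x + dt * u."""
    return x_lo + dt * u, x_hi + dt * u


def value(x):
    """Hypothetical stand-in for an HJ value function: >= 0 iff |x| <= 1."""
    return 1.0 - abs(x)


def worst_case_value(x_lo, x_hi):
    """Minimum of value() over [x_lo, x_hi]; value() is concave, so check endpoints."""
    return min(value(x_lo), value(x_hi))


def filter_input(x_lo, x_hi, u_nominal, candidates):
    """Return the candidate closest to u_nominal whose whole successor set is safe.

    If no candidate is certified safe, fall back to the candidate with the
    largest worst-case value (an illustrative choice, not taken from the paper).
    """
    def score(u):
        return worst_case_value(*successor_interval(x_lo, x_hi, u))

    safe = [u for u in candidates if score(u) >= 0.0]
    if safe:
        return min(safe, key=lambda u: abs(u - u_nominal))
    return max(candidates, key=score)


candidates = [-1.0, -0.5, 0.0, 0.5, 1.0]
# Far from the boundary, the nominal input passes through unchanged.
u_far = filter_input(0.0, 0.25, 1.0, candidates)
# Near the boundary, the nominal input is replaced by the closest safe one.
u_near = filter_input(0.5, 0.75, 1.0, candidates)
```

The key design point in this sketch is that safety is checked for every state in the bound, not only for the point estimate, so a wider state bound (for example, from a stronger assumed attack) makes the filter more conservative.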

Paper Structure (14 sections, 3 theorems, 14 equations, 4 figures)

Introduction
Preliminaries
System Dynamics
Adversarial Attacks
Neural Network Verification
Hamilton-Jacobi Reachability Analysis
GUARDIAN
Theoretical Results
Scalability
Numerical Results
Taxinet Safety
State-Dependent Vulnerability to Adversarial Attacks
Comparison to MR-CBFs, R-CBFs, and R-CBF-QPs
Conclusion

Key Result

Theorem 1

Given a neural network $\pi: \mathbb{R}^{n_z} \rightarrow \mathbb{R}^{n_o}$ and a hyper-rectangular set of possible inputs $\mathcal{Z}$, there exist two explicit functions $\underline{\pi}(z) = \Psi z + \alpha$ and $\overline{\pi}(z) = \Xi z + \beta$ such that the inequality $\underline{\pi}(z) \leq \pi(z) \leq \overline{\pi}(z)$ holds element-wise for all $z \in \mathcal{Z}$, with $\Psi, \Xi \in \mathbb{R}^{n_o \times n_z}$ and $\alpha, \beta \in \mathbb{R}^{n_o}$.
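
To make the theorem concrete, the sketch below shows how bounds of this kind can be turned into a box on the network output. It assumes the two bounding functions are affine, with lower bound $\Psi z + \alpha$ and upper bound $\Xi z + \beta$ (an assumption consistent with the stated dimensions), and that $\mathcal{Z}$ is given by element-wise limits `z_lo` and `z_hi`. The function name and the numerical values are hypothetical; computing $\Psi, \Xi, \alpha, \beta$ for a real network is the job of a verification tool and is not shown.

```python
def concretize_affine_bounds(Psi, alpha, Xi, beta, z_lo, z_hi):
    """Element-wise (lower, upper) bounds on pi(z) over the box [z_lo, z_hi].

    The affine lower bound is minimized, and the affine upper bound maximized,
    by picking the box corner that matches the sign of each coefficient.
    """
    lower, upper = [], []
    for row_l, a, row_u, b in zip(Psi, alpha, Xi, beta):
        # Minimize row_l . z + a: use z_lo where the coefficient is >= 0, else z_hi.
        lo = a + sum(c * (z_lo[i] if c >= 0 else z_hi[i]) for i, c in enumerate(row_l))
        # Maximize row_u . z + b: use z_hi where the coefficient is >= 0, else z_lo.
        hi = b + sum(c * (z_hi[i] if c >= 0 else z_lo[i]) for i, c in enumerate(row_u))
        lower.append(lo)
        upper.append(hi)
    return lower, upper


# Hypothetical bounds for a network with n_z = 2 inputs and n_o = 1 output.
Psi, alpha = [[1.0, -2.0]], [0.5]
Xi, beta = [[1.0, -1.0]], [1.0]
lower, upper = concretize_affine_bounds(Psi, alpha, Xi, beta, [0.0, 0.0], [1.0, 1.0])
```

Because each bound is affine and the input set is a box, the extreme values are attained at corners, so no optimization solver is needed for this step.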

Figures (4)

Figure 1: GUARDIAN protects against adversarial attacks.
Figure 2: Trajectories of a taxiing aircraft subject to adversarially perturbed observations with and without GUARDIAN.
Figure 3: State error for sample trajectories with different landmark configurations shows state-dependent vulnerability to adversarial attacks. Triangular landmarks (right) are more vulnerable to attack than square landmarks (left). Standard HJ reachability (top) does not ensure safety, but GUARDIAN (bottom) does by quantifying state uncertainty. Heatmaps show that trends in the FIM condition number (top) are captured by the NNV set volume (bottom).
Figure 4: GUARDIAN outperforms MR-CBFs, R-CBFs, and R-CBF-QPs in defense against adversarial attacks.

Theorems & Definitions (4)

Theorem 1: NN Robustness Verification (Zhang et al., 2018)
Lemma 1
Theorem 2
Proof
