
Verifying rich robustness properties for neural networks

Mohammad Afzal, S. Akshay, Ashutosh Gupta

TL;DR

The paper tackles verifying rich robustness properties for neural networks by introducing a simple yet expressive grammar for post-conditions and a layer-based encoding that appends small circuitry to the network, enabling the use of existing verifiers like $\alpha\beta$-CROWN. It supports confidence-based variants (relaxed, strong, smoothness) and non-confidence-based notions (top-$k$) by approximating the softmax and encoding the resulting predicates as Boolean combinations of linear constraints, with formal soundness and error-bounds guarantees. The authors demonstrate scalability on 8,870 benchmarks, including large networks up to about $13.16$M activations, and show that the layer-based encoding outperforms direct constraint encodings in Marabou. This framework provides a flexible, modular approach to robustness verification suitable for safety-critical applications, combining grammar-based specification with hardware-friendly neural encodings. The practical impact lies in enabling broader verification coverage without bespoke tool modifications while maintaining provable guarantees on approximation error.
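To make the confidence-based idea concrete, the following is a minimal sketch of a relaxed-robustness check on a single output vector. It assumes one reading of the property: a perturbed input counts as a counterexample only if it is misclassified and the softmax confidence of the wrong class reaches a threshold `tau`. The function names, the use of `>=`, and the exact softmax are illustrative assumptions; the paper works with an approximation of softmax inside a verifier rather than evaluating it pointwise.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def violates_relaxed_robustness(logits, true_class, tau):
    """Return True if this output is a relaxed-robustness counterexample.

    Assumption (not taken verbatim from the paper): a violation needs
    both a wrong predicted class and a confidence of at least tau.
    """
    probs = softmax(logits)
    predicted = max(range(len(probs)), key=probs.__getitem__)
    return predicted != true_class and probs[predicted] >= tau


# Misclassified with low confidence: not a violation at tau = 0.8.
low_conf = violates_relaxed_robustness([0.0, 0.2, 0.1], true_class=0, tau=0.8)
# Misclassified with high confidence: a violation at tau = 0.8.
high_conf = violates_relaxed_robustness([0.0, 5.0, 0.0], true_class=0, tau=0.8)
```

A pointwise check like this can only falsify the property on sampled inputs; proving it for every input in a perturbation region is what the verifier-based encoding is for.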

Abstract

Robustness is an important problem in AI alignment and safety, with models such as neural networks being increasingly used in safety-critical systems. In the last decade, a large body of work has emerged on local robustness, i.e., checking if the decision of a neural network remains unchanged when the input is slightly perturbed. However, many of these approaches require specialized encoding and often ignore the confidence of a neural network on its output. In this paper, our goal is to build a generalized framework to specify and verify variants of robustness in neural network verification. We propose a specification framework using a simple grammar, which is flexible enough to capture most existing variants. This allows us to introduce new variants of robustness that take into account the confidence of the neural network in its outputs. Next, we develop a novel and powerful unified technique to verify all such variants in a homogeneous way, viz., by adding a few additional layers to the neural network. This enables us to use any state-of-the-art neural network verification tool, without having to tinker with the encoding within, while incurring an approximation error that we show is bounded. We perform an extensive experimental evaluation over a large suite of 8870 benchmarks, with 138M parameters in the largest network, and show that we are able to capture a wide set of robustness variants and outperform direct encoding approaches by a significant margin.

Paper Structure

This paper contains 31 sections, 15 theorems, 32 equations, 17 figures, 1 table.

Key Result

Theorem

Given verification query $\langle N,P,Q_{rel}' \rangle$,

Figures (17)

  • Figure 1: Relaxed robustness: (a-b) The network convBigRELU-PGD.onnx correctly classified the original image (left) as horse (respectively, airplane) with high confidence. With an input perturbation of $16/255$, we can find misclassified images (right), but only with low confidence. In fact, all counterexamples turn out to have low confidence, so verification succeeds under the relaxed robustness criterion with an $80\%$ confidence threshold, whereas state-of-the-art verifiers would have declared this network non-robust. Strong robustness: (c-e) The network convBigRELU-PGD.onnx classified the original image (left) of class ship/horse/deer with very high confidence, and we found images (right) within a perturbation of $16/255$ for which the confidence drops drastically although the class remains the same. These images are robust with respect to the standard and relaxed robustness criteria, but not with respect to the strong robustness criterion if confidence is allowed to fall up to $30\%$. Smoothness: (f-g) The left image, labeled as Airplane/Truck, is taken from the CIFAR-10 dataset and is classified correctly with a medium confidence of $\sim 50\%$ by the neural network cifar10-2-255.onnx. Under an $\epsilon = 16/255$ perturbation, we obtain images with much higher and much lower confidences, showing drastic variations. The bottom line is that these requirements may vary across applications, and users can define many more requirements tailored to specific needs.
  • Figure 2: Behavior of lower bound $\tau_{lb}$ and the user defined threshold $\tau$ approximations of $softmaxC$.
  • Figure 3: (a) Neural network $N$ appended with a neural network layer that encodes either $\bigwedge_{i=0}^n LE_i \leq 0$ or its negation $\bigvee_{i=0}^n LE_i > 0$, where $LE_i = \sum_{j=1}^{m} c_{ij}y_j+b_i$. The circular nodes are ReLUs and the square nodes are linear combinations. (b) The circuit for $\mathfrak{C}_{V(\dagger,Q,\eta)}$. (c) Translation of the post-condition $\lnot ((y_1 + y_2 \leq 0 \land y_2 \leq 0 ) \lor (y_1 - y_3 \leq 0 \land y_3 \leq 2 ))$ using our scheme and $\eta = 0.2$.
  • Figure 4: The relaxed, strong, and smoothness plots show the confidence thresholds on the x-axis and the percentage of safe, unsafe, and timeout instances on the y-axis. The bar graph presents a comparison between standard robustness and top-$k$ robustness, including top-$k$ relaxed robustness and top-$k$ affinity robustness. For each robustness metric, the left/middle/right bars represent the percentage of unsafe, safe, and timeout cases, respectively.
  • Figure 5: (A-B) Comparison of the constraint-based solver Marabou with and without the simplified property, alongside $\alpha\beta$-CROWN with the simplified property. (C) The $x$-axis shows the number of benchmarks solved, ordered by increasing solving time, and the $y$-axis shows the time taken to solve them.
  • ...and 12 more figures
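
The layer-based encoding described in the Figure 3 caption can be illustrated with a small sketch. It relies on a standard fact rather than on the paper's exact circuit: since each $ReLU(LE_i)$ is non-negative, the sum $\sum_i ReLU(LE_i)$ is zero exactly when every $LE_i \leq 0$, and positive exactly when some $LE_i > 0$. The helper names below are hypothetical, and the margin $\eta$ from the caption is not modeled.

```python
def relu(v):
    """Rectified linear unit on a scalar."""
    return max(0.0, v)


def linear_exprs(C, b, y):
    """Evaluate LE_i = sum_j C[i][j] * y[j] + b[i] for every row i."""
    return [sum(c * yj for c, yj in zip(row, y)) + bi for row, bi in zip(C, b)]


def conjunction_flag(C, b, y):
    """One linear layer, one ReLU layer, one summing node.

    Returns 0.0 iff all LE_i <= 0 (the conjunction holds), and a
    positive value iff some LE_i > 0 (the negated disjunction holds).
    """
    return sum(relu(le) for le in linear_exprs(C, b, y))


# First conjunct of the Figure 3(c) example: y1 + y2 <= 0 and y2 <= 0.
C = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 0.0]]
b = [0.0, 0.0]

holds = conjunction_flag(C, b, [-1.0, -1.0, 0.0])   # both constraints satisfied
fails = conjunction_flag(C, b, [-1.0, 0.5, 0.0])    # y2 <= 0 is violated
```

Because the flag is itself the output of ordinary linear and ReLU nodes, a property over the original outputs reduces to a single comparison on one new output, which is the kind of query existing verifiers already accept.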

Theorems & Definitions (38)

  • Claim 1
  • proof
  • Claim 2
  • proof
  • Theorem
  • proof
  • Claim 3
  • Theorem
  • proof
  • Claim 4
  • ...and 28 more