Verifying rich robustness properties for neural networks
Mohammad Afzal, S. Akshay, Ashutosh Gupta
TL;DR
The paper tackles the verification of rich robustness properties for neural networks by introducing a simple yet expressive grammar for post-conditions and a layer-based encoding that appends small circuitry to the network, enabling the use of existing verifiers such as $\alpha\beta$-CROWN. It supports confidence-based variants (relaxed, strong, smoothness) and non-confidence-based notions (top-$k$) by approximating the softmax and encoding the resulting predicates as Boolean combinations of linear constraints, with formal soundness and error-bound guarantees. The authors demonstrate scalability on 8,870 benchmarks, including networks with up to about $13.16$M activations, and show that the layer-based encoding outperforms direct constraint encodings in Marabou. The framework offers a flexible, modular approach to robustness verification for safety-critical applications, combining grammar-based specification with layer-based neural encodings. Its practical impact is broader verification coverage without bespoke tool modifications, while keeping provable bounds on the approximation error.
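To make "Boolean combinations of linear constraints" concrete for top-$k$, the Python sketch below (ours, not the authors' encoding) writes "target class t is among the k highest logits" as a disjunction over sets S of k-1 competitor classes, where each disjunct is a conjunction of linear atoms y_j - y_t <= 0. The function names are hypothetical, ties are resolved in favour of the target, and the check runs on concrete logits only; a verifier would instead have to establish the formula for every logit vector reachable from the perturbed input region.

```python
from itertools import combinations
from typing import Sequence


def atom(logits: Sequence[float], j: int, t: int) -> bool:
    """Linear atom y_j - y_t <= 0: class j does not strictly beat target t."""
    return logits[j] - logits[t] <= 0


def top_k_dnf(logits: Sequence[float], t: int, k: int) -> bool:
    """Hypothetical DNF form of top-k membership (assumes 1 <= k <= len(logits)).

    t is in the top-k iff some set S of k-1 other classes contains every
    class that strictly beats t: an OR over S of ANDs of linear atoms.
    """
    others = [j for j in range(len(logits)) if j != t]
    return any(
        all(atom(logits, j, t) for j in others if j not in S)
        for S in combinations(others, k - 1)
    )


def top_k_direct(logits: Sequence[float], t: int, k: int) -> bool:
    """Reference check: fewer than k classes have a strictly larger logit."""
    return sum(1 for y in logits if y > logits[t]) < k


# Class 2 is beaten by classes 1 and 3, so it is in the top-3 but not the top-2.
logits = [0.1, 2.0, 1.5, 1.9]
print(top_k_dnf(logits, 2, 2), top_k_dnf(logits, 2, 3))
```

For k = 1 the formula collapses to the usual argmax condition (a single conjunction). The number of disjuncts is C(n-1, k-1), so this naive expansion grows quickly with the number of classes; it is meant only to show the logical shape of the post-condition.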
Abstract
Robustness is an important problem in AI alignment and safety, as models such as neural networks are increasingly used in safety-critical systems. In the last decade, a large body of work has emerged on local robustness, i.e., checking whether the decision of a neural network remains unchanged when the input is slightly perturbed. However, many of these approaches require specialized encodings and often ignore the confidence of a neural network in its output. In this paper, our goal is to build a generalized framework to specify and verify variants of robustness in neural network verification. We propose a specification framework based on a simple grammar that is flexible enough to capture most existing variants. This allows us to introduce new variants of robustness that take into account the confidence of the neural network in its outputs. Next, we develop a novel unified technique to verify all such variants homogeneously, viz., by adding a few additional layers to the neural network. This enables us to use any state-of-the-art neural network verification tool without modifying its internal encoding, while incurring an approximation error that we show is bounded. We perform an extensive experimental evaluation over a suite of 8870 benchmarks, the largest network having 138M parameters, and show that we are able to capture a wide set of robustness variants and outperform direct encoding approaches by a significant margin.
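The idea of verifying a property by appending layers can be illustrated with a standard ReLU identity, max(a, b) = a + ReLU(b - a), where a itself is carried through the ReLU as ReLU(a) - ReLU(-a). The Python sketch below is an illustration under that assumption, not the paper's construction: it builds a hypothetical two-input "max gadget" from one affine layer, a ReLU, and a second affine layer, and chains gadgets into a head whose single output is y_t - max_{j != t} y_j. Under this construction, local robustness for class t amounts to an unmodified verifier proving that this one output stays positive over the input region.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def affine(W: Matrix, b: Vector, x: Vector) -> Vector:
    """Dense layer: returns W @ x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


# Hypothetical max gadget (affine -> ReLU -> affine) on inputs [a, b].
# Hidden units: ReLU(a), ReLU(-a), ReLU(b - a).
W1: Matrix = [[1.0, 0.0], [-1.0, 0.0], [-1.0, 1.0]]
B1: Vector = [0.0, 0.0, 0.0]
# Output: ReLU(a) - ReLU(-a) + ReLU(b - a) = a + ReLU(b - a) = max(a, b).
W2: Matrix = [[1.0, -1.0, 1.0]]
B2: Vector = [0.0]


def max_gadget(a: float, b: float) -> float:
    return affine(W2, B2, relu(affine(W1, B1, [a, b])))[0]


def margin_head(logits: Vector, target: int) -> float:
    """Hypothetical appended head: y_target - max over other logits,
    computed only with max gadgets plus a final linear subtraction."""
    others = [y for j, y in enumerate(logits) if j != target]
    m = others[0]
    for y in others[1:]:
        m = max_gadget(m, y)  # chain of gadgets; a balanced tree needs fewer layers
    return logits[target] - m


print(margin_head([2.0, 0.5, 1.0], 0))  # positive margin: class 0 wins
```

Because each gadget is an ordinary affine/ReLU block, the head can in principle be materialized as extra network layers (other logits are carried forward with the same ReLU(x) - ReLU(-x) trick) and passed to an off-the-shelf verifier without touching its internals.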
