Measuring Neural Net Robustness with Constraints

Osbert Bastani, Yani Ioannou, Leonidas Lampropoulos, Dimitrios Vytiniotis, Aditya Nori, Antonio Criminisi

TL;DR

This work tackles the lack of objective robustness measures for neural networks by defining pointwise robustness and two complementary statistics—adversarial frequency and adversarial severity—both parameterized by a threshold ε. It develops a tractable framework by encoding network behavior as linear constraints and restricting the search to convex regions where the network is linear, allowing an LP-based approximation of the nearest adversarial example. Empirical results on MNIST and CIFAR-10 show that the proposed LP-based estimator yields more reliable robustness assessments than prior approaches and that robustness improvements via data augmentation may overfit to specific adversarial algorithms. The study demonstrates both the practical feasibility of measuring robustness at scale and the nuanced behavior of robustness across architectures, highlighting the challenge of significantly boosting resilience in large networks while providing a roadmap for more robust evaluation and training workflows.
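The two statistics can be illustrated with a small sketch. It assumes that adversarial frequency is the fraction of test points whose pointwise robustness is at most ε, and that adversarial severity is the mean pointwise robustness over exactly those points. The summary above names the statistics but does not spell out their formulas, so both definitions are assumptions here. The function names and the sample values in `rho_hat` are hypothetical.

```python
def adversarial_frequency(rho_hat, eps):
    """Fraction of points whose estimated robustness is at most eps (assumed definition)."""
    hits = [r for r in rho_hat if r <= eps]
    return len(hits) / len(rho_hat)


def adversarial_severity(rho_hat, eps):
    """Mean estimated robustness over the points counted above (assumed definition)."""
    hits = [r for r in rho_hat if r <= eps]
    # No point falls below the threshold: the conditional mean is undefined.
    return sum(hits) / len(hits) if hits else float("nan")


# Hypothetical per-point robustness estimates, e.g. distances to the
# nearest adversarial example found for five test inputs.
rho_hat = [0.05, 0.30, 0.12, 0.50, 0.08]
eps = 0.15

freq = adversarial_frequency(rho_hat, eps)
sev = adversarial_severity(rho_hat, eps)
print(f"frequency={freq:.2f}, severity={sev:.4f}")
```

Under these definitions, frequency says how often the net fails within distance ε, while severity says how close the failures are when they occur, which is why the two are complementary.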

Abstract

Despite having high accuracy, neural nets have been shown to be susceptible to adversarial examples, where a small perturbation to an input can cause it to become mislabeled. We propose metrics for measuring the robustness of a neural net and devise a novel algorithm for approximating these metrics based on an encoding of robustness as a linear program. We show how our metrics can be used to evaluate the robustness of deep neural nets with experiments on the MNIST and CIFAR-10 datasets. Our algorithm generates more informative estimates of robustness metrics compared to estimates based on existing algorithms. Furthermore, we show how existing approaches to improving robustness "overfit" to adversarial examples generated using a specific algorithm. Finally, we show that our techniques can additionally be used to improve neural net robustness, both according to the metrics that we propose and according to previously proposed metrics.

Paper Structure

This paper contains 28 sections, 1 theorem, 9 equations, 3 figures, 1 table.

Key Result

Theorem 1

For any $\mathbf{x}\in\mathcal{X}$ and $\ell\in\mathcal{L}$, we have $f(\mathbf{x})=\ell$ if and only if $\mathcal{C}_f(\mathbf{x},\ell)$ is satisfiable.
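The statement can be checked on a toy example. The sketch below assumes a simplified reading of $\mathcal{C}_f(\mathbf{x},\ell)$: one equality per layer that pins down the intermediate values, plus a final condition that the score of $\ell$ is at least every other score. The two-layer ReLU network, its weights, and all function names are hypothetical, and ties between scores are ignored.

```python
def affine(W, b, v):
    """Fully connected layer: W v + b."""
    return [sum(w * t for w, t in zip(row, v)) + bi for row, bi in zip(W, b)]


def relu(v):
    return [max(0.0, t) for t in v]


# Hypothetical toy net: 2 inputs -> 2 hidden ReLU units -> 2 output scores.
W1, b1 = [[1.0, -1.0], [0.5, 1.0]], [0.0, -0.5]
W2, b2 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]


def scores(x):
    return affine(W2, b2, relu(affine(W1, b1, x)))


def predict(x):
    """f(x): index of the largest output score."""
    out = scores(x)
    return max(range(len(out)), key=out.__getitem__)


def constraints_satisfiable(x, label):
    """With x fixed, the layer equalities determine every intermediate value,
    so satisfiability reduces to the output condition for `label`."""
    out = scores(x)
    return all(out[label] >= out[k] for k in range(len(out)) if k != label)


for x in ([1.0, 0.2], [0.0, 1.0]):
    for label in (0, 1):
        # Both sides of the "if and only if" agree on these inputs.
        assert (predict(x) == label) == constraints_satisfiable(x, label)
```

Once $\mathbf{x}$ is left free instead of fixed, the same constraints describe every input labeled $\ell$, which is what makes the encoding useful for searching for adversarial examples.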

Figures (3)

  • Figure 1: Neural net with a single hidden layer and ReLU activations trained on dataset with binary labels. (a) The training data and loss surface. (b) The linear region corresponding to the red training point.
  • Figure 2: For MNIST, (a) an image classified as 1, (b) its adversarial example classified as 3, and (c) the (scaled) adversarial perturbation. For CIFAR-10, (d) an image classified as "automobile", (e) its adversarial example classified as "truck", and (f) the (scaled) adversarial perturbation.
  • Figure 3: The cumulative number of test points $\mathbf{x}_*$ such that $\rho(f,\mathbf{x}_*)\le\epsilon$ as a function of $\epsilon$. In (a) and (b), the neural nets are the original LeNet (black), LeNet fine-tuned with the baseline and $T=2$ (red), and LeNet fine-tuned with our algorithm and $T=2$ (blue); in (a), $\hat{\rho}$ is measured using the baseline, and in (b), $\hat{\rho}$ is measured using our algorithm. In (c), the neural nets are the original NiN (black) and NiN fine-tuned with our algorithm, and $\hat{\rho}$ is estimated using our algorithm.
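
The curve described in the Figure 3 caption can be computed from per-point estimates as in the sketch below. The function name, the sample values in `rho_hat`, and the grid `eps_grid` are hypothetical and are not taken from the paper's experiments.

```python
from bisect import bisect_right


def cumulative_counts(rho_hat, eps_grid):
    """For each eps in eps_grid, count the points with rho_hat <= eps."""
    ordered = sorted(rho_hat)
    # bisect_right returns how many sorted values are <= eps.
    return [bisect_right(ordered, eps) for eps in eps_grid]


# Hypothetical robustness estimates for five test points.
rho_hat = [0.05, 0.30, 0.12, 0.50, 0.08]
eps_grid = [0.0, 0.1, 0.2, 0.4, 0.6]

counts = cumulative_counts(rho_hat, eps_grid)
print(list(zip(eps_grid, counts)))
```

A curve that rises more slowly under this construction corresponds to a net with fewer nearby adversarial examples, which is how the fine-tuned nets in the figure are compared against the originals.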

Theorems & Definitions (1)

  • Theorem 1