Evaluating the Robustness of Neural Networks: An Extreme Value Theory Approach

Tsui-Wei Weng, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, Dong Su, Yupeng Gao, Cho-Jui Hsieh, Luca Daniel

TL;DR

This work introduces CLEVER, an attack-agnostic robustness metric for neural networks based on local cross-Lipschitz constants and Extreme Value Theory. By bounding the minimum adversarial distortion through Lipschitz analysis and estimating the critical constants via reverse Weibull fits, CLEVER enables scalable robustness evaluation for large architectures like ResNet, Inception-v3, and MobileNet. Empirical results show CLEVER aligns with attack-driven distortions, increases for defended models, and remains computationally feasible, making it a practical safety checkpoint for unseen attacks. The approach extends robustness guarantees to non-differentiable ReLU networks and provides a rigorous framework for comparing model robustness beyond specific attack algorithms.

Abstract

The robustness of neural networks to adversarial examples has received great attention due to security implications. Despite various attack approaches to crafting visually imperceptible adversarial examples, little has been developed towards a comprehensive measure of robustness. In this paper, we provide a theoretical justification for converting robustness analysis into a local Lipschitz constant estimation problem, and propose to use the Extreme Value Theory for efficient evaluation. Our analysis yields a novel robustness metric called CLEVER, which is short for Cross Lipschitz Extreme Value for nEtwork Robustness. The proposed CLEVER score is attack-agnostic and computationally feasible for large neural networks. Experimental results on various networks, including ResNet, Inception-v3 and MobileNet, show that (i) CLEVER is aligned with the robustness indication measured by the $\ell_2$ and $\ell_\infty$ norms of adversarial examples from powerful attacks, and (ii) defended networks using defensive distillation or bounded ReLU indeed achieve better CLEVER scores. To the best of our knowledge, CLEVER is the first attack-independent robustness metric that can be applied to any neural network classifier.
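The sampling idea behind this pipeline can be illustrated with a minimal sketch; it is not the authors' implementation. The two-dimensional `margin` function is a hypothetical stand-in for the class margin $f_c - f_j$, and the function names, ball radius, and batch sizes are all assumptions chosen for illustration. As a deliberate simplification, the largest batch maximum of the gradient norm replaces the reverse Weibull location parameter that CLEVER estimates by maximum likelihood.

```python
import math
import random


def margin(x):
    """Hypothetical class margin g(x) = f_c(x) - f_j(x) (not from the paper)."""
    return 0.3 + math.sin(x[0]) - 0.5 * x[1] ** 2


def margin_grad_norm(x):
    """l2 norm of the analytic gradient (cos(x0), -x1) of `margin`."""
    return math.hypot(math.cos(x[0]), x[1])


def sample_l2_ball(center, radius, rng):
    """Draw one point uniformly from the 2-D l2 ball around `center`."""
    r = radius * math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    return (center[0] + r * math.cos(theta), center[1] + r * math.sin(theta))


def batch_maxima(center, radius, n_batches, batch_size, rng):
    """Maximum gradient norm observed in each batch of sampled points."""
    return [
        max(margin_grad_norm(sample_l2_ball(center, radius, rng))
            for _ in range(batch_size))
        for _ in range(n_batches)
    ]


def crude_clever_score(center, radius=0.5, n_batches=50, batch_size=100, seed=0):
    """Return (score, Lipschitz estimate) for the toy margin.

    Simplification: the largest batch maximum is used as the local
    Lipschitz estimate instead of a reverse Weibull MLE location fit.
    """
    rng = random.Random(seed)
    maxima = batch_maxima(center, radius, n_batches, batch_size, rng)
    lipschitz_estimate = max(maxima)
    score = min(margin(center) / lipschitz_estimate, radius)
    return score, lipschitz_estimate


if __name__ == "__main__":
    score, lipschitz_estimate = crude_clever_score((0.0, 0.0))
    print("Lipschitz estimate:", lipschitz_estimate)
    print("score:", score)
```

For this toy margin the true local Lipschitz constant inside the ball is $\sqrt{1.25}$, so the score cannot fall below $0.3/\sqrt{1.25} \approx 0.268$. A sample maximum can only underestimate the true constant, which is one reason to extrapolate with an extreme value fit rather than use the raw maximum.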

Paper Structure

This paper contains 22 sections, 6 theorems, 17 equations, 6 figures, 5 tables, 2 algorithms.

Key Result

Lemma 3.1

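Let $S \subset \mathbb{R}^d$ be a convex bounded closed set and let $h(\bm{x}):S \rightarrow \mathbb{R}$ be a continuously differentiable function on an open set containing $S$. Then, $h(\bm{x})$ is a Lipschitz function with Lipschitz constant $L_q$ if the following inequality holds for any $\bm{x}, \bm{y} \in S$: $|h(\bm{x}) - h(\bm{y})| \leq L_q \| \bm{x} - \bm{y} \|_p$, where $L_q = \max \{ \| \nabla h(\bm{x}) \|_q : \bm{x} \in S \}$, $\nabla h(\bm{x}) = (\frac{\partial h(\bm{x})}{\partial x_1}, \ldots, \frac{\partial h(\bm{x})}{\partial x_d})^\top$ is the gradient of $h(\bm{x})$, and $\frac{1}{p} + \frac{1}{q} = 1$, $1 \leq p, q \leq \infty$.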
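A quick numerical sanity check of the lemma, as a sketch under stated assumptions: the function `h`, the set $S = [-1, 1]^2$, and all helper names are hypothetical choices, not taken from the paper. The distance is measured in the $\ell_p$ norm and the gradient in the dual $\ell_q$ norm with $1/p + 1/q = 1$. For this `h` the gradient is $(\cos x_1, x_2)$, so the constants $L_\infty = 1$, $L_2 = \sqrt{2}$ and $L_1 = 2$ follow by hand.

```python
import math
import random


def h(x):
    """Hypothetical smooth test function on S = [-1, 1]^2."""
    return math.sin(x[0]) + 0.5 * x[1] ** 2


def p_norm(v, p):
    """l_p norm of a vector, with p = math.inf meaning the max norm."""
    if math.isinf(p):
        return max(abs(t) for t in v)
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


# Key: p used for ||x - y||_p. Value: max over S of ||grad h||_q, q dual to p.
LIPSCHITZ_BY_P = {
    1: 1.0,              # q = inf: max(|cos x1|, |x2|) <= 1
    2: math.sqrt(2.0),   # q = 2:   sqrt(cos(x1)^2 + x2^2) <= sqrt(2)
    math.inf: 2.0,       # q = 1:   |cos x1| + |x2| <= 2
}


def worst_ratio(p, n_pairs=20000, seed=0):
    """Largest observed |h(x) - h(y)| / ||x - y||_p over random pairs in S."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_pairs):
        x = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        y = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        dist = p_norm((x[0] - y[0], x[1] - y[1]), p)
        if dist == 0.0:
            continue  # skip identical points to avoid dividing by zero
        worst = max(worst, abs(h(x) - h(y)) / dist)
    return worst


if __name__ == "__main__":
    for p, bound in LIPSCHITZ_BY_P.items():
        print("p =", p, "worst ratio:", worst_ratio(p), "bound L_q:", bound)
```

Every observed ratio stays at or below the corresponding $L_q$, as the lemma requires on a convex set.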

Figures (6)

  • Figure 1: Intuitions behind Theorem 3.2.
  • Figure 2: The cross Lipschitz constant samples for three images from the CIFAR, MNIST and ImageNet datasets, and their fitted reverse Weibull distributions, with the corresponding MLE estimates of the location, scale and shape parameters $(a_W, b_W, c_W)$ shown at the top of each plot. The $D$-statistics of the K-S test and the p-values are denoted as ks and pval. With a small ks and a high p-value, the hypothesized reverse Weibull distribution fits the empirical distribution of the cross Lipschitz constant samples well.
  • Figure 3: Comparison of the average targeted CLEVER scores with the average $\ell_{\infty}$ and $\ell_{2}$ distortions found by the CW and I-FGSM attacks, and with the average scores obtained by estimating the Lipschitz constant with the algorithm of Wood & Zhang (1996) (denoted as SLOPE). DD and BReLU denote the Defensive Distillation and Bounded ReLU defenses applied to the CNN network. We did not include SLOPE for the ImageNet networks because it has been shown to be ineffective even for smaller networks.
  • Figure 4: Percentage of images in ImageNet where the CLEVER score for that image is greater than the adversarial distortion found by different attacks.
  • Figure 8: Illustration of Theorem \ref{thm:Fx_one_hidden} with $d = 2, q = 2$ and $U = 3$. The three hyperplanes $\bm{w}_{i} \bm{x} + b_{i} = 0$ divide the space into seven regions (shown in different colors). The red dashed line encloses the ball $B_2(\bm{x}_0,R_1)$ and the blue dashed line encloses a larger ball $B_2(\bm{x}_0,R_2)$. If we draw samples uniformly within the balls, the probability of $\| \nabla g(\bm{x}) \|_2 = y$ is proportional to the intersected volumes of the ball and the regions with $\| \nabla g(\bm{x}) \|_2 = y$.
  • ...and 1 more figure
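The reverse Weibull fit and K-S statistic mentioned in the Figure 2 caption can be illustrated with a small sketch. It assumes the standard reverse Weibull CDF $G(y) = \exp\{-((a_W - y)/b_W)^{c_W}\}$ for $y < a_W$ (and $1$ otherwise), draws synthetic samples by inverse transform, and computes the K-S $D$-statistic against a hypothesized parameter triple. It does not perform the MLE fit; the parameter values and function names are illustrative assumptions.

```python
import math
import random


def reverse_weibull_cdf(y, loc, scale, shape):
    """CDF of the reverse Weibull; `loc` is its finite right endpoint."""
    if y >= loc:
        return 1.0
    return math.exp(-(((loc - y) / scale) ** shape))


def reverse_weibull_sample(loc, scale, shape, rng):
    """Inverse-transform sample: solve G(y) = u for u in (0, 1]."""
    u = 1.0 - rng.random()
    return loc - scale * math.log(1.0 / u) ** (1.0 / shape)


def ks_statistic(samples, loc, scale, shape):
    """One-sample K-S D-statistic against the hypothesized reverse Weibull."""
    ordered = sorted(samples)
    n = len(ordered)
    d_stat = 0.0
    for i, y in enumerate(ordered, start=1):
        cdf = reverse_weibull_cdf(y, loc, scale, shape)
        # Compare the model CDF with the empirical CDF just after and
        # just before the i-th order statistic.
        d_stat = max(d_stat, i / n - cdf, cdf - (i - 1) / n)
    return d_stat


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [reverse_weibull_sample(2.0, 1.0, 3.0, rng) for _ in range(1000)]
    print("largest sample:", max(samples))
    print("D (true parameters):", ks_statistic(samples, 2.0, 1.0, 3.0))
    print("D (wrong location):", ks_statistic(samples, 3.0, 1.0, 3.0))
```

No sample exceeds the location parameter, which is why the fitted location can serve as an estimate of the maximum gradient norm. A mismatched location produces a much larger $D$, the signal a K-S test uses to reject a poor fit.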

Theorems & Definitions (15)

  • Definition 3.1: perturbed example and adversarial example
  • Definition 3.2: minimum adversarial distortion $\Delta_{p,\text{min}}$
  • Definition 3.3: lower bound of $\Delta_{p,\text{min}}$
  • Definition 3.4: upper bound of $\Delta_{p,\text{min}}$
  • Lemma 3.1: Lipschitz continuity and its relationship with gradient norm [Lips:Lpnorm]
  • Theorem 3.2: Formal guarantee on lower bound $\beta_L$ for untargeted attack
  • Remark 1
  • Corollary 3.2.1: Formal guarantee on $\beta_L$ for untargeted attack
  • Corollary 3.2.2: Formal guarantee on $\beta_L$ for targeted attack
  • Lemma 3.3: Formal guarantee on $\beta_L$ for ReLU networks
  • ...and 5 more
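The lower-bound guarantees listed above can be sanity-checked on a linear classifier, where the bound is tight. This is a sketch under assumptions: it takes the bound in the form $\beta_L = \min_{j \neq c} \frac{f_c(\bm{x}_0) - f_j(\bm{x}_0)}{L_q^j}$ with $p = q = 2$, and the weights `W`, biases `B` and input `X0` are hypothetical values. For a linear model $f(\bm{x}) = W\bm{x} + \bm{b}$, the gradient of $f_c - f_j$ is the constant vector $\bm{w}_c - \bm{w}_j$, so $L_2^j = \| \bm{w}_c - \bm{w}_j \|_2$ and the bound equals the distance from $\bm{x}_0$ to the nearest decision boundary.

```python
import math

# Hypothetical 3-class linear classifier on 2-D inputs.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
B = [0.0, 0.0, 0.0]
X0 = [2.0, 1.0]


def logits(x, weights, biases):
    """Class scores f(x) = W x + b."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def linear_lower_bound(x0, weights, biases):
    """Return (bound, closest class j, minimal l2 perturbation onto the tie)."""
    scores = logits(x0, weights, biases)
    c = max(range(len(scores)), key=scores.__getitem__)
    best = None
    for j in range(len(scores)):
        if j == c:
            continue
        diff = [wc - wj for wc, wj in zip(weights[c], weights[j])]
        lipschitz = math.sqrt(sum(d * d for d in diff))  # ||w_c - w_j||_2
        bound = (scores[c] - scores[j]) / lipschitz
        if best is None or bound < best[0]:
            best = (bound, j, diff)
    bound, j, diff = best
    # Step along -(w_c - w_j) until the margin f_c - f_j reaches zero.
    step = (scores[c] - scores[j]) / sum(d * d for d in diff)
    delta = [-step * d for d in diff]
    return bound, j, delta


if __name__ == "__main__":
    bound, j, delta = linear_lower_bound(X0, W, B)
    moved = logits([a + d for a, d in zip(X0, delta)], W, B)
    print("bound:", bound, "closest class:", j)
    print("perturbation:", delta, "logits after:", moved)
```

The perturbation has $\ell_2$ norm equal to the bound and lands exactly on the tie between the predicted class and class `j`, so no smaller perturbation can change the prediction. For nonlinear networks the gradient norm varies over the ball, which is why the local Lipschitz constant has to be estimated.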