Characterizing Data Point Vulnerability via Average-Case Robustness

Tessa Han, Suraj Srinivas, Himabindu Lakkaraju

TL;DR

The paper introduces average-case robustness, defined as $p^\text{robust}_\sigma(\mathbf{x}) = P_{\epsilon \sim \mathcal{N}(0,\sigma^2)}[ \arg\max_i f_i(\mathbf{x}+\epsilon) = t ]$ for a target class $t$, as a complementary lens to adversarial robustness for understanding data-point vulnerability. It develops analytical estimators for multi-class classifiers, including the Taylor estimator $p^\text{taylor}_\sigma$ and the MMSE estimator $p^\text{mmse}_\sigma$, both of which evaluate a multivariate Gaussian CDF with covariance $\mathbf{U}\mathbf{U}^\top$, plus fast mv-sigmoid and softmax-based approximations that improve efficiency and differentiability. The authors provide finite-sample error bounds, demonstrate fast convergence (often with small $N$) and substantial computational gains over Monte Carlo, and validate the approach on MNIST, FMNIST, and CIFAR-10/100 with various architectures, showing accurate estimates of local robustness and the ability to identify vulnerable samples and class-wise robustness bias. They also show that softmax probabilities are, in general, a poor proxy for average-case robustness, whereas $p^\text{robust}_\sigma$ distinguishes canonical data points from those near the decision boundary and aids dataset debugging. Collectively, the work offers practical, scalable tools for diagnosing and understanding data-point vulnerability and model robustness beyond worst-case guarantees.
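
The definition above can be estimated directly by sampling, which is the Monte Carlo baseline the analytical estimators are compared against. The following is a minimal sketch in pure Python, not the authors' implementation: the function names, the toy 3-class linear model, and its weights are hypothetical, and the noise is assumed to be isotropic Gaussian with standard deviation $\sigma$ in every input dimension.

```python
import random

def linear_logits(weights, biases, x):
    """Logits of a toy linear classifier: one weight vector and one bias per class."""
    return [sum(w_k * x_k for w_k, x_k in zip(w, x)) + b
            for w, b in zip(weights, biases)]

def p_robust_mc(logits_fn, x, target, sigma, n_samples=10_000, seed=0):
    """Monte Carlo estimate of P[argmax_i f_i(x + eps) == target], eps ~ N(0, sigma^2 I)."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(n_samples):
        noisy = [x_k + rng.gauss(0.0, sigma) for x_k in x]
        logits = logits_fn(noisy)
        predicted = max(range(len(logits)), key=logits.__getitem__)
        hits += (predicted == target)
    return hits / n_samples

# Hypothetical 3-class linear model on 2-dimensional inputs (illustrative values only).
WEIGHTS = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
BIASES = [0.0, 0.0, 0.0]

def model(x):
    return linear_logits(WEIGHTS, BIASES, x)

# Both points are classified as class 0, but they differ in how much of their
# Gaussian neighborhood keeps that prediction.
far = p_robust_mc(model, [5.0, 0.0], target=0, sigma=0.5)   # far from the decision boundaries
near = p_robust_mc(model, [0.1, 0.0], target=0, sigma=0.5)  # close to the decision boundaries
print(f"far: {far:.3f}, near: {near:.3f}")
```

Each sample costs one forward pass of the model, so for a deep network the loop would normally be replaced by batched evaluation. The per-input cost still scales with the number of samples, which is the inefficiency the analytical estimators aim to avoid.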

Abstract

Studying the robustness of machine learning models is important to ensure consistent model behaviour across real-world settings. To this end, adversarial robustness is a standard framework, which views robustness of predictions through a binary lens: either a worst-case adversarial misclassification exists in the local region around an input, or it does not. However, this binary perspective does not account for the degrees of vulnerability, as data points with a larger number of misclassified examples in their neighborhoods are more vulnerable. In this work, we consider a complementary framework for robustness, called average-case robustness, which measures the fraction of points in a local region that provides consistent predictions. However, computing this quantity is hard, as standard Monte Carlo approaches are inefficient especially for high-dimensional inputs. In this work, we propose the first analytical estimators for average-case robustness for multi-class classifiers. We show empirically that our estimators are accurate and efficient for standard deep learning models and demonstrate their usefulness for identifying vulnerable data points, as well as quantifying robustness bias of models. Overall, our tools provide a complementary view to robustness, improving our ability to characterize model behaviour.

Paper Structure (26 sections, 12 theorems, 32 equations, 19 figures, 3 tables)

This paper contains 26 sections, 12 theorems, 32 equations, 19 figures, 3 tables.

Key Result

Lemma 3.1

The local robustness of a multi-class linear model $f(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b$ (with $\mathbf{w} \in \mathbb{R}^{d \times C}$ and $b \in \mathbb{R}^C$) at point $\mathbf{x}$ with respect to a target class $t$ is given by the following. Define weights $\mathbf{u}_i = \mathbf{w}_t \ldots$ and $\Phi_{\mathbf{U} \mathbf{U}^\top}$ is the $(C-1)$-dimensional Normal CDF with zero mean and covariance $\mathbf{U} \mathbf{U}^\top$.
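
One way to see why a zero-mean Gaussian CDF with covariance $\mathbf{U}\mathbf{U}^\top$ appears for a linear model is the sketch below. It is a reconstruction under stated assumptions, not the paper's verbatim statement: it assumes $\mathbf{u}_i = \mathbf{w}_t - \mathbf{w}_i \neq 0$ for $i \neq t$, introduces the shorthand $c_i = \mathbf{u}_i^\top \mathbf{x} + (b_t - b_i)$, takes the noise to be isotropic, and takes $\mathbf{U}$ to be the matrix whose rows are $\mathbf{u}_i / \|\mathbf{u}_i\|_2$.

```latex
\begin{align*}
p^\text{robust}_\sigma(\mathbf{x})
  &= P_{\epsilon \sim \mathcal{N}(0,\sigma^2)}\big[\, f_t(\mathbf{x}+\epsilon) > f_i(\mathbf{x}+\epsilon) \;\; \forall i \neq t \,\big] \\
  &= P\big[\, \mathbf{u}_i^\top \epsilon > -c_i \;\; \forall i \neq t \,\big] \\
  &= P\Big[\, z_i < \frac{c_i}{\sigma \|\mathbf{u}_i\|_2} \;\; \forall i \neq t \,\Big],
     \qquad z_i = -\frac{\mathbf{u}_i^\top \epsilon}{\sigma \|\mathbf{u}_i\|_2} \\
  &= \Phi_{\mathbf{U}\mathbf{U}^\top}\Big( \Big[ \frac{c_i}{\sigma \|\mathbf{u}_i\|_2} \Big]_{i \neq t} \Big).
\end{align*}
```

The last step uses the fact that the $z_i$ are jointly Gaussian with zero mean, unit variance, and $\operatorname{Cov}(z_i, z_j) = \mathbf{u}_i^\top \mathbf{u}_j / (\|\mathbf{u}_i\|_2 \|\mathbf{u}_j\|_2) = (\mathbf{U}\mathbf{U}^\top)_{ij}$. In the binary case ($C = 2$) the expression reduces to a univariate standard Normal CDF evaluated at the scaled distance $c_i / (\sigma \|\mathbf{u}_i\|_2)$ to the decision boundary.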

Figures (19)

  • Figure 1: Consider a binary classifier (green vs. yellow) and points $A$ (left) and $B$ (right), both correctly classified to the yellow class. The dotted red circles represent $\epsilon$-balls around the data points. Although adversarial robustness rightly considers the model non-robust at both points (due to the existence of adversarial examples within the $\epsilon$-ball), it fails to discern that point $B$ has a larger fraction of misclassified points in its neighborhood, making it more vulnerable than point $A$, an aspect exactly captured by average-case robustness.
  • Figure 2: Empirical evaluation of analytical estimators. (a) The smaller the noise neighborhood $\sigma$, the more accurately the estimators compute $p^\text{robust}_\sigma$. $p^\text{mmse}_\sigma$ and $p^\text{mmse\_mvs}_\sigma$ are the best estimators of $p^\text{robust}_\sigma$, followed closely by $p^\text{taylor\_mvs}_\sigma$ and $p^\text{taylor}_\sigma$, trailed by $p^\text{softmax}$. (b) For more robust models, the estimators compute $p^\text{robust}_\sigma$ more accurately over a larger $\sigma$. Together, these results indicate that the analytical estimators accurately compute $p^\text{robust}_\sigma$.
  • Figure 3: Example ranking of $p^\text{robust}_\sigma$ among CIFAR10 classes. Images with high $p^\text{robust}_\sigma$ are farther away from the decision boundary, and tend to be brighter and have stronger object-background contrast than those with low $p^\text{robust}_\sigma$, which are closer to the decision boundary, and thus easily misclassified.
  • Figure 4: Computing robustness bias among classes for the (a) ResNet18 CIFAR10 model, and (b) for the CNN FMNIST model. $p^\text{robust}_\sigma$ reveals that the model robustness varies significantly across classes, revealing a marked class-wise bias within standard models. The analytical estimator $p^\text{mmse}_\sigma$ accurately captures this model bias.
  • Figure 5: Convergence of $p^\text{mc}_\sigma$. In practice, $p^\text{mc}_\sigma$ takes around $n = 10{,}000$ samples to converge and is computationally inefficient.
  • ...and 14 more figures
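
For intuition about the sample count reported in the Figure 5 caption, a generic binomial argument (not the paper's own analysis) is helpful. Each Monte Carlo sample is a Bernoulli trial that succeeds with probability $p^\text{robust}_\sigma$, so the estimate's standard error is $\sqrt{p(1-p)/n}$. The sketch below tabulates that quantity; the function name is hypothetical.

```python
import math

def mc_standard_error(p, n_samples):
    """Standard error of the mean of n_samples Bernoulli(p) draws."""
    return math.sqrt(p * (1.0 - p) / n_samples)

# p = 0.5 maximizes p * (1 - p), so it gives the largest standard error for a given n.
for n in (100, 1_000, 10_000):
    print(f"n = {n:>6}: standard error <= {mc_standard_error(0.5, n):.4f}")

worst_case_se = mc_standard_error(0.5, 10_000)
```

At $n = 10{,}000$ the worst-case standard error is about 0.005, and halving it requires four times as many samples. Every sample is a forward pass through the model for every data point being scored.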

Theorems & Definitions (23)

  • Definition 1
  • Lemma 3.1
  • Lemma 3.2
  • Definition 2
  • Proposition 3.1
  • Definition 3
  • Proposition 3.2
  • Lemma A.1
  • proof
  • Lemma A.2
  • ...and 13 more