Provably Safeguarding a Classifier from OOD and Adversarial Samples: an Extreme Value Theory Approach

Nicolas Atienza, Christophe Labreuche, Johanne Cohen, Michele Sebag

TL;DR

SPADE turns a trained classifier into an abstaining classifier that rejects out-of-distribution (OOD) and adversarial inputs, using Extreme Value Theory. For each class, it models the tail of the latent-space distance distribution with a Generalized Extreme Value (GEV) distribution, estimated by a Peaks-over-Threshold (POT) fit. This yields a probabilistic OOD score and an abstention rule that, under mild assumptions, comes with provable guarantees against adversarial perturbations. The method is evaluated on ResNet, ViT, and VGG architectures with CIFAR-10, CIFAR-100, and ImageNet-1K, where it shows competitive OOD detection and strong adversarial detection at practical computational cost. SPADE thus offers a principled, data-efficient route to robust abstention in neural classifiers, with safety-critical systems as a natural application and room for further theoretical and empirical refinement.

Abstract

This paper introduces a novel method, Sample-efficient Probabilistic Detection using Extreme Value Theory (SPADE), which transforms a classifier into an abstaining classifier, offering provable protection against out-of-distribution and adversarial samples. The approach is based on a Generalized Extreme Value (GEV) model of the training distribution in the classifier's latent space, enabling the formal characterization of OOD samples. Interestingly, under mild assumptions, the GEV model also allows for formally characterizing adversarial samples. The abstaining classifier, which rejects samples based on their assessment by the GEV model, provably avoids OOD and adversarial samples. The empirical validation of the approach, conducted on various neural architectures (ResNet, VGG, and Vision Transformer) and medium and large-sized datasets (CIFAR-10, CIFAR-100, and ImageNet), demonstrates its frugality, stability, and efficiency compared to the state of the art.
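
The tail-modelling and abstention steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the distance fed to the model is a hypothetical per-class latent distance (simulated here), the tail is fitted with a method-of-moments generalized Pareto estimate (the form a POT fit classically yields), and the names `fit_pot_tail`, `tail_probability`, `abstaining_predict`, and the level `alpha` are invented for the example.

```python
import math
import random
import statistics


def fit_pot_tail(distances, quantile=0.9):
    """Fit a generalized Pareto tail to the excesses over a high threshold.

    Uses method-of-moments estimates, which require a shape below 1/2.
    Illustrative estimator only; the paper's fitting procedure may differ.
    """
    ordered = sorted(distances)
    threshold = ordered[int(quantile * len(ordered))]
    excesses = [d - threshold for d in ordered if d > threshold]
    mean = statistics.fmean(excesses)
    var = statistics.variance(excesses)
    ratio = mean * mean / var
    return {
        "threshold": threshold,
        "shape": 0.5 * (1.0 - ratio),
        "scale": 0.5 * mean * (1.0 + ratio),
        "exceed_rate": len(excesses) / len(ordered),
    }


def tail_probability(d, tail):
    """Estimate P(distance > d) for an in-distribution sample of the class.

    Below the threshold the bulk is not modelled, so the sample is simply
    treated as in-distribution and the function returns 1.0.
    """
    y = d - tail["threshold"]
    if y <= 0:
        return 1.0
    shape, scale = tail["shape"], tail["scale"]
    if abs(shape) < 1e-9:
        survival = math.exp(-y / scale)  # exponential limit of the GPD
    else:
        base = 1.0 + shape * y / scale
        # A non-positive base means d lies beyond the fitted support.
        survival = base ** (-1.0 / shape) if base > 0 else 0.0
    return tail["exceed_rate"] * survival


def abstaining_predict(predicted_class, latent_distance, tails, alpha=0.01):
    """Return the predicted class, or None (abstain) if the distance is too extreme."""
    p = tail_probability(latent_distance, tails[predicted_class])
    return predicted_class if p >= alpha else None


# Simulated latent distances for a single class (assumption: real distances
# would come from the classifier's latent embedding of training samples).
random.seed(0)
train_distances = [abs(random.gauss(1.0, 0.2)) for _ in range(2000)]
tails = {0: fit_pot_tail(train_distances)}

print(abstaining_predict(0, 1.0, tails))  # typical distance
print(abstaining_predict(0, 5.0, tails))  # far outside the training tail
```

Only the tail beyond the threshold is modelled, so a few hundred exceedances per class are enough to produce a score; this is one way to read the sample efficiency the abstract claims, though the sketch does not reproduce the paper's experiments.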

Paper Structure

This paper contains 22 sections, 1 theorem, 12 equations, 2 figures, 3 tables, 1 algorithm.

Key Result

Theorem 1

Let us assume that the latent embedding $h$ is $K$-Lipschitz. Let $\mathbf{x}$ be an adversarial sample built by perturbation of a training sample $\mathbf{x}^*$ of class $c$, with perturbation amplitude $\varepsilon$ ($\|\mathbf{x} - \mathbf{x}^*\| < \varepsilon$), and let $f(\mathbf{x}) = c' \neq

Figures (2)

  • Figure 1: Stability of EVT parameter estimation w.r.t. sampling ratio and estimation variance for one class of CIFAR-100 on ResNet-18.
  • Figure 2: Sensitivity analysis of OOD detection on CIFAR-100 w.r.t. the subsampling rate of the training set: AUC (dashed line) and FPR95 (solid line) for SPADE (blue) and KNN (pmlr-v162-sun22d; orange). Best viewed in color.

Theorems & Definitions (7)

  • Definition 1 (ye2021theoretical)
  • Definition 2 (ye2021theoretical)
  • Definition 3: Extreme Value Distribution (EVD) (Fisher_Tippett_1928)
  • Definition 4: OOD test
  • Definition 5: Abstaining classifier
  • Theorem 1
  • proof
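
As a hedged illustration of what the Lipschitz assumption in Theorem 1 provides (the full statement and proof are not reproduced here), $K$-Lipschitz continuity of $h$ bounds how far a perturbation of amplitude $\varepsilon$ can move a sample in latent space. Using only the quantities named in the theorem:

$$
\|h(\mathbf{x}) - h(\mathbf{x}^*)\| \;\le\; K\,\|\mathbf{x} - \mathbf{x}^*\| \;\le\; K\varepsilon .
$$

An adversarial sample therefore stays within latent distance $K\varepsilon$ of the training sample it was built from. How the theorem combines this bound with the GEV model is an aspect of the paper not shown in this summary.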