Provably Safeguarding a Classifier from OOD and Adversarial Samples: an Extreme Value Theory Approach

Nicolas Atienza; Christophe Labreuche; Johanne Cohen; Michele Sebag

TL;DR

SPADE addresses the challenge of reliably identifying out-of-distribution and adversarial inputs by turning a trained classifier into an abstaining detector driven by Extreme Value Theory. It models class-specific latent-space distance tails with Generalized Extreme Value (GEV) distributions and uses a Peak Over Threshold (POT) fit to estimate these tails, enabling a probabilistic OOD score and an abstaining decision rule with provable guarantees against adversarial perturbations under mild assumptions. The method is validated across ResNet, ViT, and VGG architectures on CIFAR-10/100 and ImageNet-1K, showing competitive OOD performance and strong adversarial detection while maintaining practical efficiency. Overall, SPADE provides a principled, data-efficient pathway to robust abstention in neural classifiers, with clear avenues for integration into safety-critical systems and further theoretical and empirical refinements.
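As a rough illustration of the Peak Over Threshold step, the sketch below fits a tail model to the distances of one class's latent embeddings and turns it into a tail probability. Several choices here are assumptions rather than details taken from the paper: the distance is measured to a class centroid, the threshold is a fixed empirical quantile, and the exceedances are modelled with a Generalized Pareto distribution (the usual POT companion to GEV models). The function names `fit_pot_tail` and `tail_probability` are hypothetical.

```python
import numpy as np
from scipy.stats import genpareto


def fit_pot_tail(distances, quantile=0.9):
    """Fit a Generalized Pareto tail to the exceedances above a high threshold.

    `quantile` sets the POT threshold; 0.9 is an illustrative choice only.
    """
    distances = np.asarray(distances, dtype=float)
    u = np.quantile(distances, quantile)        # POT threshold
    excess = distances[distances > u] - u       # peaks over the threshold
    # Location is pinned at 0 because the excesses start at the threshold.
    shape, _, scale = genpareto.fit(excess, floc=0.0)
    tail_frac = excess.size / distances.size    # empirical P(D > u)
    return {"u": u, "shape": shape, "scale": scale, "tail_frac": tail_frac}


def tail_probability(d, model):
    """Estimate P(D > d) for in-distribution distances D.

    Only the tail is modelled, so the value saturates at `tail_frac`
    for any d at or below the threshold.
    """
    d = np.asarray(d, dtype=float)
    excess = np.maximum(d - model["u"], 0.0)
    sf = genpareto.sf(excess, model["shape"], loc=0.0, scale=model["scale"])
    return model["tail_frac"] * sf


# Toy usage: synthetic "latent embeddings" of a single class.
rng = np.random.default_rng(0)
latent = rng.normal(size=(2000, 16))
centroid = latent.mean(axis=0)
distances = np.linalg.norm(latent - centroid, axis=1)

model = fit_pot_tail(distances)
p_typical = tail_probability(np.median(distances), model)
p_far = tail_probability(3.0 * distances.max(), model)
```

A small tail probability means the sample lies further out than almost all training samples of that class, which is the kind of evidence an OOD score can be built on.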

Abstract

This paper introduces a novel method, Sample-efficient Probabilistic Detection using Extreme Value Theory (SPADE), which transforms a classifier into an abstaining classifier, offering provable protection against out-of-distribution and adversarial samples. The approach is based on a Generalized Extreme Value (GEV) model of the training distribution in the classifier's latent space, enabling the formal characterization of OOD samples. Interestingly, under mild assumptions, the GEV model also allows for formally characterizing adversarial samples. The abstaining classifier, which rejects samples based on their assessment by the GEV model, provably avoids OOD and adversarial samples. The empirical validation of the approach, conducted on various neural architectures (ResNet, VGG, and Vision Transformer) and medium and large-sized datasets (CIFAR-10, CIFAR-100, and ImageNet), demonstrates its frugality, stability, and efficiency compared to the state of the art.
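To make the notion of an abstaining classifier concrete, here is a minimal sketch of a rejection rule layered on top of an existing classifier. It assumes that a per-class tail probability is already available for each sample (for instance from a fitted extreme-value model), and that the classifier abstains when this probability falls below a significance level `alpha`. The sentinel `ABSTAIN`, the function name, and the default `alpha` are illustrative assumptions, not the paper's exact decision rule.

```python
import numpy as np

ABSTAIN = -1  # hypothetical sentinel label meaning "reject this sample"


def abstaining_predict(logits, tail_probs, alpha=0.01):
    """Predict the argmax class, or ABSTAIN when the sample looks extreme.

    logits:     array of shape (n_samples, n_classes) from the base classifier.
    tail_probs: array of the same shape; entry [i, c] is the estimated
                probability that an in-distribution sample of class c lies
                at least as far out as sample i.
    alpha:      rejection level (illustrative default).
    """
    logits = np.asarray(logits, dtype=float)
    tail_probs = np.asarray(tail_probs, dtype=float)
    pred = logits.argmax(axis=1)
    # Look up the tail probability under the predicted class only.
    p = tail_probs[np.arange(pred.size), pred]
    return np.where(p < alpha, ABSTAIN, pred)


# Toy usage: the second sample is implausibly far out for its predicted class.
demo_logits = [[2.0, 0.0], [0.0, 3.0]]
demo_tail_probs = [[0.5, 0.2], [0.3, 0.001]]
demo_out = abstaining_predict(demo_logits, demo_tail_probs)
```

Keeping the rejection rule separate from the base classifier means the underlying network does not need to be retrained.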

Paper Structure (22 sections, 1 theorem, 12 equations, 2 figures, 3 tables, 1 algorithm)

Introduction
Notations.
Formal Background
Properties of Latent In-Distribution
Extreme Value Theory
SPADE Overview
EVT-based Characterization of OOD
Abstaining Classifier on OOD Samples
Abstaining classifier with provable guarantees w.r.t. adversarial examples
Estimating the GEV Models
Discussion
Experimental Setting
Goals.
Metrics.
Benchmarks.
...and 7 more sections

Key Result

Theorem 1

Let us assume that the latent embedding $h$ is $K$-Lipschitz. Let $\mathbf{x}$ be an adversarial sample built by perturbation of a training sample $\mathbf{x}^*$ of class $c$, with perturbation amplitude $\varepsilon$ ($\|\mathbf{x} - \mathbf{x}^*\| < \varepsilon$), and let $f(\mathbf{x}) = c' \neq \dots$
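
One consequence of the Lipschitz assumption can be spelled out directly. It follows from the definition of $K$-Lipschitz continuity alone, independently of the remainder of the statement, and assumes compatible norms on the input and latent spaces:

$$
\|h(\mathbf{x}) - h(\mathbf{x}^*)\| \;\le\; K\,\|\mathbf{x} - \mathbf{x}^*\| \;\le\; K\varepsilon .
$$

In words, the embedding of the adversarial sample stays within a ball of radius $K\varepsilon$ around the embedding of the training sample it was built from.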

Figures (2)

Figure 1: Stability of EVT parameter estimation w.r.t. sampling ratio and estimation variance for one class of CIFAR-100 on ResNet-18.
Figure 2: Sensitivity analysis of the OOD detection on CIFAR-100 w.r.t. the subsampling rate of the training set: AUC (dashed line) and FPR95 (solid line) performances for SPADE (in blue) and KNN [pmlr-v162-sun22d] (in orange; better seen in color).

Theorems & Definitions (7)

Definition 1 [ye2021theoretical]
Definition 2 [ye2021theoretical]
Definition 3: Extreme Value Distribution (EVD) [Fisher_Tippett_1928]
Definition 4: OOD test
Definition 5: Abstaining classifier
Theorem 1
Proof of Theorem 1
