A practical approach to evaluating the adversarial distance for machine learning classifiers

Georg Siedel; Ekagra Gupta; Andrey Morozov

TL;DR

This paper estimates the adversarial distance, a more informative robustness measure than adversarial accuracy at a fixed attack budget, using iterative adversarial attacks and a certification approach. The attack-based approach proves effective compared to related implementations, while the certification method falls short of expectations.

Abstract

Robustness is critical for machine learning (ML) classifiers to ensure consistent performance in real-world applications where models may encounter corrupted or adversarial inputs. In particular, assessing the robustness of classifiers to adversarial inputs is essential to protect systems from vulnerabilities and thus ensure safety in use. However, accurately computing adversarial robustness has been challenging for complex ML models and high-dimensional data. Furthermore, evaluations typically measure adversarial accuracy at specific attack budgets, limiting the informative value of the resulting metrics. This paper investigates the estimation of the more informative adversarial distance using iterative adversarial attacks and a certification approach. Combined, the methods provide a comprehensive evaluation of adversarial robustness by computing estimates for the upper and lower bounds of the adversarial distance. We present visualisations and ablation studies that provide insights into how this evaluation method should be applied and parameterised. We find that our adversarial attack approach is effective compared to related implementations, while the certification method falls short of expectations. The approach in this paper should encourage a more informative way of evaluating the adversarial robustness of ML classifiers.
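To make the idea of an attack-based upper bound concrete, the following is a minimal sketch and not the paper's Algorithm 1. It assumes a hypothetical linear binary classifier (weights `w`, bias `b`), for which the exact minimal $L_\infty$ adversarial distance is known in closed form, and takes small signed-gradient steps of size `eps_step` until the predicted class flips. All function and variable names are illustrative assumptions.

```python
def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def score(w, b, x):
    """Linear decision function; the predicted class is sign(score)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def linf_distance_upper_bound(w, b, x, eps_step=0.01, max_steps=10_000):
    """Iteratively perturb x until the class flips.

    Returns the L_inf norm of the first misclassifying perturbation,
    which is an upper bound on the minimal adversarial distance,
    or None if no flip occurs within max_steps.
    """
    y = sign(score(w, b, x))
    if y == 0:  # x already lies on the decision boundary
        return 0.0
    x_adv = list(x)
    for _ in range(max_steps):
        # For a linear model the gradient of the score w.r.t. x is w,
        # so the steepest L_inf step against class y is -y * sign(w).
        x_adv = [xi - y * eps_step * sign(wi) for xi, wi in zip(x_adv, w)]
        if sign(score(w, b, x_adv)) != y:
            return max(abs(a - o) for a, o in zip(x_adv, x))
    return None


def linf_distance_exact(w, b, x):
    """Closed-form minimal L_inf distance to a linear decision boundary."""
    return abs(score(w, b, x)) / sum(abs(wi) for wi in w)


# Hypothetical toy model and input, chosen only for illustration.
w, b, x = [2.0, -1.0, 0.5], 0.15, [0.4, 0.3, 0.2]
exact = linf_distance_exact(w, b, x)
estimate = linf_distance_upper_bound(w, b, x, eps_step=0.01)
coarse = linf_distance_upper_bound(w, b, x, eps_step=0.1)
print(f"exact={exact:.4f} fine={estimate:.4f} coarse={coarse:.4f}")
```

In this toy setting the estimate never falls below the exact distance and overshoots it by at most one step size, so a smaller `eps_step` tightens the bound at the cost of more iterations. For a real, non-linear classifier the exact distance is unavailable, and an attack can only provide such an upper bound.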

Paper Structure (18 sections, 3 equations, 6 figures, 2 tables, 2 algorithms)

Introduction
Preliminaries
Related Work
Adversarial Distance Calculation through Attacks
Robustness Certification
Clever Score Metric
Robustness Toolbox
Proposed Algorithm
Experiments
Results
Parameterization of Algorithm 1 with PGD
Parameterization of Clever Score
Adversarial Distances in $L_\infty$
Adversarial Distances in $L_2$
Adversarial Distances in $L_1$
...and 3 more sections

Figures (6)

Figure 1: A one-step attack and an iterative attack on the model with the red decision boundary visualize the difference between (a) finding some adversarial perturbation $x_{adversarial}$ using the full attack budget $\epsilon$ and (b) finding a good estimate for the minimal adversarial distance between $x_{adversarial}$ and $x$. In case (a), only a discrete 0-or-1 success/failure rate (adversarial risk/accuracy) of the attack can be reported, while (b) allows for a continuous robustness assessment.
Figure 2: Working principle of CLEVER score estimation
Figure 3: $L_{\infty}$ Distance of 50 images for different $\epsilon_{step}$ values (standard model)
Figure 4: Trade-off between tightness of the mean adversarial distance estimation and the computational effort for varying $\epsilon_{step}$
Figure 5: Comparison of all attacks, standard model, 20 images. Above: Mean Adversarial Distance and total runtime. Below: Image-wise.
...and 1 more figure
