Detecting Brittle Decisions for Free: Leveraging Margin Consistency in Deep Robust Classifiers

Jonas Ngnawé, Sabyasachi Sahoo, Yann Pequignot, Frédéric Precioso, Christian Gagné

TL;DR

This paper establishes that margin consistency is a necessary and sufficient condition for using a model's logit margin as a score to identify non-robust samples, and shows empirically that robustly trained models on CIFAR10 and CIFAR100 exhibit high margin consistency, with a strong correlation between their input space margins and logit margins.

Abstract

Despite extensive research on adversarial training strategies to improve robustness, the decisions of even the most robust deep learning models can still be quite sensitive to imperceptible perturbations, creating serious risks when deploying them for high-stakes real-world applications. While detecting such cases may be critical, evaluating a model's vulnerability at a per-instance level using adversarial attacks is computationally too intensive and unsuitable for real-time deployment scenarios. The input space margin is the exact score to detect non-robust samples, but it is intractable for deep neural networks. This paper introduces the concept of margin consistency -- a property that links the input space margins and the logit margins in robust models -- for efficient detection of vulnerable samples. First, we establish that margin consistency is a necessary and sufficient condition to use a model's logit margin as a score for identifying non-robust samples. Next, through comprehensive empirical analysis of various robustly trained models on CIFAR10 and CIFAR100 datasets, we show that they exhibit high margin consistency, with a strong correlation between their input space margins and the logit margins. Then, we show that we can effectively and confidently use the logit margin to detect brittle decisions with such models. Finally, we address cases where the model is not sufficiently margin-consistent by learning a pseudo-margin from the feature representation. Our findings highlight the potential of leveraging deep representations to assess adversarial vulnerability in deployment scenarios efficiently.
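
As a rough illustration of why the logit margin is cheap to use at deployment time, the sketch below computes it from a batch of logits and flags low-margin samples. It assumes the logit margin is the gap between the two largest logits, which this page does not spell out; the function names `logit_margin` and `flag_brittle` and the threshold value are hypothetical, chosen only for this example.

```python
import numpy as np

def logit_margin(logits: np.ndarray) -> np.ndarray:
    """Gap between the largest and second-largest logit of each row.

    Assumption: this gap is what the abstract calls the logit margin.
    """
    # np.partition with kth=-2 puts the two largest values of each row
    # in the last two columns (second-largest, then largest).
    top2 = np.partition(logits, -2, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]

def flag_brittle(logits: np.ndarray, lam: float) -> np.ndarray:
    """Mark samples whose logit margin is at most the threshold lam."""
    return logit_margin(logits) <= lam

# Toy logits for three samples and three classes (made-up numbers).
logits = np.array([
    [4.0, 1.0, 0.5],   # confident decision
    [2.1, 2.0, -1.0],  # top two classes nearly tied
    [0.2, 0.1, 3.0],   # confident decision
])
margins = logit_margin(logits)
flags = flag_brittle(logits, lam=0.5)  # lam=0.5 is an arbitrary choice
print(margins, flags)
```

The score needs only one forward pass per sample; choosing `lam` for a given robustness threshold would still require input margin estimates on a calibration set.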

Paper Structure

This paper contains 21 sections, 1 theorem, 7 equations, 14 figures, 5 tables, 1 algorithm.

Key Result

Theorem 1

If a model is margin-consistent, then for any robustness threshold $\epsilon$, there exists a threshold $\lambda$ for the logit margin $d_{out}$ that separates perfectly non-robust samples and robust samples. Conversely, if for any robustness threshold $\epsilon$, $d_{out}$ admits a threshold $\lamb
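
The forward direction of the theorem can be checked numerically on synthetic data. The sketch below assumes margin consistency means that the logit margin preserves the ordering of the input margins (the definition itself is not reproduced on this page), and that a sample is non-robust at level $\epsilon$ when its input margin is at most $\epsilon$. The helper `separating_threshold` and all numbers are hypothetical.

```python
import numpy as np

def separating_threshold(d_in, d_out, eps):
    """Return lam with (d_out <= lam) exactly when (d_in <= eps), else None.

    The only candidate worth testing is the largest logit margin
    among the non-robust samples.
    """
    non_robust = d_in <= eps
    lam = d_out[non_robust].max() if non_robust.any() else -np.inf
    separates = np.array_equal(d_out <= lam, non_robust)
    return lam if separates else None

# Synthetic input margins on a grid (not data from the paper).
d_in = np.linspace(0.0, 16 / 255, 200)

# Margin-consistent case: a strictly increasing map of d_in.
d_out_consistent = np.log1p(50 * d_in)
# Inconsistent case: the ordering is reversed.
d_out_reversed = -d_out_consistent

for eps in (2 / 255, 8 / 255, 12 / 255):
    lam = separating_threshold(d_in, d_out_consistent, eps)
    bad = separating_threshold(d_in, d_out_reversed, eps)
    print(f"eps={eps:.4f}  consistent: lam={lam:.4f}  reversed: {bad}")
```

In the order-preserving case a threshold is found for every `eps`, while the reversed scores admit none, which mirrors the statement above under the stated assumptions.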

Figures (14)

  • Figure 1: Illustration of the input space margin, the margin in the feature space, and margin consistency. A margin-consistent model preserves the relative position of samples with respect to the decision boundary when going from the input space to the feature space.
  • Figure 2: Illustration of the proof of Theorem 1.
  • Figure 3: Margin consistency of various models: there is a strong correlation between input space margin and logit margin for most $\ell_{\infty}$ robust models tested, the exceptions being DI0 and XU80 on CIFAR10. See Table \ref{tab:aucplus} for the references on the models. The correlations are given with standard error for the y-axis values in each interval.
  • Figure 4: Distribution of the correlation between input margins and logit margins in $\ell_{\infty}$ with robust accuracy. The strength of the correlation, which indicates the level of margin consistency, does not depend on the robust accuracy. References on models are given in Table \ref{tab:aucplus}.
  • Figure 5: The correlations between the input margin, the distance between the feature representations of samples and their closest adversaries (feature distance -- $\|h_\psi(x)-h_\psi(x')\|$), and the logit margin may be due to the local isometry of the feature extractor. See Table \ref{tab:aucplus} for the specific references on the model ID. The correlations are given with standard error for the y-axis values in each interval.
  • ...and 9 more figures
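
The captions above report a correlation between input margins and logit margins. One way such a correlation could be measured is a rank correlation like Kendall's tau; this is an assumption for illustration, since the captions only say "correlation". The sketch below implements the tau-a variant (no tie correction) with NumPy on made-up values.

```python
import numpy as np

def kendall_tau_a(a, b) -> float:
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(a)
    # Sign of every pairwise difference; entry (i, j) compares i with j.
    sa = np.sign(a[:, None] - a[None, :])
    sb = np.sign(b[:, None] - b[None, :])
    # Each unordered pair is counted twice, so divide by n * (n - 1).
    return float((sa * sb).sum() / (n * (n - 1)))

# Made-up margins for four samples (not values from the paper).
input_margins = [0.01, 0.02, 0.03, 0.05]
logit_margins = [0.4, 1.1, 0.9, 2.5]
tau = kendall_tau_a(input_margins, logit_margins)
print(tau)
```

A value of 1 means the two margins order the samples identically; for larger sample sets, `scipy.stats.kendalltau` would be the more practical choice.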

Theorems & Definitions (4)

  • Definition 1
  • Definition 2
  • Theorem 1
  • proof