Unfolding Local Growth Rate Estimates for (Almost) Perfect Adversarial Detection

Peter Lorenz, Margret Keuper, Janis Keuper

TL;DR

This work tackles the vulnerability of CNNs to adversarial perturbations by introducing a lightweight white-box detector based on an unfolded Local Intrinsic Dimensionality (LID) representation, termed multiLID. By modeling per-neighbor growth-rate features and applying a non-linear classifier (Random Forest), the method achieves near-perfect discrimination between clean and adversarial images across common datasets and architectures. The paper provides extensive ablations on feature-layer choices, neighbor counts, and classifier types, showing that unfolded multiLID features substantially outperform original LID and other detectors. The approach offers a practical, scalable detector that can significantly enhance robustness in real-world deployment, with discussion of limitations and directions for transferability and broader validation.

Abstract

Convolutional neural networks (CNNs) define the state-of-the-art solution for many perceptual tasks. However, current CNN approaches remain largely vulnerable to adversarial perturbations of the input that have been crafted specifically to fool the system while being quasi-imperceptible to the human eye. In recent years, various approaches have been proposed to defend CNNs against such attacks, for example by model hardening or by adding explicit defence mechanisms. In the latter case, a small "detector" is included in the network and trained on the binary classification task of distinguishing genuine data from data containing adversarial perturbations. In this work, we propose a simple and lightweight detector that leverages recent findings on the relation between networks' local intrinsic dimensionality (LID) and adversarial attacks. Based on a re-interpretation of the LID measure and several simple adaptations, we surpass the state of the art in adversarial detection by a significant margin and reach almost perfect results in terms of F1-score for several networks and datasets. Sources available at: https://github.com/adverML/multiLID
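
The difference between the folded LID estimate and its unfolded counterpart can be illustrated with a minimal Python sketch. It assumes the maximum-likelihood LID estimator over the distances to the k nearest neighbors, and it takes the unfolded features to be the individual negative log distance ratios rather than their mean. The function names (`knn_distances`, `lid_mle`, `multi_lid`) are hypothetical and are not taken from the authors' repository; plain tuples stand in for network activations.

```python
import math


def knn_distances(x, batch, k):
    """Return the k smallest non-zero Euclidean distances from x to points in batch."""
    dists = sorted(math.dist(x, y) for y in batch)
    # Drop zero distances so that x itself (if present in the batch) is not a neighbor.
    dists = [d for d in dists if d > 0.0]
    return dists[:k]


def lid_mle(dists):
    """Folded estimate (assumed MLE form): -1 / mean_i(log(d_i / d_k)).

    Raises ZeroDivisionError if all distances are equal.
    """
    d_k = dists[-1]  # distance to the k-th (farthest) neighbor
    mean_log = sum(math.log(d / d_k) for d in dists) / len(dists)
    return -1.0 / mean_log


def multi_lid(dists):
    """Unfolded features (assumption): one value -log(d_i / d_k) per neighbor."""
    d_k = dists[-1]
    return [-math.log(d / d_k) for d in dists]


# Hypothetical sorted neighbor distances for one sample.
example = [1.0, 2.0, 4.0]
scalar = lid_mle(example)      # a single number per sample
profile = multi_lid(example)   # k numbers per sample, one per neighbor
```

Under these assumptions the scalar estimate equals `len(profile) / sum(profile)`, so the unfolded vector carries at least as much information as the folded value. In a detector along the lines described above, such per-neighbor profiles would presumably be computed per feature layer and passed to a classifier such as a random forest; that step is omitted here.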

Paper Structure (13 sections, 11 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Visualization of the LID features from the clean set of samples (black) and different adversarial attacks of 100 samples. The network is trained on and LID is evaluated on the feature map after the last ReLU activation.
  • Figure 2: Visualization of the LID features from the clean and FGSM sets of 100 samples over each $k$. The network is trained on . The feature values for the nearest neighbors (low values on the x-axis) are significantly higher for the clean dataset. The LID log values are inversely proportional to the distance, as shown in the multiLID equation. The plot on the right illustrates the mean and standard deviation of the two sets of profiles.
  • Figure 3: Feature importance. Increasing order according to the activation function layers (feature) from trained on . The most relevant features are in the last ReLU layers.
  • Figure 4: Cumulative features used for the logistic regression (LR) classifier. The x-axis shows the length of the feature vector used. The y-axis reports the AUC reached when using only the most important features out of the full vector, sorted by random forest (RF) feature importance.
  • Figure 5: Ablation study on CIFAR10 of LID and multiLID detection rates for different $k$ under the APGD-CE attack ($L^2$, $L^{\infty}$) with different epsilon sizes.

Theorems & Definitions (1)

  • Definition 1