Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality

Xingjun Ma, Bo Li, Yisen Wang, Sarah M. Erfani, Sudanthi Wijewickrema, Grant Schoenebeck, Dawn Song, Michael E. Houle, James Bailey

TL;DR

This work uses Local Intrinsic Dimensionality to quantify the local geometry of adversarial regions in deep networks, showing that adversarial perturbations increase local dimensionality and can be detected via LID-based features. By evaluating on MNIST, CIFAR-10, and SVHN against multiple attacks, the authors demonstrate that LID-based detectors outperform kernel-density and Bayesian uncertainty methods, even under adaptive attacks. The study highlights minibatch-based LID estimation as an efficient approach and suggests that LID captures fundamental dimensional properties of adversarial regions, guiding future defense and attack research. Overall, LID offers a principled, geometry-aware path toward robust adversarial detection in high-dimensional manifolds.

Abstract

Deep Neural Networks (DNNs) have recently been shown to be vulnerable to adversarial examples, which are carefully crafted instances that can mislead DNNs into making errors during prediction. To better understand such attacks, a characterization is needed of the properties of regions (the so-called 'adversarial subspaces') in which adversarial examples lie. We tackle this challenge by characterizing the dimensional properties of adversarial regions, via the use of Local Intrinsic Dimensionality (LID). LID assesses the space-filling capability of the region surrounding a reference example, based on the distance distribution of the example to its neighbors. We first explain how adversarial perturbation can affect the LID characteristic of adversarial regions, and then show empirically that LID characteristics can facilitate the distinction of adversarial examples generated using state-of-the-art attacks. As a proof-of-concept, we show that a potential application of LID is to distinguish adversarial examples, and the preliminary results show that it can outperform several state-of-the-art detection measures by large margins for five attack strategies considered in this paper across three benchmark datasets. Our analysis of the LID characteristic for adversarial regions not only motivates new directions of effective adversarial defense, but also opens up more challenges for developing new attacks to better understand the vulnerabilities of DNNs.
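
To make the idea of estimating LID from neighbor distances concrete, here is a minimal sketch in Python with NumPy. It assumes the commonly used maximum-likelihood estimator, $\widehat{\mathrm{LID}}(x) = -\big(\tfrac{1}{k}\sum_{i=1}^{k}\log\tfrac{r_i(x)}{r_k(x)}\big)^{-1}$, where $r_i(x)$ is the distance from $x$ to its $i$-th nearest neighbor, and it takes neighbors from within a single minibatch. The function name `lid_mle`, the default `k=20`, and the small constant `eps` are illustrative assumptions, not details taken from this page.

```python
import numpy as np

def lid_mle(batch, k=20, eps=1e-12):
    """Estimate LID for every row of `batch` using neighbors within the batch.

    Hypothetical helper: a sketch of a maximum-likelihood LID estimate,
    not the authors' reference implementation.
    """
    batch = np.asarray(batch, dtype=float)
    n = batch.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of rows in batch")

    # Pairwise Euclidean distances between all rows of the batch.
    diffs = batch[:, None, :] - batch[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))

    # Sort each row; column 0 is the zero self-distance, so skip it.
    knn = np.sort(dists, axis=1)[:, 1:k + 1]
    r_k = knn[:, -1:]  # distance to the k-th nearest neighbor

    # log(r_i / r_k) <= 0 for all i; eps guards against zero distances.
    log_ratios = np.log(np.maximum(knn, eps) / np.maximum(r_k, eps))

    # Negative inverse of the mean log-ratio; the clamp avoids division by zero.
    return -1.0 / np.minimum(log_ratios.mean(axis=1), -eps)
```

As a sanity check on the sketch, points drawn from a 2-D plane embedded in a 10-D space should receive lower estimates on average than points that fill all 10 dimensions. In a detection setting, scores like these would be computed on layer activations rather than raw inputs and then fed to a classifier; that step is not shown here.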

Paper Structure

This paper contains 13 sections, 5 equations, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: This example shows how density measures can fail to characterize the spatial properties of adversarial regions. The Gaussian kernel with bandwidth 0.2 is used for KD.
  • Figure 2: The left-hand figure shows the LID scores (at the softmax layer) of 100 normal (blue), noisy (green), and Opt attack (red x-cross) examples from the CIFAR-10 dataset. The scores have been scaled to the interval [0,1] using min-max normalization. The blue and green lines appear superimposed due to similarities in the LID scores for normal and noisy examples. The right-hand figure shows the detection performance (AUC) based on LID scores computed at different layers. $L_i$ denotes the $i$-th transformation layer.
  • Figure 3: Top row: tuning bandwidth $\sigma$ for KD using a grid search over the range $[0, 10)$ in log-space, separately for each dataset. Bottom row: tuning $k$ for LID using a grid search over the range $[10, 100)$ for minibatch size 100, separately for each dataset. The vertical dashed lines denote the selected parameter choice.
  • Figure 4: The plots show the normalized LID scores of 100 randomly selected normal (blue), noisy (green) and Opt attack (red x-cross) examples. The noisy and adversarial examples were generated from the normal examples. The left-hand plot shows the scores (at the pre-softmax layer) of MNIST examples, while the right-hand plot shows LID scores (at the softmax layer) of SVHN examples. Normal and noisy example curves appear superimposed in the right-hand figure due to the similarity of their values.
  • Figure 5: The detection AUC score of LID estimated using different neighborhood sizes $k$ with a larger minibatch size of 1000. The results are shown for the detection of Opt attacks on the MNIST, CIFAR-10 and SVHN datasets.

Theorems & Definitions (1)

  • Definition 1: Local Intrinsic Dimensionality