
Effect of Ambient-Intrinsic Dimension Gap on Adversarial Vulnerability

Rajdeep Haldar, Yue Xing, Qifan Song

TL;DR

The paper shows that a dimension gap between the intrinsic data structure and its ambient representation yields off-manifold adversarial vulnerability even for cleanly trained 2-layer ReLU networks. By formalizing a data model with isometric immersions and a two-layer network, it derives explicit bounds on the strengths of on- and off-manifold attacks in both the $\ell_2$ and $\ell_{\infty}$ norms, revealing that off-manifold perturbations can become arbitrarily small as the gap grows while on-manifold perturbations scale differently. The results are supported by simulations and experiments on MNIST, Fashion-MNIST, and binary ImageNet tasks, confirming that increasing the ambient dimension degrades robustness in line with the theory. The work identifies the dimension gap as a fundamental factor behind adversarial vulnerability and discusses implications for adversarial training as a mitigation strategy, suggesting that robustness may require addressing the separation between ambient and intrinsic dimensions. Overall, the study links geometric properties of data to adversarial susceptibility, offering a principled lens for understanding and improving robustness in high-dimensional learning systems.
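
A minimal sketch of what an isometric immersion means in the simplest possible case, writing $d$ for the intrinsic and $D$ for the ambient dimension (symbols borrowed from the figure captions), so the dimension gap is $D - d$. It assumes a linear immersion $x = Uz$ whose $D \times d$ matrix $U$ has orthonormal columns; such a map preserves pairwise distances. The paper's immersions need not be linear, and the helper names (`make_linear_immersion`, `immerse`) are hypothetical.

```python
import math
import random

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors (lists of floats)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            # Remove the component of w along each previously accepted direction.
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        basis.append([wi / norm for wi in w])
    return basis

def make_linear_immersion(D, d, seed=0):
    """Hypothetical helper: d orthonormal directions in R^D (the columns of U)."""
    rng = random.Random(seed)
    raw = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(d)]
    return gram_schmidt(raw)

def immerse(z, basis):
    """Map an intrinsic point z in R^d to the ambient point x = U z in R^D."""
    D = len(basis[0])
    return [sum(z[k] * basis[k][i] for k in range(len(z))) for i in range(D)]

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Usage: a 2-dimensional intrinsic space immersed in R^50 (dimension gap 48).
U = make_linear_immersion(D=50, d=2)
z1, z2 = [1.0, 2.0], [-0.5, 0.3]
x1, x2 = immerse(z1, U), immerse(z2, U)
print(dist(z1, z2), dist(x1, x2))  # equal up to floating-point rounding
```

The 48 extra coordinates carry no information about $z$, yet a classifier trained on points $x$ still has to assign a label to every point of $\mathbb{R}^{50}$, including those off the 2-dimensional data subspace.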

Abstract

The existence of adversarial attacks on machine learning models that are imperceptible to a human is still quite a mystery from a theoretical perspective. In this work, we introduce two notions of adversarial attacks: natural or on-manifold attacks, which are perceptible to a human/oracle, and unnatural or off-manifold attacks, which are not. We argue that the existence of off-manifold attacks is a natural consequence of the dimension gap between the intrinsic and ambient dimensions of the data. For 2-layer ReLU networks, we prove that even though the dimension gap does not affect generalization performance on samples drawn from the observed data space, it makes the clean-trained model more vulnerable to adversarial perturbations in the off-manifold direction of the data space. Our main results provide an explicit relationship between the $\ell_2,\ell_{\infty}$ attack strength of the on/off-manifold attack and the dimension gap.
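
To make the on/off-manifold split concrete, here is a minimal sketch under the simplifying assumption that the data lie on a flat (linear) manifold spanned by known orthonormal directions. The on-manifold part of a perturbation is then its orthogonal projection onto that span, and the off-manifold part is the residual; for a curved manifold, the span would be replaced by the tangent space at the data point. The `on_manifold_proportion` helper (share of the squared $\ell_2$ norm lying on the manifold) is a hypothetical measure and may not match how the paper computes the on-manifold proportion in its figures.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def split_perturbation(delta, basis):
    """Split delta into (on, off): on = projection onto span(basis), off = delta - on.
    `basis` must be a list of orthonormal vectors."""
    on = [0.0] * len(delta)
    for u in basis:
        c = dot(delta, u)  # coordinate of delta along direction u
        on = [o + c * ui for o, ui in zip(on, u)]
    off = [di - oi for di, oi in zip(delta, on)]
    return on, off

def on_manifold_proportion(delta, basis):
    """Hypothetical measure: fraction of ||delta||_2^2 that lies on the manifold."""
    on, _ = split_perturbation(delta, basis)
    return dot(on, on) / dot(delta, delta)

# Usage: a 1-dimensional manifold (a line) in R^3.
s = 1 / math.sqrt(2)
line = [[s, s, 0.0]]
delta = [1.0, 0.0, 1.0]
on, off = split_perturbation(delta, line)
print(on, off)                              # approx [0.5, 0.5, 0.0] and [0.5, -0.5, 1.0]
print(on_manifold_proportion(delta, line))  # approx 0.25
```

The two parts are orthogonal, so their squared norms add up to $\|\delta\|_2^2$; in this example three quarters of the perturbation's energy points off the line.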

Paper Structure (38 sections, 21 theorems, 85 equations, 15 figures, 6 tables)

Key Result

Theorem 1.1

As the dimension gap increases, the strength of the $\ell_2$ or $\ell_{\infty}$ attack required to misclassify a sample decreases, i.e., the model becomes more vulnerable. Notably, for the $\ell_{\infty}$ attack, the required strength goes to $0$ asymptotically as the dimension gap grows.
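
A toy illustration of why the $\ell_{\infty}$ budget is the more fragile one, using a linear classifier $f(x) = w^\top x + b$ in place of the paper's 2-layer ReLU network. For a linear model, the smallest perturbation that reaches the decision boundary has $\ell_2$ norm $|f(x)|/\|w\|_2$ and $\ell_{\infty}$ norm $|f(x)|/\|w\|_1$ (Hölder duality). The sketch assumes, hypothetically, that training leaves a small residual weight of magnitude $c$ on each of the $D - d$ off-manifold coordinates; the resulting rates follow from that assumption and are not the bounds proved in the paper.

```python
import math

def min_attack_strengths(margin, w):
    """For f(x) = w.x + b with f(x) = margin, return the smallest l2 and
    l_inf perturbation norms that bring f to 0 (the decision boundary)."""
    l1 = sum(abs(wi) for wi in w)
    l2 = math.sqrt(sum(wi * wi for wi in w))
    return abs(margin) / l2, abs(margin) / l1

def toy_weights(w_on, gap, c):
    """Hypothetical weights: on-manifold block w_on plus `gap` off-manifold
    coordinates, each carrying a small residual weight c."""
    return list(w_on) + [c] * gap

w_on, margin, c = [1.0, 1.0], 1.0, 0.1
for gap in [0, 100, 10_000]:
    eps2, epsinf = min_attack_strengths(margin, toy_weights(w_on, gap, c))
    print(f"gap={gap:>6}  l2={eps2:.4f}  linf={epsinf:.6f}")
```

In this toy, $\|w\|_1$ grows linearly in the gap while $\|w\|_2$ grows only like its square root, so the $\ell_{\infty}$ budget shrinks roughly like $1/\text{gap}$ against $1/\sqrt{\text{gap}}$ for $\ell_2$. The optimal $\ell_{\infty}$ perturbation, $-\operatorname{sign}(f(x))\,\varepsilon\,\operatorname{sign}(w)$, moves every coordinate, so most of its effect comes from the off-manifold directions.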

Figures (15)

  • Figure 1: Mental image: The oracle decision boundary (green dashed line) determines the label (blue or red) of any point in the Euclidean space. The observed data space consists of 1-dimensional line segments immersed in the 2-dimensional space. The model learns the estimated decision boundary (black dotted line) based on the observed data.
  • Figure 2: Original image is a digit 5. (a) Natural attack; (b) Unnatural attack (minor changes in pixel values of the background compared to the original). Under both of these attacks, the neural network classifies the image as 3 instead of 5.
  • Figure 3: Attack strength threshold associated with $10\%$ robust test accuracy (left) and the on-manifold proportion of the $\mathbb{R}^D$ attack (right). The top and bottom rows correspond to the cases $\|\boldsymbol{\zeta}\|=\Theta(1)$ and $\|\boldsymbol{\zeta}\|=\Theta(\sqrt{d})$, respectively.
  • Figure 4: (a) Original image 28x28, $D=784$; (b) Padded image 34x34, $P=3$, $D=1156$.
  • Figure 5: The relationship between attack strength and the padding number $P$ listed in the accompanying table (Top), and between robust test accuracy and attack strength for different values of $P$ (Bottom). Setting: $\ell_2$ attack, 10 training epochs.
  • ...and 10 more figures
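
The padding construction in Figure 4 enlarges the ambient dimension without adding any digit content: a 28x28 image padded by $P$ pixels on every side has $D = (28 + 2P)^2$ pixels, so $P = 3$ gives $34^2 = 1156$. A minimal sketch, assuming constant zero padding (matching MNIST's black background); the padding value used in the experiments is not stated in this summary, and the helper names are hypothetical.

```python
def pad_image(img, P, value=0.0):
    """Surround a 2-D image (list of equal-length rows) with P rows/columns of `value`."""
    width = len(img[0]) + 2 * P
    top = [[value] * width for _ in range(P)]
    middle = [[value] * P + list(row) + [value] * P for row in img]
    bottom = [[value] * width for _ in range(P)]
    return top + middle + bottom

def ambient_dim(side, P):
    """Number of pixels D of a side x side image after padding by P on each side."""
    return (side + 2 * P) ** 2

img = [[0.0] * 28 for _ in range(28)]          # stand-in for a 28x28 MNIST digit
padded = pad_image(img, P=3)
print(len(padded), len(padded[0]))             # 34 34
print([ambient_dim(28, P) for P in range(6)])  # [784, 900, 1024, 1156, 1296, 1444]
```

Under this assumption every padded pixel is constant across the whole training set, so each one adds an ambient direction along which the data never vary: the dimension gap widens while the intrinsic dimension stays unchanged.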

Theorems & Definitions (39)

  • Theorem 1.1: Informal version of Theorem [thm:pert flips]
  • Theorem 3.1: from [lyu2020gradient]
  • Definition 4.1: Nice example
  • Theorem 4.1
  • Theorem 4.2
  • Definition B.1: Semi-inner product
  • Definition B.2: Constants
  • Lemma C.0.1: Properties
  • Proof
  • Lemma C.1.1
  • ...and 29 more