Effect of Ambient-Intrinsic Dimension Gap on Adversarial Vulnerability

Rajdeep Haldar; Yue Xing; Qifan Song

TL;DR

The paper shows that a dimension gap between intrinsic data structure and ambient representation yields off-manifold adversarial vulnerability even for cleanly trained 2-layer ReLU networks. By formalizing a data model with isometric immersions and a two-layer network, it derives explicit bounds on the strengths of on- and off-manifold attacks in both $\ell_2$ and $\ell_{\infty}$ norms, revealing that off-manifold perturbations can become arbitrarily small as the gap grows while on-manifold perturbations scale differently. The results are supported by simulations and experiments on MNIST, Fashion-MNIST, and ImageNet binary tasks, confirming that increasing ambient dimension degrades robustness in line with the theory. The work clarifies the dimension gap as a fundamental factor behind adversarial vulnerability and discusses implications for adversarial training as a mitigation strategy, highlighting that robustness may require addressing the ambient-intrinsic dimensional separation. Overall, the study links geometric data properties to adversarial susceptibility, offering a principled lens for understanding and improving robustness in high-dimensional learning systems.

Abstract

The existence of adversarial attacks on machine learning models imperceptible to a human is still quite a mystery from a theoretical perspective. In this work, we introduce two notions of adversarial attacks: natural or on-manifold attacks, which are perceptible by a human/oracle, and unnatural or off-manifold attacks, which are not. We argue that the existence of the off-manifold attacks is a natural consequence of the dimension gap between the intrinsic and ambient dimensions of the data. For 2-layer ReLU networks, we prove that even though the dimension gap does not affect generalization performance on samples drawn from the observed data space, it makes the clean-trained model more vulnerable to adversarial perturbations in the off-manifold direction of the data space. Our main results provide an explicit relationship between the $\ell_2,\ell_{\infty}$ attack strength of the on/off-manifold attack and the dimension gap.
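
As a rough illustration of the on/off-manifold distinction, the sketch below splits a perturbation into a component inside the data subspace and a component orthogonal to it. It assumes a simplified linear data model (an orthonormal `basis` spanning a `d`-dimensional subspace of $\mathbb{R}^D$), which is not the paper's general immersion setting; the function name `split_perturbation`, the toy dimensions, and the squared-norm ratio `on_fraction` are hypothetical choices made for this example.

```python
import numpy as np

def split_perturbation(delta, basis):
    """Split delta into on-manifold and off-manifold parts.

    basis: (D, d) matrix with orthonormal columns spanning the
    (assumed linear) intrinsic data subspace inside R^D.
    """
    on = basis @ (basis.T @ delta)   # orthogonal projection onto the subspace
    off = delta - on                 # remainder lies in the orthogonal complement
    return on, off

# Hypothetical toy setup: intrinsic dimension d = 2, ambient dimension D = 5.
D, d = 5, 2
basis = np.eye(D)[:, :d]             # first d coordinate axes span the data subspace
delta = np.array([3.0, 0.0, 4.0, 0.0, 0.0])

on, off = split_perturbation(delta, basis)

# Share of the squared l2 norm that lies on the manifold (an assumed measure).
on_fraction = np.linalg.norm(on) ** 2 / np.linalg.norm(delta) ** 2
```

Under this toy model, a perturbation with a small `on_fraction` moves the sample mostly in directions the observed data never explores, which is the regime the abstract calls off-manifold.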

Paper Structure (38 sections, 21 theorems, 85 equations, 15 figures, 6 tables)

Introduction
Related Works
Adversarial study motivated by a manifold view
Adversarial examples in random ReLU networks
Adversarial examples due to implicit bias in two-layer ReLU networks
Adversarial examples due to low dimensional data
Problem Setup
Data
Data Interpretation
Neural Network
Expression for Network Parameters
Volatile biases
Comparison to melamed2023adversarial
Assumptions
Main Results
...and 23 more sections

Key Result

Theorem 1.1

As the dimension gap increases, the strength of $\ell_2,\ell_{\infty}$ attack required to misclassify the sample decreases, i.e., the model is more vulnerable. Notably, for $\ell_{\infty}$ attack, the strength asymptotically goes to $0$ w.r.t. the dimension gap.

Figures (15)

Figure 1: Mental image: The oracle decision boundary (green dashed line) determines the label (blue or red) of any point in the Euclidean space. The observed data space consists of 1-dimensional line segments immersed in the 2-dimensional space. The model learns the estimated decision boundary (black dotted line) based on the observed data.
Figure 2: Original image is a digit 5. (a) Natural attack; (b) Unnatural attack (minor changes in pixel values of the background compared to the original). Under both of these attacks, the neural network classifies the image as 3 instead of 5.
Figure 3: Attack strength threshold associated with $10\%$ robust test accuracy (left) and the on-manifold proportion in the $\mathbb{R}^D$ attack (right). The top and bottom rows correspond to the cases $\|\boldsymbol{\zeta}\|=\Theta(1)$ and $\|\boldsymbol{\zeta}\|=\Theta(\sqrt{d})$, respectively.
Figure 4: (a) Original image 28x28, $D=784$; (b) Padded image 34x34, $P=3$, $D=1156$.
Figure 5: The relationship between attack strength and the padding number $P$ in Table \ref{tab:my_label} (top), and the relationship between robust test accuracy and attack strength for different values of $P$ (bottom). $\ell_2$ attack, 10 training epochs.
...and 10 more figures
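
The padding arithmetic in the Figure 4 caption can be checked with a short sketch: padding a 28x28 image by $P=3$ pixels on every side gives a 34x34 image, so the ambient dimension grows from $D=784$ to $D=1156$ while the image content is unchanged. The helper name `pad_image` and the choice of zero-valued padding are assumptions made for this example; the caption does not state the padding value.

```python
import numpy as np

def pad_image(image, p):
    """Pad a 2-D image with p pixels on every side (zeros assumed)."""
    return np.pad(image, pad_width=p, mode="constant", constant_values=0)

# Stand-in for a 28x28 image; real pixel values are not needed here.
original = np.ones((28, 28))
padded = pad_image(original, 3)

ambient_dim_original = original.size   # 28 * 28
ambient_dim_padded = padded.size       # (28 + 2 * 3) ** 2
```

Since the added border carries no information about the image, this kind of padding increases the ambient dimension without changing the intrinsic structure of the data.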

Theorems & Definitions (39)

Theorem 1.1: Informal version of Theorem \ref{thm:pert flips}
Theorem 3.1: lyu2020gradient
Definition 4.1: Nice example
Theorem 4.1
Theorem 4.2
Definition B.1: Semi-inner product
Definition B.2: Constants
Lemma C.0.1: Properties
proof
Lemma C.1.1
...and 29 more
