Effect of Ambient-Intrinsic Dimension Gap on Adversarial Vulnerability
Rajdeep Haldar, Yue Xing, Qifan Song
TL;DR
The paper shows that a gap between the intrinsic dimension of the data and the dimension of its ambient representation makes even clean-trained 2-layer ReLU networks vulnerable to off-manifold adversarial attacks. It formalizes a data model based on isometric immersions together with a two-layer network and derives explicit bounds on the strength of on- and off-manifold attacks in both the $\ell_2$ and $\ell_\infty$ norms. These bounds show that the off-manifold perturbation needed for a successful attack can become arbitrarily small as the gap grows, whereas on-manifold perturbations scale differently with the gap. Simulations and experiments on binary tasks from MNIST, Fashion-MNIST, and ImageNet support the results, confirming that increasing the ambient dimension degrades robustness as the theory predicts. The work identifies the dimension gap as a fundamental factor behind adversarial vulnerability and discusses the implications for adversarial training as a mitigation strategy, suggesting that robustness may require addressing the separation between ambient and intrinsic dimensions. Overall, the study links geometric properties of data to adversarial susceptibility, offering a principled lens for understanding and improving robustness in high-dimensional learning systems.
Abstract
The existence of adversarial attacks on machine learning models that are imperceptible to humans remains largely unexplained from a theoretical perspective. In this work, we introduce two notions of adversarial attacks: natural or on-manifold attacks, which are perceptible to a human/oracle, and unnatural or off-manifold attacks, which are not. We argue that the existence of off-manifold attacks is a natural consequence of the dimension gap between the intrinsic and ambient dimensions of the data. For 2-layer ReLU networks, we prove that even though the dimension gap does not affect generalization performance on samples drawn from the observed data space, it makes the clean-trained model more vulnerable to adversarial perturbations in the off-manifold direction of the data space. Our main results provide an explicit relationship between the $\ell_2$ and $\ell_{\infty}$ attack strengths of on/off-manifold attacks and the dimension gap.
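The on/off-manifold split can be illustrated with a small numerical sketch. It assumes the simplest isometric embedding, a linear map $x = Uz$ with orthonormal $U \in \mathbb{R}^{d \times k}$, a hypothetical toy labeling rule, and a 2-layer ReLU network whose first layer is trained by plain full-batch gradient descent on the logistic loss. None of these choices, nor the helper names (`make_embedded_data`, `train_two_layer_relu`, `input_gradient`, `on_off_split`), come from the paper. The sketch relies on a standard property of gradient descent: each first-layer update is a linear combination of training inputs, so clean training never changes the off-manifold component of the weights, and the off-manifold part of the input gradient is inherited from the random initialization.

```python
import numpy as np

def make_embedded_data(n, k, d, rng):
    """Sample n points in R^k and embed them in R^d via a linear isometry U (d x k).

    Assumption: a linear embedding stands in for a general isometric immersion,
    and labels follow a hypothetical toy rule y = sign(z_0).
    """
    U, _ = np.linalg.qr(rng.standard_normal((d, k)))  # orthonormal columns
    Z = rng.standard_normal((n, k))
    y = np.where(Z[:, 0] >= 0, 1.0, -1.0)
    return Z @ U.T, y, U

def train_two_layer_relu(X, y, m, steps, lr, rng):
    """Full-batch GD on logistic loss for f(x) = a . relu(W x); second layer a is fixed."""
    n, d = X.shape
    W = rng.standard_normal((m, d)) / np.sqrt(d)
    a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)
    W0 = W.copy()
    for _ in range(steps):
        H = X @ W.T                      # (n, m) pre-activations
        f = np.maximum(H, 0.0) @ a       # (n,) network outputs
        # derivative of mean log(1 + exp(-y f)) w.r.t. f, via a stable sigmoid(-y f)
        g = -y * 0.5 * (1.0 - np.tanh(y * f / 2.0)) / n
        # dL/dW_j = sum_i g_i a_j 1[H_ij > 0] x_i, so every row lies in span(X)
        G = ((g[:, None] * (H > 0)) * a[None, :]).T @ X  # (m, d)
        W -= lr * G
    return W, a, W0

def input_gradient(x, W, a):
    """Gradient of f(x) = a . relu(W x) with respect to the input x."""
    active = (W @ x > 0).astype(float)
    return W.T @ (a * active)

def on_off_split(v, U):
    """Split v into its component in span(U) (on-manifold) and the orthogonal remainder."""
    on = U @ (U.T @ v)
    return on, v - on

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k, n, m = 5, 200, 64
    for d in (10, 100, 1000):
        X, y, U = make_embedded_data(n, k, d, rng)
        W, a, W0 = train_two_layer_relu(X, y, m, steps=300, lr=0.5, rng=rng)
        P_perp = np.eye(d) - U @ U.T
        drift = np.abs((W - W0) @ P_perp).max()  # near 0: GD leaves off-manifold weights untouched
        grad = input_gradient(X[0], W, a)
        on, off = on_off_split(grad, U)
        print(d, f"{drift:.1e}", f"{np.linalg.norm(off) / np.linalg.norm(grad):.2f}")
```

For each ambient dimension $d$, the script prints the largest off-manifold weight drift (expected to stay at floating-point noise) and the fraction of the input-gradient norm that points off the manifold. This is a toy check under the stated assumptions, not a reproduction of the paper's bounds or experiments.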
