A Geometric Framework for Adversarial Vulnerability in Machine Learning

Brian Bell

TL;DR

The work addresses adversarial vulnerability in neural networks by proposing a geometric framework that blends high-dimensional curvature, persistence, and decision-boundary analysis to formalize robustness questions, including the Dimpled Manifold Hypothesis. It develops exact path kernel representations and their generalized form (EPK and gEPK) to decompose predictions along training paths and across data, enabling finite-width networks to be analyzed with kernel methods. Novel metrics such as $(\boldsymbol{\gamma},\boldsymbol{\sigma})$-stability and $\boldsymbol{\gamma}$-persistence quantify local robustness, and experiments on MNIST and ImageNet reveal that adversarial examples exhibit reduced persistence and distinctive decision-boundary angles; the framework also links to Out-of-Distribution detection and signal-manifold dimension. Overall, the approach provides a rigorous mathematical foundation to study robustness, generalization in Reproducing Kernel Banach Spaces, and connections between distributional learning and neural networks, with practical implications for robustness and uncertainty quantification.

Abstract

This work starts with the intention of using mathematics to understand the intriguing vulnerability observed by Szegedy et al. (2013) within artificial neural networks. Along the way, we develop novel tools with applications far outside the adversarial domain, while building a rigorous mathematical framework to examine this problem. Our goal is to build out theory that can support increasingly sophisticated conjectures about adversarial attacks, with a particular focus on the so-called "Dimpled Manifold Hypothesis" of Shamir et al. (2021). Chapter one covers the history and architecture of neural networks. Chapter two focuses on the background of adversarial vulnerability: starting from the seminal paper by Szegedy et al. (2013), we develop the theory of adversarial perturbation and attack. Chapter three builds a theory of persistence, related to Ricci curvature, which can be used to measure properties of decision boundaries. We use this foundation to make a conjecture about adversarial attacks. Chapters four and five represent a sudden and wonderful digression that examines an intriguing related body of theory for spatial analysis of neural networks as approximations of kernel machines, and that becomes a novel theory for representing neural networks with bilinear maps. These heavily mathematical chapters set up a framework and begin exploring applications of what may become a very important theoretical foundation for analyzing neural network learning with spatial and geometric information. We conclude by setting up our new methods to address the conjecture from chapter three in continuing research.

Paper Structure (88 sections, 7 theorems, 82 equations, 37 figures, 3 tables, 2 algorithms)

Introduction
Background
Artificial Neural Networks (ANNs)
Structure
Convolutional Neural Networks (CNNs)
Training ANNs
Selection of the Training Set
Selecting a Loss Function
Computation of Gradient via Backpropagation
Optimization of Weights
Adversarial Attacks
Common Datasets
Common Attack Techniques
L-BFGS Minimizing Distortion
L-BFGS: MNIST
...and 73 more sections

Key Result

Lemma 4.3.4

The exact path kernel (EPK) is a kernel.

Figures (37)

Figure 1: Natural images are in columns 1 and 4, adversarial images are in columns 3 and 6, and the difference between them (magnified by a factor of 10) is in columns 2 and 5. All images in columns 3 and 6 are classified by AlexNet as "Ostrich" (Szegedy et al., 2013).
Figure 2: Original images are on the left, the perturbation is in the middle, and the adversarial image (the sum of the original and the perturbation) is on the right. Column 1 shows an original 8 being perturbed to adversarial classes 0, 2, and 4. Column 2 shows adversarial classes 1, 3, and 5.
Figure 3: A histogram of the distortion measured for each of 900 adversarial examples generated using L-BFGS against the FC-200-200-10 network on MNIST. Mean distortion is 0.089.
Figure 4: Original images are on the left, the perturbation (magnified by a factor of 100) is in the middle, and the adversarial image (the sum of the original and the perturbation) is on the right.
Figure 5: A histogram of the distortion measured for each of 112 adversarial examples generated using L-BFGS against the VGG16 network on ImageNet images. Mean distortion is 0.0107.
...and 32 more figures

Theorems & Definitions (28)

Definition 1.1.1
Definition 1.1.2
Definition 1.1.3
Definition 1.1.4
Definition 1.1.5
Definition 1.1.6
Definition 1.1.7
Definition 2.2.1
Definition 2.3.1
Definition 2.3.2
...and 18 more
