
Robustness bounds on the successful adversarial examples in probabilistic models: Implications from Gaussian processes

Hiroaki Maeshima, Akira Otsuka

TL;DR

This work establishes a theoretical robustness bound for adversarial examples (AEs) within Gaussian Process (GP) classification, showing that the maximum probability of a successful AE is bounded above by a function of the perturbation norm $r$, the kernel $k(\cdot,\cdot)$, and the kernel-based distance to the closest opposite-label point. The authors derive the maximum success probability (MSP) bound $\Pr(\mathcal{C}(x_*)=-1) < \Phi(-\mu/\sigma) < \frac{1}{2}\exp(-\mu^2/(2\sigma^2)) = \phi(r|\mathcal{D})$, with $\mu$ and $\sigma^2$ defined via kernel interactions, and show that this bound holds under an $\epsilon$-proximity assumption relating the full-data and two-point training configurations. They provide a structured proof outline (predictive variance behavior, monotonicity of Gaussian tails, and cross-label distance effects) and extend the results to the case of all training data pairs. Empirically, the theory is validated on ImageNet using a Gaussian-kernel GP regressor across varying kernel parameters, demonstrating that the theoretical MSP bound tracks observed AE success probabilities and that kernel choices meaningfully affect robustness. The results offer a kernel-centered framework for improving robustness, with implications for activation-function design in neural networks via GP analogies, and point to future multi-class extensions and efficient nearest-neighbor computations.
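The inequality chain in the MSP bound can be illustrated numerically. The sketch below covers the two-point configuration only: a GP regressor trained on $x_+$ with label $+1$ and $x_-$ with label $-1$, using a squared-exponential kernel $k(x,x')=\exp(-\lVert x-x'\rVert^2/(2\ell^2))$ and a small jitter term. The length scale $\ell$, the jitter, and all function names are illustrative assumptions, not the paper's exact definitions of $\mu$ and $\sigma^2$; the code evaluates the GP posterior mean and latent variance at $x_*$, then compares $\Phi(-\mu/\sigma)$ with $\frac{1}{2}\exp(-\mu^2/(2\sigma^2))$.

```python
import math
from typing import Sequence, Tuple

Point = Sequence[float]


def gaussian_kernel(x: Point, y: Point, length_scale: float = 1.0) -> float:
    """Squared-exponential kernel with k(x, x) = 1 (assumed form and length scale)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * length_scale ** 2))


def two_point_posterior(x_star: Point, x_plus: Point, x_minus: Point,
                        length_scale: float = 1.0, noise: float = 1e-6) -> Tuple[float, float]:
    """Posterior mean and latent variance at x_star for a GP trained on
    (x_plus, +1) and (x_minus, -1) only; `noise` is a small assumed jitter."""
    s = gaussian_kernel(x_plus, x_minus, length_scale)
    k_p = gaussian_kernel(x_star, x_plus, length_scale)
    k_m = gaussian_kernel(x_star, x_minus, length_scale)
    a = 1.0 + noise                     # diagonal entries of K + noise * I
    det = a * a - s * s                 # determinant of the 2x2 Gram matrix
    # k_*^T (K + noise I)^{-1} y with y = (+1, -1) reduces to (k_p - k_m) / (a - s)
    mu = (k_p - k_m) / (a - s)
    # k_*^T (K + noise I)^{-1} k_* written out for the 2x2 case
    quad = (a * (k_p ** 2 + k_m ** 2) - 2.0 * s * k_p * k_m) / det
    var = max(1.0 - quad, 1e-12)        # clip tiny negative round-off
    return mu, var


def std_normal_cdf(z: float) -> float:
    """Phi(z) via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def msp_bounds(mu: float, var: float) -> Tuple[float, float]:
    """Return Phi(-mu/sigma) and (1/2) exp(-mu^2 / (2 sigma^2))."""
    sigma = math.sqrt(var)
    tail = std_normal_cdf(-mu / sigma)
    bound = 0.5 * math.exp(-mu ** 2 / (2.0 * var))
    return tail, bound


if __name__ == "__main__":
    x_plus, x_minus = (0.0, 0.0), (2.0, 0.0)
    for r in (0.25, 0.5, 1.0):  # perturbation norm along the segment toward x_minus
        mu, var = two_point_posterior((r, 0.0), x_plus, x_minus)
        tail, bound = msp_bounds(mu, var)
        print(f"r={r}: Phi(-mu/sigma)={tail:.4f}  (1/2)exp(-mu^2/2sigma^2)={bound:.4f}")
```

For $x_*$ on the $x_+$ side of the midpoint, $\mu>0$ and the Gaussian tail stays below the exponential bound; at the midpoint, $k(x_*,x_+)=k(x_*,x_-)$, so $\mu=0$ and both quantities equal $1/2$.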

Abstract

An adversarial example (AE) is an attack on machine learning models, crafted by adding an imperceptible perturbation to the data to induce misclassification. In this paper, we investigated the upper bound on the probability of a successful AE under Gaussian Process (GP) classification, a probabilistic inference model. We proved a new upper bound on the probability of a successful AE attack that depends on the AE's perturbation norm, the kernel function used in the GP, and the distance between the closest pair of training points with different labels. Surprisingly, the upper bound holds regardless of the distribution of the sample dataset. We confirmed our theoretical result through experiments on ImageNet. In addition, we showed that changing the parameters of the kernel function changes the upper bound on the probability of successful AEs.

Paper Structure (27 sections, 5 theorems, 67 equations, 5 figures, 2 tables)

Key Result

Theorem 1

Consider $\mathcal{R}(x)$ with kernel function $k(x,x')$ trained with $\mathcal{D}$. Let $x_+$ be arbitrarily chosen from $\mathcal{D}_+$, and let $x_-$ be the data point of $\mathcal{D}_-$ that maximizes $k(x_+,x_-)$ for this fixed $x_+$. Then, for … where the notations below are used. The above bound holds for any translation-invariant kernel function …
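For a translation-invariant kernel that strictly decreases with distance, such as the Gaussian kernel used in the experiments, the $x_-$ that maximizes $k(x_+,x_-)$ is simply the point of $\mathcal{D}_-$ nearest to $x_+$. The brute-force sketch below finds it and returns $s=k(x_+,x_-)$; the kernel form, its length scale, and the helper names are assumptions for illustration only.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Sequence[float]


def gaussian_kernel(x: Point, y: Point, length_scale: float = 1.0) -> float:
    """Squared-exponential kernel (assumed form); decreases with ||x - y||."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * length_scale ** 2))


def closest_opposite(x_plus: Point, negatives: Sequence[Point],
                     kernel: Callable[[Point, Point], float] = gaussian_kernel
                     ) -> Tuple[int, float]:
    """Return (j, s) where s = k(x_plus, negatives[j]) is the largest kernel value.

    Raises ValueError if `negatives` is empty.
    """
    best = max(range(len(negatives)), key=lambda j: kernel(x_plus, negatives[j]))
    return best, kernel(x_plus, negatives[best])


if __name__ == "__main__":
    negatives = [(3.0, 0.0), (0.0, 1.5), (-2.0, -2.0)]
    j, s = closest_opposite((0.0, 0.0), negatives)
    print(j, s)
```

The scan costs one kernel evaluation per point of $\mathcal{D}_-$ for each $x_+$, which is the step the TL;DR flags as a target for more efficient nearest-neighbor computation.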

Figures (5)

  • Figure 1: Illustration of the conditions in Theorem 1, assuming the input space is $\mathbb{R}^2$. $x_+$ and $x_-$ are input data points from $\mathcal{D}_+$ and $\mathcal{D}_-$, respectively. Note that $r=k(x_+,x_*)$ and $s=k(x_+,x_-)$.
  • Figure 2: Illustration of the conditions in Theorem 2, assuming the input space is $\mathbb{R}^2$. Blue and orange circles indicate the distributions of the input data points of $\mathcal{D}_+$ and $\mathcal{D}_-$, respectively. $x_+$ and $x_-$ are the closest pair of input data points from $\mathcal{D}_+$ and $\mathcal{D}_-$, respectively. Note that $s$ is the kernel value at this closest pair, that is, $s=k(x_+,x_-)$.
  • Figure 3: Sample adversarial examples with perturbation norm $10$ crafted in the experiment. Each adversarial perturbation is directed toward the nearest point in the other class.
  • Figure 4: Sample results from the experiment. The horizontal axis shows the distance between a point and the closest point with a different label. The vertical axis shows the empirical and theoretical probabilities that the point is classified with a label different from that of the closest point. Note that the theoretical upper bound changes with the kernel parameters $\theta_1$ and $\theta_2$.
  • Figure 5: Histogram of the distance between a point and its nearest point with a different label. Data are pooled across all pairwise conditions, giving $45000 = 500 \times 90$ data points.
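Figures 3 and 5 rest on two simple operations: locating each point's nearest neighbor in the other class, and moving the point a fixed distance toward that neighbor. A minimal sketch follows, assuming Euclidean ($L_2$) distance and an $L_2$ perturbation norm, which this summary does not state explicitly; the function names and toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(x: Point, y: Point) -> float:
    """L2 distance (assumed metric; the summary does not name one)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def nearest_other_class(x: Point, others: Sequence[Point]) -> Tuple[Point, float]:
    """Return the other-class point closest to x, and its distance."""
    best = min(others, key=lambda o: euclidean(x, o))
    return best, euclidean(x, best)


def directed_perturbation(x: Point, target: Point, norm: float) -> List[float]:
    """Move x by `norm` (L2 assumed) along the unit direction toward `target`."""
    d = euclidean(x, target)
    if d == 0.0:
        raise ValueError("x coincides with target; direction is undefined")
    return [a + norm * (b - a) / d for a, b in zip(x, target)]


def nearest_distances(points: Sequence[Point], others: Sequence[Point]) -> List[float]:
    """Distance from each point to its nearest other-class point (histogram input)."""
    return [nearest_other_class(p, others)[1] for p in points]


if __name__ == "__main__":
    class_a = [(0.0, 0.0), (1.0, 1.0)]
    class_b = [(3.0, 4.0), (10.0, 10.0)]
    target, dist = nearest_other_class(class_a[0], class_b)
    adv = directed_perturbation(class_a[0], target, norm=2.5)
    print(target, dist, adv)
    print(nearest_distances(class_a, class_b))
```

The list returned by `nearest_distances` is the kind of input a distance histogram like Figure 5 is built from, and `directed_perturbation` produces perturbed points of the kind shown in Figure 3.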

Theorems & Definitions (9)

  • Theorem 1
  • Lemma
  • Proof
  • Lemma
  • Proof
  • Lemma
  • Proof
  • Theorem 2
  • Proof