Robustness bounds on the successful adversarial examples in probabilistic models: Implications from Gaussian processes
Hiroaki Maeshima, Akira Otsuka
TL;DR
This work establishes a theoretical robustness bound for adversarial examples (AEs) in Gaussian process (GP) classification, showing that the maximum probability of a successful AE is bounded above by a function of the perturbation norm $r$, the kernel $k(\cdot,\cdot)$, and the kernel-based distance to the closest opposite-label point. The authors derive the maximum success probability (MSP) bound $\Pr(\mathcal{C}(x_*)=-1) < \Phi(-\mu/\sigma) < \frac{1}{2}\exp(-\mu^2/(2\sigma^2)) = \phi(r|\mathcal{D})$, where $\mu$ and $\sigma^2$ are defined via kernel interactions, and show that this bound holds under an $\epsilon$-proximity assumption relating the full-data and two-point training configurations. They provide a structured proof outline (predictive variance behavior, monotonicity of Gaussian tails, and cross-label distance effects) and extend the results to all pairs of training data. Empirically, the theory is validated on ImageNet using a Gaussian-kernel GP regressor across a range of kernel parameters: the theoretical MSP bound tracks the observed AE success probabilities, and the choice of kernel parameters meaningfully affects robustness. The results offer a kernel-centered framework for improving robustness, with implications for activation-function design in neural networks via GP analogies, and for future multi-class extensions and efficient nearest-neighbor computations.
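The second inequality in the MSP bound is the standard Gaussian tail estimate $\Phi(-t) \le \frac{1}{2}\exp(-t^2/2)$ for $t \ge 0$, applied with $t = \mu/\sigma$. The following is a minimal numerical sketch of that step only, assuming $\mu \ge 0$ and $\sigma > 0$ are already given; the function name `msp_tail_and_bound` is illustrative and does not come from the paper.

```python
import math


def msp_tail_and_bound(mu, sigma):
    """Return (Phi(-mu/sigma), 0.5 * exp(-mu^2 / (2 sigma^2))).

    Assumes mu >= 0 and sigma > 0; the tail estimate is only valid for
    a non-negative ratio mu / sigma.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if mu < 0:
        raise ValueError("the tail estimate assumes mu >= 0")
    t = mu / sigma
    # Phi(-t) written with erfc, which stays accurate far out in the tail.
    tail = 0.5 * math.erfc(t / math.sqrt(2.0))
    bound = 0.5 * math.exp(-0.5 * t * t)
    return tail, bound


if __name__ == "__main__":
    for mu, sigma in [(0.5, 1.0), (1.0, 0.5), (3.0, 1.0)]:
        tail, bound = msp_tail_and_bound(mu, sigma)
        print(f"mu={mu}, sigma={sigma}: tail={tail:.3e}, bound={bound:.3e}")
```

Both quantities equal $1/2$ at $\mu = 0$ and shrink as $\mu/\sigma$ grows, so a larger predictive mean or a smaller predictive variance tightens the bound.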
Abstract
An adversarial example (AE) is an attack on machine learning models, crafted by adding an imperceptible perturbation to the input data so as to induce misclassification. In this paper, we investigate an upper bound on the probability of a successful AE based on Gaussian process (GP) classification, a probabilistic inference model. We prove a new upper bound on the probability of a successful AE attack that depends on the AE's perturbation norm, the kernel function used in the GP, and the distance between the closest pair of training points with different labels. Surprisingly, the upper bound is determined regardless of the distribution of the sample dataset. We confirm our theoretical result through experiments on ImageNet. In addition, we show that changing the parameters of the kernel function changes the upper bound on the probability of a successful AE.
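To make the dependence on the perturbation norm, the kernel, and the cross-label distance concrete, here is a toy sketch. It is not the paper's construction: it assumes a standard GP regression posterior on one-dimensional inputs, a Gaussian kernel with a hypothetical `lengthscale` parameter, a small `noise` jitter, and exactly two training points, $x_+ = 0$ with label $+1$ and $x_- = d$ with label $-1$. The test point is $x_* = r$, that is, $x_+$ moved a distance $r$ toward $x_-$.

```python
import math


def gaussian_kernel(a, b, lengthscale):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * lengthscale ** 2))


def two_point_bound(r, d, lengthscale, noise=1e-6):
    """Posterior mean, variance and 0.5*exp(-mu^2/(2 var)) at x_* = r.

    Toy setting (an assumption, not the paper's definition): training
    points 0 (label +1) and d (label -1), standard GP regression.
    The bound is only meaningful while mu >= 0, i.e. for r <= d / 2.
    """
    x_pos, x_neg, x_star = 0.0, d, r
    a = gaussian_kernel(x_pos, x_pos, lengthscale) + noise  # diagonal of K
    c = gaussian_kernel(x_pos, x_neg, lengthscale)          # off-diagonal
    det = a * a - c * c
    k1 = gaussian_kernel(x_star, x_pos, lengthscale)
    k2 = gaussian_kernel(x_star, x_neg, lengthscale)
    # alpha = K^{-1} y with y = (+1, -1), using the explicit 2x2 inverse.
    alpha1 = (a + c) / det
    alpha2 = (-c - a) / det
    mu = k1 * alpha1 + k2 * alpha2
    # v = K^{-1} k_*, then var = k(x_*, x_*) - k_*^T v.
    v1 = (a * k1 - c * k2) / det
    v2 = (-c * k1 + a * k2) / det
    var = gaussian_kernel(x_star, x_star, lengthscale) - (k1 * v1 + k2 * v2)
    var = max(var, 1e-12)  # guard against round-off near training points
    bound = 0.5 * math.exp(-(mu * mu) / (2.0 * var))
    return mu, var, bound


if __name__ == "__main__":
    for lengthscale in (0.5, 1.0, 2.0):
        for r in (0.2, 0.5, 0.8):
            mu, var, bound = two_point_bound(r, d=2.0, lengthscale=lengthscale)
            print(f"l={lengthscale}, r={r}: mu={mu:.4f}, var={var:.4f}, bound={bound:.3e}")
```

In this toy setting the bound grows with $r$ and reaches $1/2$ at the midpoint $r = d/2$, where the posterior mean vanishes; varying `lengthscale` changes the bound at a fixed $r$, which mirrors the abstract's remark about kernel parameters.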
