How to beat a Bayesian adversary

Zihan Ding; Kexin Jin; Jonas Latz; Chenguang Liu

TL;DR

The paper introduces Bayesian adversarial robustness as a relaxation of the standard minmax adversarial training objective and proposes Abram, a continuous-time particle sampler that couples gradient-flow learning with a Langevin-based Bayesian attack inside an $\varepsilon$-ball. It proves that Abram approximates a McKean–Vlasov SDE with reflection and establishes mean-field convergence toward the minimiser of $F(\theta)=\int \Phi(\xi,\theta)\pi^{\gamma,\varepsilon}(d\xi|\theta)$, with exponential ergodicity under suitable conditions. The authors implement two discretisations (projected Euler–Maruyama and a mini-batching variant) and evaluate Abram on MNIST and CIFAR-10. The experiments show that Abram can counter certain Bayesian and Wasserstein attacks and benefits from batching, though it does not consistently outperform traditional FGSM-based adversarial training across all attacks. The work provides theoretical justification and practical algorithms for Bayesian adversarial methods in deep learning, positioning Abram as a principled defence and analysis tool against Bayesian adversaries.

Abstract

Deep neural networks and other modern machine learning models are often susceptible to adversarial attacks. Indeed, an adversary may often be able to change a model's prediction through a small, directed perturbation of the model's input - an issue in safety-critical applications. Adversarially robust machine learning is usually based on a minmax optimisation problem that minimises the machine learning loss under maximisation-based adversarial attacks. In this work, we study adversaries that determine their attack using a Bayesian statistical approach rather than maximisation. The resulting Bayesian adversarial robustness problem is a relaxation of the usual minmax problem. To solve this problem, we propose Abram - a continuous-time particle system that shall approximate the gradient flow corresponding to the underlying learning problem. We show that Abram approximates a McKean-Vlasov process and justify the use of Abram by giving assumptions under which the McKean-Vlasov process finds the minimiser of the Bayesian adversarial robustness problem. We discuss two ways to discretise Abram and show its suitability in benchmark adversarial deep learning experiments.

Paper Structure (17 sections, 8 theorems, 42 equations, 2 figures, 2 tables, 4 algorithms)

Introduction
Adversarial robustness and its Bayesian relaxation
Relaxation.
Bayesian.
Adversarial Bayesian Particle Sampler
Mean-field limit
Assumptions
Propagation of chaos
Longtime behaviour of the McKean-Vlasov process
Algorithmic considerations
Discrete Abram.
Discrete Abram with mini-batching.
Bayesian attacks.
Deep learning experiments
MNIST
...and 2 more sections

Key Result

Theorem 4.1

Let the assumption labelled `ass: lip` in the paper hold. Then there is a constant $C_{d,T}>0$ such that, for all $T\ge 0$ and $N\ge 1$, we have an inequality (not reproduced in this summary) in which $\alpha_d = 2/d$ for $d>4$ and $\alpha_d = 1/2$ for $d\le 4$.
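The case distinction for $\alpha_d$ can be written as a small helper. This is only an illustrative sketch: the function name `rate_exponent` is hypothetical, and how the exponent enters the bound is not stated in this summary, so the code computes nothing beyond the two cases quoted in the theorem.

```python
def rate_exponent(d: int) -> float:
    """Return alpha_d as quoted in Theorem 4.1: 2/d for d > 4, 1/2 for d <= 4."""
    if d < 1:
        raise ValueError("the dimension d must be a positive integer")
    return 2.0 / d if d > 4 else 0.5


# Tabulate the exponent for a few (arbitrarily chosen) dimensions.
for d in (1, 4, 5, 8, 100):
    print(d, rate_exponent(d))
```

Since $2/d = 1/2$ at $d = 4$, the two cases agree at the boundary, and the exponent decreases as $d$ grows beyond 4.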

Figures (2)

Figure 2.1: Plots of the Lebesgue density of $\pi_1^{\gamma, \varepsilon}(\cdot|\theta_0)$ for energy $\Phi(y_1 + \xi, z_1|\theta_0) = (\xi-0.1)^2/2$, choosing parameters $\varepsilon \in \{0.025, 0.1, 0.4\}$ and $\gamma \in \{0.1, 10, 1000\}$.
Figure 3.1: Examples of the Abram method given $\Phi(\xi, \theta) = \frac{1}{2}(\xi + \theta)^2$, $\varepsilon = 1$, and different combinations of $(\gamma, N) = (10,3)$ (top left), $(0.1,3)$ (top right), $(10,50)$ (bottom left), $(0.1, 50)$ (bottom right). In each of the four quadrants, we show the simulated path $(\theta_t^N)_{t \geq 0}$ (top), the particle paths $(\xi_t^{1,N},\ldots,\xi_t^{N,N})_{t \geq 0}$ (centre), and the path of probability distributions $(\pi^{\gamma, \varepsilon}(\cdot|\theta_t^N))_{t \geq 0}$ (bottom) that shall be approximated by the particles. The larger $\gamma$ leads to a concentration of $\pi^{\gamma, \varepsilon}$ at the boundary, whilst it is closer to uniform if $\gamma$ is small. More particles lead to a more stable path $(\theta^N_t)_{t \geq 0}$. A combination of large $N$ and $\gamma$ leads to convergence to the minimiser $\theta_* = 0$ of $F$.
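The toy setting of Figure 3.1 can be explored with a short simulation. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that $\pi^{\gamma,\varepsilon}(\cdot|\theta)$ has a density proportional to $\exp(\gamma\Phi(\xi,\theta))$ on $[-\varepsilon,\varepsilon]$, which is consistent with the caption but not stated in this summary. Under that assumption, $\nabla F(\theta) = \mathbb{E}_\pi[\nabla_\theta\Phi] + \gamma\,\mathrm{Cov}_\pi(\Phi,\nabla_\theta\Phi)$, and the sketch replaces both terms with particle averages. The particles follow a Langevin step with drift $\gamma\nabla_\xi\Phi$, discretised by Euler–Maruyama and projected back onto $[-\varepsilon,\varepsilon]$ by clipping. The function names, step size, initial values and number of steps are all hypothetical choices.

```python
import numpy as np


def phi(xi, theta):
    """Toy energy from the Figure 3.1 caption: Phi(xi, theta) = (xi + theta)^2 / 2."""
    return 0.5 * (xi + theta) ** 2


def grad_phi(xi, theta):
    """For this energy the xi- and theta-gradients coincide: xi + theta."""
    return xi + theta


def simulate_abram_toy(gamma=10.0, n_particles=50, eps=1.0, theta0=0.5,
                       step=1e-3, n_steps=5000, seed=0):
    """Abram-style sketch (assumption-laden, see lead-in): returns theta path and final particles."""
    rng = np.random.default_rng(seed)
    xi = rng.uniform(-eps, eps, size=n_particles)  # initial attack particles
    theta = float(theta0)
    thetas = np.empty(n_steps + 1)
    thetas[0] = theta
    for k in range(n_steps):
        energy = phi(xi, theta)
        g = grad_phi(xi, theta)
        # Particle estimate of grad F = E[grad_theta Phi] + gamma * Cov(Phi, grad_theta Phi)
        # (valid under the assumed density exp(gamma * Phi) on [-eps, eps]).
        cov = np.mean((energy - energy.mean()) * (g - g.mean()))
        theta_next = theta - step * (g.mean() + gamma * cov)
        # Langevin step for the attack particles, using the current theta,
        # followed by projection onto [-eps, eps].
        noise = rng.standard_normal(n_particles)
        xi = np.clip(xi + step * gamma * g + np.sqrt(2.0 * step) * noise, -eps, eps)
        theta = theta_next
        thetas[k + 1] = theta
    return thetas, xi


# Run the four (gamma, N) combinations named in the caption.
for gamma, n in [(10.0, 3), (0.1, 3), (10.0, 50), (0.1, 50)]:
    path, particles = simulate_abram_toy(gamma=gamma, n_particles=n)
    print(f"gamma={gamma}, N={n}: final theta = {path[-1]:.3f}")
```

The sketch makes no claim to reproduce the figure. It only provides a way to inspect how the choice of $\gamma$ and $N$ affects the simulated path of $\theta$ and the location of the particles within $[-\varepsilon,\varepsilon]$.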

Theorems & Definitions (10)

Example 3.3
Theorem 4.1
Proposition 4.2
Lemma 4.3
Lemma 4.4
Proposition 5.1
Example 5.4: Example 3.3 continued
Theorem 5.5
Proposition 5.6
Lemma 5.7
