Adversarial Quantum Machine Learning: An Information-Theoretic Generalization Analysis
Petros Georgiou, Sharu Theresa Jose, Osvaldo Simeone
TL;DR
This work develops information-theoretic generalization bounds for adversarially trained quantum classifiers under $p$-Schatten perturbations with budget $\epsilon$. Building on a prior non-adversarial bound based on the 2-Rényi mutual information $I_2(X:Q)$, the authors derive two separate bounds for $p=1$ and $p=\infty$, each consisting of the original bound plus a term linear in $\epsilon$ that scales with the Hilbert-space dimension, both decaying as $1/\sqrt{T}$. They further analyze training-test mismatch, showing that stronger adversarial training can mitigate mismatches, quantified by an additive term $\xi=d^{(1-1/p')}\epsilon'+d^{(1-1/p)}\epsilon$. The paper also validates the theory with synthetic experiments and explores a noise-aware adversarial training setting, including the beneficial effect of depolarizing noise on generalization. The results offer principled guidance for designing robust quantum classifiers under adversarial perturbations in practical, noisy quantum environments.
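As a rough illustration of how the mismatch term $\xi$ behaves, the sketch below evaluates $\xi=d^{(1-1/p')}\epsilon'+d^{(1-1/p)}\epsilon$ for given adversary parameters. The function name `mismatch_term` and the example values are hypothetical and chosen only for illustration. The summary does not state which adversary carries the primed parameters, so the code treats them simply as two parameter pairs.

```python
import math


def mismatch_term(d, p, eps, p_prime, eps_prime):
    """Evaluate xi = d^(1 - 1/p') * eps' + d^(1 - 1/p) * eps.

    d is the Hilbert-space dimension; (p, eps) and (p_prime, eps_prime)
    are the Schatten order and budget of the two adversaries.
    Use math.inf for the p = infinity case (1/inf evaluates to 0.0).
    """
    if d < 1:
        raise ValueError("dimension d must be at least 1")
    if p < 1 or p_prime < 1:
        raise ValueError("Schatten orders must satisfy p >= 1")
    if eps < 0 or eps_prime < 0:
        raise ValueError("perturbation budgets must be non-negative")
    primed = d ** (1.0 - 1.0 / p_prime) * eps_prime
    unprimed = d ** (1.0 - 1.0 / p) * eps
    return primed + unprimed


# Hypothetical example: two qubits (d = 4), p = 1 with eps = 0.1,
# and p' = infinity with eps' = 0.05.
xi_example = mismatch_term(4, 1, 0.1, math.inf, 0.05)
```

For $p=1$ the exponent $1-1/p$ vanishes, so that contribution is independent of $d$, whereas for $p=\infty$ the contribution grows linearly in $d$.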
Abstract
In a manner analogous to their classical counterparts, quantum classifiers are vulnerable to adversarial attacks that perturb their inputs. A promising countermeasure is to train the quantum classifier by adopting an attack-aware, or adversarial, loss function. This paper studies the generalization properties of quantum classifiers that are adversarially trained against bounded-norm white-box attacks. Specifically, a quantum adversary maximizes the classifier's loss by transforming an input state $\rho(x)$ into a state $\lambda$ that is $\epsilon$-close to the original state $\rho(x)$ in $p$-Schatten distance. Under suitable assumptions on the quantum embedding $\rho(x)$, we derive novel information-theoretic upper bounds on the generalization error of adversarially trained quantum classifiers for $p = 1$ and $p = \infty$. The derived upper bounds consist of two terms: the first is an exponential function of the 2-Rényi mutual information between classical data and quantum embedding, while the second term scales linearly with the adversarial perturbation size $\epsilon$. Both terms are shown to decrease as $1/\sqrt{T}$ with the training set size $T$. An extension is also considered in which the adversary assumed during training has different parameters $p$ and $\epsilon$ as compared to the adversary affecting the test inputs. Finally, we validate our theoretical findings with numerical experiments for a synthetic setting.
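The adversary's constraint can be made concrete with a small numerical sketch: given two density matrices, compute the Schatten $p$-norm of their difference and check it against the budget. The helper names `schatten_distance` and `is_admissible` are hypothetical, and the code assumes the plain Schatten norm of the difference with no factor of $1/2$. The abstract does not fix a normalization, so this may differ from the paper's convention.

```python
import numpy as np


def schatten_distance(rho, lam, p):
    """Schatten p-norm of (rho - lam) for Hermitian inputs.

    Assumption: no 1/2 prefactor is applied, so p = 1 gives the trace
    norm of the difference rather than the trace distance.
    """
    diff = np.asarray(rho, dtype=complex) - np.asarray(lam, dtype=complex)
    # The difference of two Hermitian matrices is Hermitian, so its
    # singular values are the absolute values of its eigenvalues.
    singular_values = np.abs(np.linalg.eigvalsh(diff))
    if np.isinf(p):
        return float(singular_values.max())
    return float((singular_values ** p).sum() ** (1.0 / p))


def is_admissible(rho, lam, p, eps):
    """True if lam is eps-close to rho in p-Schatten distance."""
    return schatten_distance(rho, lam, p) <= eps


# Hypothetical single-qubit example: rho = |0><0| and a perturbed state
# obtained by mixing rho with the maximally mixed state (weight 0.2).
rho = np.array([[1.0, 0.0], [0.0, 0.0]])
lam = 0.8 * rho + 0.2 * np.eye(2) / 2
dist_1 = schatten_distance(rho, lam, 1)
dist_inf = schatten_distance(rho, lam, np.inf)
```

In this example the difference `rho - lam` is diagonal with entries $\pm 0.1$, so the $p=1$ distance is twice the $p=\infty$ distance, illustrating that the same budget $\epsilon$ is more permissive under $p=\infty$ than under $p=1$.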
