Adversarial Robustness Guarantees for Quantum Classifiers

Neil Dowling; Maxwell T. West; Angus Southwell; Azar C. Nakhl; Martin Sevior; Muhammad Usman; Kavan Modi

TL;DR

This work provides provable robustness guarantees for quantum classifiers against adversarial tampering by connecting quantum dynamics to adversarial robustness. It analyzes three attack regimes (weak targeted, strong local, and universal) and shows how data encoding and dynamical complexity govern robustness, where complexity is quantified by scrambling, as measured by out-of-time-ordered correlators (OTOCs), and by chaos, as measured by local operator entanglement (LOE). The authors derive theorems linking distances between encoded states, and the resulting changes in model output, to the encoding type and circuit properties, and they support these with numerical simulations based on matrix product state methods. The findings suggest a concrete pathway to leverage quantum dynamics for adversarial robustness in QML, complementary to improvements in speed or accuracy, and highlight how encoding choices interact with circuit chaos to strengthen security. The results also extend to noisy settings described by completely positive trace-preserving (CPTP) maps, indicating resilience to practical imperfections, and the authors outline future directions on trainability and active defense strategies.

Abstract

Despite their ever more widespread deployment throughout society, machine learning algorithms remain critically vulnerable to being spoofed by subtle adversarial tampering with their input data. The prospect of near-term quantum computers being capable of running quantum machine learning (QML) algorithms has therefore generated intense interest in their adversarial vulnerability. Here we show that quantum properties of QML algorithms can confer fundamental protections against such attacks, in certain scenarios guaranteeing robustness against classically-armed adversaries. We leverage tools from many-body physics to identify the quantum sources of this protection. Our results offer a theoretical underpinning for recent evidence suggesting quantum advantages in the search for adversarial robustness. In particular, we prove that quantum classifiers are (i) protected against weak perturbations of data drawn from the trained distribution and (ii) protected against local attacks if they are insufficiently scrambling; we also (iii) show evidence that they are protected against universal adversarial attacks if they are sufficiently chaotic. Our analytic results are supported by numerical evidence demonstrating the applicability of our theorems and the resulting robustness of a quantum classifier in practice. This line of inquiry constitutes a concrete pathway to advantage in QML, orthogonal to the usually sought improvements in model speed or accuracy.

Paper Structure

This paper contains 12 sections, 10 theorems, 83 equations, 2 figures, and 1 table.

Introduction
Results
Weak Attacks
Local Attacks
Universal Attacks
Numerical Results
Discussion
Methods
Weak Targeted Attack
(Strong) Local Attacks
(Strong) Universal Attack
Matrix Product State Simulations

Key Result

Theorem 1

Given an input state $\ket{\psi(\boldsymbol{x})}$, a quantum model as defined in Eq. eq:prediction classifies identically all states within a 1-norm ball of radius $|y_{\theta}(\boldsymbol{x})|$ centred on $\ket{\psi(\boldsymbol{x})}$.
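The guarantee can be checked numerically in a small toy case. The sketch below assumes (Eq. eq:prediction is not reproduced here) that the model output is the expectation value $y_{\theta}(\boldsymbol{x}) = \bra{\psi(\boldsymbol{x})} U_{\theta}^{\dagger} Z U_{\theta} \ket{\psi(\boldsymbol{x})}$ of a single-qubit Pauli $Z$, with the predicted class given by its sign. Under that assumption, Hölder's inequality gives $|y(\rho) - y(\sigma)| \le \|Z\|_{\infty} \|\rho - \sigma\|_1 = \|\rho - \sigma\|_1$, so any state $\sigma$ with $\|\rho - \sigma\|_1 < |y(\rho)|$ keeps the sign of the output. The code uses a random unitary as a stand-in for a trained $U_{\theta}$, perturbs random input states, and asserts both the Hölder bound and the unchanged label inside the ball. All function names are hypothetical and chosen for illustration only.

```python
import math
import random


def normalize(v):
    n = math.sqrt(sum(abs(a) ** 2 for a in v))
    return [a / n for a in v]


def inner(u, v):
    # <u|v>, conjugating the first argument
    return sum(a.conjugate() * b for a, b in zip(u, v))


def random_state(dim, rng):
    # Normalised complex Gaussian vector (a Haar-random pure state)
    return normalize([complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)])


def random_unitary(dim, rng):
    # Gram-Schmidt on random vectors; a stand-in for a trained U_theta (assumption)
    cols = []
    for _ in range(dim):
        v = random_state(dim, rng)
        for c in cols:
            overlap = inner(c, v)
            v = [a - overlap * b for a, b in zip(v, c)]
        cols.append(normalize(v))
    return [[cols[j][i] for j in range(dim)] for i in range(dim)]


def prediction(U, z_diag, psi):
    # Assumed model output y = <psi| U^dagger Z U |psi>, with Z diagonal (+1/-1)
    out = [sum(U[i][j] * psi[j] for j in range(len(psi))) for i in range(len(U))]
    return sum(z * abs(a) ** 2 for z, a in zip(z_diag, out))


def one_norm_distance(psi, phi):
    # || |psi><psi| - |phi><phi| ||_1 = 2 sqrt(1 - |<psi|phi>|^2) for pure states
    overlap_sq = abs(inner(psi, phi)) ** 2
    return 2 * math.sqrt(max(0.0, 1.0 - overlap_sq))


def check_theorem1(n_qubits=2, trials=2000, seed=0):
    rng = random.Random(seed)
    dim = 2 ** n_qubits
    # Z on the first (most significant) qubit: +1 on the first half of the basis
    z_diag = [1 if i < dim // 2 else -1 for i in range(dim)]
    U = random_unitary(dim, rng)
    inside_ball = 0
    for _ in range(trials):
        psi = random_state(dim, rng)
        y = prediction(U, z_diag, psi)
        # Perturb the input by a random amount in a random direction
        t = rng.uniform(0.01, 1.0)
        eta = random_state(dim, rng)
        phi = normalize([a + t * b for a, b in zip(psi, eta)])
        d = one_norm_distance(psi, phi)
        y_new = prediction(U, z_diag, phi)
        # Hoelder bound: |y - y_new| <= ||Z||_inf * ||rho - sigma||_1 = d
        assert abs(y - y_new) <= d + 1e-9
        if d < abs(y):
            inside_ball += 1
            # Theorem 1: strictly inside the ball of radius |y| the label cannot change
            assert (y > 0) == (y_new > 0)
    return inside_ball
```

Calling `check_theorem1()` runs without an assertion error and returns the number of perturbed states that landed inside the ball; perturbations outside the ball are still checked against the Hölder bound but may change the label.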

Figures (2)

Figure 1: Schematic of the adversarial machine learning setting. (a) Machine learning models are generally highly susceptible to extremely subtle adversarial tampering with their input data, but quantum models have been found empirically to be robust to attacks by classical adversaries west2023benchmarking. In the general quantum machine learning setting, a classical data string $\boldsymbol{x}$ is encoded in a state $\ket{\psi(\boldsymbol{x})}$, and a (trained) quantum algorithm $U_{\theta}$ is applied before measurement of some few-qubit operator $Z$. An adversarial attack can then be modeled as a change to the input data string, $\boldsymbol{x} \to \boldsymbol{x}' = \boldsymbol{x} + \epsilon \boldsymbol{w}$, which is equivalent to the action of a unitary $W$ on the encoded state, $\ket{\boldsymbol{x}'} = W\ket{\boldsymbol{x}}$ (writing $\ket{\boldsymbol{x}} \equiv \ket{\psi(\boldsymbol{x})}$). (b) Chaotic unitaries scramble information throughout the quantum degrees of freedom of a many-body system. (c) It is difficult for an adversary to manipulate a chaotic circuit in the precise way needed to induce misclassification. Here, $P'$ is some (spoofed) Pauli string that flips the measurement outcome, and $\mathcal{U}_{P'}$ is some unitary on the unmeasured subsystem.
Figure 2: Numerical results for a common architecture. (a) Local operator entanglement (LOE) growth in standard quantum machine learning (QML) models consisting of hardware-efficient layers of single-qubit rotations and nearest-neighbour CNOTs. The initial linear growth of the LOE indicates that these models implement chaotic quantum dynamics dowling_scrambling_2023. (b) The fraction of states successfully spoofed by an approximation to a universal adversarial attack. The attack is carried out by random unitaries at various 2-norm distances from the ideal strong attack $W_\mathrm{univ}$ (which satisfies Eq. eq:anticomm). For each distance, we generate ten circuits, each with five attacks constructed by randomly rotating away from the ideal attack (see Eqs. eq:approx_dist and eq:rot). The mean success fraction is plotted, with the region within one standard deviation shaded. (c) Here, the attack is carried out by optimised local unitary operators on each qubit, for random models of increasing circuit depth. For each number of layers, we generate $20$ circuits, train the adversary on $32,000$ training datapoints, and evaluate it on $10,000$ test datapoints. We plot the mean attack success fraction for up to $34$ layers, by which point both the LOE in the circuit and the attack success fraction have plateaued. (d) Similar to (c), but employing a model trained to classify images of handwritten digits mnist. Although the adversary performs better than in the random case, we nonetheless observe increasing robustness with both circuit depth and qubit count. "Universal Attack Success Fraction" is the common vertical axis for panels (b)-(d).
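The ideal strong universal attack behind Figures 1(c) and 2(b) can be illustrated in a small toy setting. As an assumption (Eq. eq:anticomm is not reproduced here), take the spoofed Pauli string $P'$ of Figure 1(c) to be a single $X$ on the measured qubit, take $\mathcal{U}_{P'}$ to be the identity, and set $W_\mathrm{univ} = U_{\theta}^{\dagger} X U_{\theta}$. Because $X$ anticommutes with $Z$, $W_\mathrm{univ}^{\dagger} U_{\theta}^{\dagger} Z U_{\theta} W_\mathrm{univ} = -U_{\theta}^{\dagger} Z U_{\theta}$, so the assumed output $y = \bra{\psi} U_{\theta}^{\dagger} Z U_{\theta} \ket{\psi}$ flips sign for every input. The hypothetical helpers below use a random unitary in place of a trained model and report the fraction of inputs spoofed (ideally 1) together with the largest deviation from $y \to -y$. They do not reproduce the approximate attacks of Figure 2(b) (Eqs. eq:approx_dist and eq:rot). Note that the attack is built from $U_{\theta}$ itself; Figure 1(c) argues that such precise manipulation is difficult when the circuit is chaotic.

```python
import math
import random


def normalize(v):
    n = math.sqrt(sum(abs(a) ** 2 for a in v))
    return [a / n for a in v]


def inner(u, v):
    # <u|v>, conjugating the first argument
    return sum(a.conjugate() * b for a, b in zip(u, v))


def random_state(dim, rng):
    return normalize([complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)])


def random_unitary(dim, rng):
    # Gram-Schmidt on random vectors; a stand-in for a trained U_theta (assumption)
    cols = []
    for _ in range(dim):
        v = random_state(dim, rng)
        for c in cols:
            overlap = inner(c, v)
            v = [a - overlap * b for a, b in zip(v, c)]
        cols.append(normalize(v))
    return [[cols[j][i] for j in range(dim)] for i in range(dim)]


def apply(U, v):
    return [sum(U[i][j] * v[j] for j in range(len(v))) for i in range(len(U))]


def apply_dagger(U, v):
    # U^dagger v: the conjugate transpose of U applied to v
    return [sum(U[j][i].conjugate() * v[j] for j in range(len(v))) for i in range(len(U))]


def flip_measured_qubit(v):
    # Pauli X on the measured (most significant) qubit: swaps basis states i and i ^ dim/2
    half = len(v) // 2
    return [v[i ^ half] for i in range(len(v))]


def prediction(U, z_diag, psi):
    # Assumed model output y = <psi| U^dagger Z U |psi>
    return sum(z * abs(a) ** 2 for z, a in zip(z_diag, apply(U, psi)))


def universal_attack(U, psi):
    # Hypothetical ideal attack W_univ = U^dagger X U, where X anticommutes with Z
    return apply_dagger(U, flip_measured_qubit(apply(U, psi)))


def universal_success_fraction(n_qubits=2, trials=1000, seed=0):
    rng = random.Random(seed)
    dim = 2 ** n_qubits
    # Z on the measured qubit: +1 on the first half of the basis, -1 on the second
    z_diag = [1 if i < dim // 2 else -1 for i in range(dim)]
    U = random_unitary(dim, rng)
    flipped, worst = 0, 0.0
    for _ in range(trials):
        psi = random_state(dim, rng)
        y = prediction(U, z_diag, psi)
        y_adv = prediction(U, z_diag, universal_attack(U, psi))
        worst = max(worst, abs(y + y_adv))  # the ideal attack maps y to exactly -y
        if (y > 0) != (y_adv > 0):
            flipped += 1
    return flipped / trials, worst
```

In this idealised case every input is spoofed, so `universal_success_fraction()` returns a success fraction of 1 and a deviation at the level of floating-point rounding.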

Theorems & Definitions (16)

Theorem 1
Theorem 2
Theorem 3
Corollary 3
Theorem 3
proof
Theorem 3
proof
Theorem 3
proof
...and 6 more