Classical Autoencoder Distillation of Quantum Adversarial Manipulations

Amena Khatun, Muhammad Usman

TL;DR

The paper addresses the vulnerability of quantum classifiers to adversarial perturbations, including quantum adversarial attacks, by proposing a defense based on distilling adversarial noise through a classical autoencoder. It introduces QVC-CED, a hybrid quantum-classical pipeline where a quantum variational classifier (QVC) is complemented by a classical encoder–decoder autoencoder that purifies adversarial inputs before reclassification. The authors demonstrate robust performance on MNIST and FMNIST under FGSM and PGD, achieving substantial accuracy recovery after reconstruction (e.g., MNIST around 80% and FMNIST around 65% at attack strength 0.3). This work provides a practical pathway toward robust QML in both classical and quantum adversarial contexts and motivates broader integration with quantum architectures and hardware-aware defenses.
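The attack, purify, reclassify flow summarised above can be sketched as a small evaluation harness. This is a minimal sketch, not the authors' code: `classify`, `purify`, and `attack` are hypothetical callables standing in for the QVC, the classical autoencoder, and an FGSM/PGD attack, and `evaluate_defense` is a name introduced here for illustration.

```python
import numpy as np

def accuracy(classify, images, labels):
    """Fraction of images whose predicted label matches the true label."""
    preds = np.array([classify(x) for x in images])
    return float(np.mean(preds == np.asarray(labels)))

def evaluate_defense(classify, purify, attack, images, labels):
    """Compare accuracy on clean, attacked, and purified inputs.

    classify: image -> predicted label (stand-in for the quantum classifier)
    purify:   image -> reconstructed image (stand-in for the autoencoder)
    attack:   (image, label) -> perturbed image (stand-in for FGSM or PGD)
    """
    adversarial = [attack(x, y) for x, y in zip(images, labels)]
    purified = [purify(x) for x in adversarial]
    return {
        "clean": accuracy(classify, images, labels),
        "adversarial": accuracy(classify, adversarial, labels),
        "purified": accuracy(classify, purified, labels),
    }
```

Keeping the three components as plain callables means the same harness can be reused for white-box and black-box settings by swapping which model the attack is generated against.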

Abstract

Quantum neural networks have been proven robust against classical adversarial attacks, but their vulnerability to quantum adversarial attacks remains a challenging problem. Here we report a new technique for the distillation of quantum-manipulated image datasets using classical autoencoders. Our technique recovers quantum classifier accuracies when tested under standard machine learning benchmarks utilising the MNIST and FMNIST image datasets and the PGD and FGSM adversarial attack settings. Our work highlights a promising pathway to achieve fully robust quantum machine learning in both classical and quantum adversarial scenarios.
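For readers unfamiliar with the two attack settings named in the abstract, the following is a minimal NumPy sketch of FGSM (one signed-gradient step of size ε) and PGD (repeated smaller steps, each projected back into the ε-ball around the input). It assumes a toy logistic-regression model as a stand-in for the quantum classifier; the functions `loss_and_grad`, `fgsm`, and `pgd`, and the `step` and `iters` defaults, are illustrative choices and are not taken from the paper.

```python
import numpy as np

def loss_and_grad(x, y, w, b):
    """Binary cross-entropy of a toy logistic model and its gradient w.r.t. x."""
    z = float(np.dot(w, x) + b)
    p = 1.0 / (1.0 + np.exp(-z))
    loss = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad_x = (p - y) * w  # d(loss)/dx for sigmoid + cross-entropy
    return loss, grad_x

def fgsm(x, y, w, b, eps):
    """Single step of size eps along the sign of the input gradient."""
    _, g = loss_and_grad(x, y, w, b)
    return np.clip(x + eps * np.sign(g), 0.0, 1.0)  # keep pixels in [0, 1]

def pgd(x, y, w, b, eps, step=0.05, iters=10):
    """Iterated signed-gradient steps, projected onto the L-infinity eps-ball."""
    x_adv = x.copy()
    for _ in range(iters):
        _, g = loss_and_grad(x_adv, y, w, b)
        x_adv = x_adv + step * np.sign(g)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep pixels in [0, 1]
    return x_adv
```

In both functions the perturbation is bounded by `eps` in the L-infinity norm, which is the sense in which an attack strength such as 0.3 is usually meant for images scaled to [0, 1].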

Paper Structure

This paper contains 4 sections, 11 equations, 5 figures.

Figures (5)

  • Figure 1: Overview of the proposed QVC-CED approach. (a) In the first step of the workflow, clean images are input to a quantum variational circuit to generate adversarial samples using gradient-based perturbations (PGD or FGSM). (b) and (c) The perturbed images are passed to quantum classifiers under white-box and black-box attack scenarios, respectively, to test the vulnerability of the classifiers to quantum adversarial attacks. (d) The perturbed images are first fed into a classical encoder-decoder model, which purifies the noisy images and reconstructs the original images. After reconstruction, classification is performed again in both white-box and black-box settings to test how well the autoencoder counteracts adversarial effects.
  • Figure 2: Examples of the original images, the adversarially perturbed images and the corresponding reconstructed images, presented in the first, second and third rows, respectively. In all cases, the results use an attack strength of 0.3, which generates strong adversarial perturbations to evaluate the performance of the autoencoder under challenging conditions. (a) and (b) illustrate the adversarial images generated using the FGSM attack on the QVC100 model for the MNIST and FMNIST datasets. (c) and (d) depict the adversarial images generated using the PGD attack on the same model. The reconstructed images in the third row, generated by the autoencoder, effectively mitigate the adversarial noise, showcasing its robustness in reconstructing the original representations.
  • Figure 3: Classification accuracy of the quantum classifier under varying attack strengths, ranging from 0 to 0.3. In all cases, the attacks are generated by the QVC100 model. Panels (a) and (c) show the performance of the quantum classifier under PGD attacks for the MNIST and FMNIST datasets, respectively, in the white-box setting. Panels (b) and (d) show the impact of PGD attacks on the quantum classifier for the same datasets in the black-box scenario. We also report the reconstruction accuracy when there is no attack ($\epsilon=0$). Across all cases, the classification accuracy decreases as the attack strength increases, highlighting the susceptibility of the models to quantum adversarial attacks. However, the results demonstrate a significant recovery in classification accuracy when the adversarial examples are passed through the classical autoencoder.
  • Figure S1: Architecture of the QVC and the classical encoder-decoder network. In this figure, (b) represents the QVC, which follows a three-step process: first, classical data is encoded into quantum states using amplitude encoding; next, the encoded data is processed through parameterised quantum layers consisting of single-qubit rotation gates and controlled-Z (CZ) gates; finally, a measurement layer extracts the classification result. The QVC is implemented with a quantum circuit depth of 100 layers. (c) shows the architecture of the classical encoder-decoder network. The encoder compresses the input into a compact, low-dimensional latent representation using convolutional layers and fully connected (FC) layers. The decoder reconstructs the original input from this latent vector through fully connected and transposed convolutional layers, producing an output image of size $28\times28\times1$.
  • Figure S2: Classification accuracy of the quantum classifier under varying attack strengths, ranging from 0 to 0.3. In all cases, the attacks are generated by the QVC100 model. Panels (a) and (c) show the performance of the quantum classifier under FGSM attacks for the MNIST and FMNIST datasets, respectively, in the white-box setting. Panels (b) and (d) show the impact of FGSM attacks on the quantum classifier for the same datasets in the black-box scenario. Across all cases, the classification accuracy decreases as the attack strength increases, highlighting the susceptibility of the models to quantum adversarial attacks. However, the results demonstrate a significant recovery in classification accuracy when the adversarial examples are passed through the classical autoencoder.
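
The amplitude-encoding step mentioned in the Figure S1 caption can be illustrated classically. The sketch below assumes the common convention of flattening the image, zero-padding it to the next power of two, and normalising it to unit L2 norm so that the entries can serve as state amplitudes; the padding convention and the helper name `amplitude_encode` are assumptions made here, not details confirmed by the source.

```python
import numpy as np

def amplitude_encode(image):
    """Map an image to a unit-norm amplitude vector of length 2**n_qubits.

    Assumed convention: flatten, zero-pad to the next power of two,
    then divide by the L2 norm.
    """
    flat = np.asarray(image, dtype=np.float64).ravel()
    n_qubits = (flat.size - 1).bit_length()  # smallest n with 2**n >= flat.size
    padded = np.zeros(2 ** n_qubits)
    padded[: flat.size] = flat
    norm = np.linalg.norm(padded)
    if norm == 0:
        raise ValueError("cannot amplitude-encode an all-zero image")
    return padded / norm, n_qubits
```

Under this convention a $28\times28$ image has 784 pixels, which pad to $2^{10}=1024$ amplitudes, so 10 qubits suffice to hold one image.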