
Evaluating Efficacy of Model Stealing Attacks and Defenses on Quantum Neural Networks

Satwik Kundu, Debarshi Kundu, Swaroop Ghosh

TL;DR

The paper investigates model stealing threats to cloud-hosted quantum neural networks (QNNs) in QMLaaS, demonstrating that attackers can train substitute models that closely mirror victim performance, especially when given full probability vectors (Top-$k$) rather than only the top label (Top-$1$). It introduces two perturbation-based defenses, HVIP and HAVIP, that exploit hardware and architectural variability to obfuscate outputs, but finds that QNNs trained on noisy hardware are notably resilient to these perturbations. Through extensive experiments on multiple 4-qubit QNNs and datasets (MNIST-4, Fashion-4, Kuzushiji-4, Letters-4), the study quantifies clone accuracy, analyzes the impact of attacker dataset size, PQC depth, and width, and reports mixed effectiveness for the defenses. The results imply that while model stealing is a tangible risk for QMLaaS, the inherent noise and variability of NISQ devices can undermine defense effectiveness, underscoring the need for more robust IP protection mechanisms in future quantum cloud services.

Abstract

Cloud hosting of quantum machine learning (QML) models exposes them to a range of vulnerabilities, the most significant of which is the model stealing attack. In this study, we assess the efficacy of such attacks in the realm of quantum computing. We conducted comprehensive experiments on various datasets with multiple QML model architectures. Our findings revealed that model stealing attacks can produce clone models achieving up to $0.9\times$ and $0.99\times$ clone test accuracy when trained using Top-$1$ and Top-$k$ labels, respectively (where $k$ is the number of classes). To defend against these attacks, we leverage the unique properties of current noisy hardware to perturb the victim model outputs and hinder the attacker's training process. In particular, we propose: 1) hardware variation-induced perturbation (HVIP) and 2) hardware and architecture variation-induced perturbation (HAVIP). Although noise and architectural variability can provide up to $\sim16\%$ output obfuscation, our comprehensive analysis revealed that models cloned under noisy conditions tend to be resilient, suffering little to no performance degradation due to such obfuscations. Despite limited success with our defense techniques, this outcome has led to an important discovery: QML models trained on noisy hardware are naturally resistant to perturbation- or obfuscation-based defenses or attacks.
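The difference between Top-$1$ and Top-$k$ labels can be illustrated with a small sketch. This is not the authors' code: the victim here is a hypothetical classical stand-in (`victim_predict`, a linear scorer followed by a softmax) rather than a QNN, and all names, sizes, and weights are assumptions chosen for illustration. It only shows how an attacker dataset would be assembled from query responses in each labeling mode.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def victim_predict(x, weights):
    """Hypothetical stand-in for the cloud-hosted victim model.

    Assumption: a linear scorer plus softmax replaces the real QNN so the
    sketch runs without any quantum software stack.
    """
    logits = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]
    return softmax(logits)


def top1_label(probs):
    """Keep only the arg-max class as a one-hot vector (Top-1 label)."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(probs))]


def build_attacker_dataset(queries, weights, mode="topk"):
    """Query the victim and store (input, label) pairs.

    mode="topk": the full probability vector is kept as a soft label.
    mode="top1": only the predicted class is kept.
    """
    dataset = []
    for x in queries:
        probs = victim_predict(x, weights)
        label = probs if mode == "topk" else top1_label(probs)
        dataset.append((x, label))
    return dataset


rng = random.Random(0)
NUM_CLASSES, DIM = 4, 4  # illustrative sizes only
victim_weights = [[rng.uniform(-1, 1) for _ in range(DIM)]
                  for _ in range(NUM_CLASSES)]
queries = [[rng.uniform(0, 1) for _ in range(DIM)] for _ in range(8)]

d_topk = build_attacker_dataset(queries, victim_weights, mode="topk")
d_top1 = build_attacker_dataset(queries, victim_weights, mode="top1")
```

Each Top-$k$ label carries the victim's relative confidence across all classes, whereas a Top-$1$ label carries only the winning class, which is consistent with the abstract's observation that clones trained on Top-$k$ labels track the victim more closely.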

Paper Structure

This paper contains 15 sections, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: Architecture of a 4-qubit hybrid QNN. Classical features are encoded as angles of quantum rotation gates ($R_Z$). The PQC transforms the encoded states to explore the search space and entangle features. The measured expectation values are then fed into a classical linear layer for the final prediction.
  • Figure 2: The adversary sends a query $X_i = (x_{i1}, x_{i2}, ..., x_{id})$, where $d$ is the dimension of the input vector, to the cloud-based victim QNN, denoted $f_v$. The QNN returns a vector of class probabilities, i.e., $f_v(X_i) = y_i = (p_{i1}, p_{i2}, ..., p_{ik})$, where $k$ is the number of classes in the dataset on which the original cloud-based model was trained. The adversary repeats this process multiple times to build the attacker dataset $D_A$, then trains a substitute model $f_c$ to clone the functionality of $f_v$.
  • Figure 3: Proposed defense techniques. a) HVIP: The victim randomly alternates (with 50% probability) between quantum devices when executing the QNN, perturbing the output. b) HAVIP: The victim has trained multiple QNNs on multiple devices, unknown to users. At query time, the victim randomly routes each query to one of these QNNs for inference, which should further obfuscate the output values.
  • Figure 4: Comparison of test accuracy between the victim and the clone model trained with Top-$1$ and Top-$k$ labels across various datasets. The clone model, when trained using the Top-$k$ vector, closely mirrors the performance of the victim due to the richer information per training sample.
  • Figure 5: Plots comparing test accuracies for clone models: a) trained on attacker datasets of different sizes ($|D_A|$), b) with PQCs of different sizes, i.e., different circuit depth and gate count but the same qubit count, c) with QNNs of different widths, i.e., clone models with different qubit counts, and d) trained on a mixed dataset (i.e., merged NPD datasets) versus a random dataset.
  • ...and 1 more figures
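The random routing described in the Figure 3 caption can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `Backend` is a hypothetical stand-in for one (QNN, device) pair, and its fixed per-class `bias` is an invented placeholder for device-dependent output variation, not a model of real hardware noise. With two backends sharing one model the routing resembles HVIP; with backends that also differ in architecture it resembles HAVIP.

```python
import random


class Backend:
    """Hypothetical (QNN, device) pair with a toy output shift."""

    def __init__(self, name, bias):
        self.name = name
        self.bias = bias  # assumption: fixed per-class shift, for illustration

    def predict(self, ideal_probs):
        """Return the ideal probabilities shifted by the toy bias, renormalized."""
        shifted = [max(p + b, 1e-9) for p, b in zip(ideal_probs, self.bias)]
        total = sum(shifted)
        return [s / total for s in shifted]


def perturbed_query(ideal_probs, backends, rng):
    """Route one query to a uniformly random backend.

    With two backends, each is chosen with 50% probability, matching the
    alternation described for HVIP in the Figure 3 caption.
    """
    backend = rng.choice(backends)
    return backend.name, backend.predict(ideal_probs)


rng = random.Random(42)
backends = [
    Backend("device_a", [0.03, -0.01, -0.01, -0.01]),
    Backend("device_b", [-0.02, 0.02, 0.01, -0.01]),
]
ideal = [0.7, 0.1, 0.1, 0.1]  # illustrative noiseless output for one input

# The same input queried repeatedly yields one of two slightly different
# probability vectors, depending on which backend served the query.
responses = [perturbed_query(ideal, backends, rng) for _ in range(200)]
```

The attacker sees only the returned probability vectors, so repeated queries for the same input produce inconsistent labels in $D_A$. The paper reports that clones trained under noisy conditions largely tolerate this kind of inconsistency.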