Evaluating Efficacy of Model Stealing Attacks and Defenses on Quantum Neural Networks
Satwik Kundu, Debarshi Kundu, Swaroop Ghosh
TL;DR
The paper investigates model stealing threats to cloud-hosted quantum neural networks (QNNs) in QMLaaS. It demonstrates that attackers can train substitute models that closely mirror victim performance, especially when the victim returns full probability vectors (Top-$k$) rather than only the top label (Top-$1$). It introduces two perturbation-based defenses, HVIP and HAVIP, that exploit hardware and architectural variability to obfuscate outputs, but finds that QNNs trained on noisy hardware are notably resilient to these perturbations. Through experiments on multiple 4-qubit QNNs and four datasets (MNIST-4, Fashion-4, Kuzushiji-4, Letters-4), the study quantifies clone accuracy, analyzes the impact of attacker data size, PQC depth, and PQC width, and shows that the defenses have mixed effectiveness. The results imply that model stealing is a tangible risk for QMLaaS, and that the inherent noise and variability of NISQ devices can undermine perturbation-based defenses, underscoring the need for more robust IP protection mechanisms in future quantum cloud services.
Abstract
Cloud hosting of quantum machine learning (QML) models exposes them to a range of vulnerabilities, the most significant of which is the model stealing attack. In this study, we assess the efficacy of such attacks in the realm of quantum computing. We conducted comprehensive experiments on various datasets with multiple QML model architectures. Our findings revealed that model stealing attacks can produce clone models achieving up to $0.9\times$ and $0.99\times$ clone test accuracy when trained using Top-$1$ and Top-$k$ labels, respectively (where $k$ is the number of classes). To defend against these attacks, we leverage the unique properties of current noisy hardware to perturb the victim model's outputs and thereby hinder the attacker's training process. In particular, we propose: 1) hardware variation-induced perturbation (HVIP) and 2) hardware and architecture variation-induced perturbation (HAVIP). Although noise and architectural variability can provide up to $\sim16\%$ output obfuscation, our comprehensive analysis revealed that models cloned under noisy conditions tend to be resilient, suffering little to no performance degradation due to such obfuscations. Despite the limited success of our defense techniques, this outcome has led to an important discovery: QML models trained on noisy hardware are naturally resistant to perturbation- or obfuscation-based defenses and attacks.
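To make the difference between Top-$1$ and Top-$k$ labels concrete, the following minimal sketch shows how an attacker might turn one victim output into a training target for the clone. The function names, the example probability vector, and the renormalization step in `topk_label` are illustrative assumptions, not details taken from the paper.

```python
from typing import List


def top1_label(probs: List[float]) -> List[float]:
    """Hard label: one-hot vector for the most probable class only."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(probs))]


def topk_label(probs: List[float], k: int) -> List[float]:
    """Soft label: keep the k largest probabilities and zero out the rest.

    Renormalizing the kept entries is an assumption made for this sketch.
    With k equal to the number of classes, the full vector is returned.
    """
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    keep = set(ranked[:k])
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]


# Hypothetical victim output for a 4-class task (e.g. a 4-qubit QNN).
victim_probs = [0.1, 0.6, 0.2, 0.1]

hard_target = top1_label(victim_probs)      # only the winning class survives
soft_target = topk_label(victim_probs, 4)   # full probability vector survives
```

The hard target discards how confident the victim was in the other classes, whereas the soft target preserves that information, which is consistent with the higher clone accuracy reported for Top-$k$ labels.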
