Properties that allow or prohibit transferability of adversarial attacks among quantized networks
Abhishek Shrestha, Jürgen Großmann
TL;DR
This study addresses the security implications of deploying quantized neural networks on embedded devices by examining how adversarial attacks transfer across different bitwidths. It systematically evaluates multiple attack algorithms (FGSM, JSMA, UAP, Boundary Attack, CW) on MNIST and CIFAR-10 with six quantization levels, including scenarios where source and target differ in capacity and architecture. The networks are quantized with DoReFa-Net and trained with straight-through estimators (STEs). The findings show that quantization generally reduces transferability, but certain attacks, especially UAP at higher distortion budgets, can transfer more effectively, while gradient-based attacks remain comparatively sensitive to changes in bitwidth between source and target. A key practical takeaway is that the average transferability observed between quantized variants of a network can serve as a proxy for transferability to unseen target models with different capacity or architecture, which informs robustness assessments for edge-device deployments.
Abstract
Deep Neural Networks (DNNs) are known to be vulnerable to adversarial examples. Further, these adversarial examples are found to be transferable from the source network in which they are crafted to a black-box target network. As the trend of using deep learning on embedded devices grows, it becomes relevant to study the transferability properties of adversarial examples among compressed networks. In this paper, we consider quantization as a network compression technique and evaluate the performance of transfer-based attacks when the source and target networks are quantized at different bitwidths. We explore how algorithm-specific properties affect transferability by considering various adversarial example generation algorithms. Furthermore, we examine transferability in a more realistic scenario where the source and target networks may differ in bitwidth and other model-related properties like capacity and architecture. We find that although quantization reduces transferability, certain attack types demonstrate an ability to enhance it. Additionally, the average transferability of adversarial examples among quantized versions of a network can be used to estimate the transferability to quantized target networks with varying capacity and architecture.
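To make the notion of "average transferability among quantized versions of a network" concrete, the following is a minimal sketch under stated assumptions, not the paper's evaluation code. It assumes one common definition of transferability: the fraction of adversarial examples that fool the source network and also fool the target. The function names, the bitwidths (2, 8, 32), and the boolean outcomes are all hypothetical toy values chosen for illustration; they are not results from the paper.

```python
from statistics import mean


def transfer_rate(source_fooled, target_fooled):
    """Fraction of examples that fool the source and also fool the target.

    Assumed definition (hypothetical): only examples that succeed on the
    source network are counted in the denominator.
    """
    hits = [t for s, t in zip(source_fooled, target_fooled) if s]
    return sum(hits) / len(hits) if hits else 0.0


def transfer_matrix(fooled):
    """Build matrix[src][tgt] from fooled[src][tgt], a list of booleans.

    fooled[src][tgt][i] is True if example i, crafted on the network
    quantized at bitwidth `src`, is misclassified by the network at `tgt`.
    """
    bitwidths = sorted(fooled)
    return {
        s: {t: transfer_rate(fooled[s][s], fooled[s][t]) for t in bitwidths}
        for s in bitwidths
    }


def average_transferability(matrix):
    """Mean of the off-diagonal entries (source bitwidth != target bitwidth)."""
    return mean(
        rate for s, row in matrix.items() for t, rate in row.items() if s != t
    )


# Toy outcomes for three hypothetical bitwidths; values are made up.
T, F = True, False
fooled = {
    32: {32: [T, T, T, T], 8: [T, T, F, T], 2: [T, F, F, F]},
    8: {32: [T, T, T, F], 8: [T, T, T, T], 2: [T, T, F, F]},
    2: {32: [F, T, F, F], 8: [T, F, F, T], 2: [T, T, F, T]},
}

matrix = transfer_matrix(fooled)
avg = average_transferability(matrix)
print(matrix)
print(avg)
```

Excluding the diagonal keeps the average focused on cross-bitwidth transfer; in this reading, that single number is what would be compared against the transfer rate observed on a target network of different capacity or architecture.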
