A Semi Black-Box Adversarial Bit-Flip Attack with Limited DNN Model Information
Behnam Ghavami, Mani Sadati, Mohammad Shahidzadeh, Lesley Shannon, Steve Wilton
TL;DR
This work introduces B3FA, a semi-black-box adversarial bit-flip attack that operates with limited model information by extracting architecture, partially recovering parameters, and using a magnitude-based vulnerable-bit search to guide precise memory faults. Central to the approach are FL2R for ranking vulnerable filters in fully recovered networks and CZR for reconstructing unrecovered weights in partially recovered models, enabling effective flips without training data access. Empirical results across MobileNetV2, VGG16, and ResNet50 on CIFAR-10/100 show that as few as 20 bit-flips can cause dramatic accuracy drops (e.g., from 69.84% to 9% on MobileNetV2 with 80% parameter recovery), outperforming traditional white-box methods in partially recovered scenarios. The paper also discusses defenses against B3FA, arguing for magnitude-based sensitivity measures and integrity protections to mitigate memory-level parameter tampering in quantized DNNs, with practical implications for secure edge deployments and MLaaS platforms.
Abstract
Despite the rising prevalence of deep neural networks (DNNs) in cyber-physical systems, their vulnerability to adversarial bit-flip attacks (BFAs) is a noteworthy concern. This paper proposes B3FA, a semi-black-box BFA-based parameter attack on DNNs that assumes the adversary has limited knowledge about the model. We consider practical scenarios, which often feature a more restricted threat model for real-world systems, in contrast to typical BFA models that presuppose the adversary's full access to a network's inputs and parameters. The introduced bit-flip approach uses a magnitude-based ranking method and a statistical reconstruction technique to identify the vulnerable bits. We demonstrate the effectiveness of B3FA on several DNN models in a semi-black-box setting. For example, B3FA reduces the accuracy of a MobileNetV2 from 69.84% to 9% with only 20 bit-flips in a real-world setting.
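To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes weights are stored as signed 8-bit two's-complement integers and uses the L2 norm as the magnitude measure; both choices, and the function names `flip_bit` and `rank_filters_by_l2`, are illustrative assumptions rather than details taken from the paper.

```python
from typing import List, Sequence


def flip_bit(weight: int, bit: int) -> int:
    """Flip one bit of a signed 8-bit (two's-complement) weight.

    Assumption: weights are int8 values in [-128, 127]; bit 7 is the sign bit.
    """
    if not -128 <= weight <= 127:
        raise ValueError("weight must fit in a signed 8-bit integer")
    if not 0 <= bit <= 7:
        raise ValueError("bit index must be in 0..7")
    raw = (weight & 0xFF) ^ (1 << bit)       # view as unsigned byte, toggle the bit
    return raw - 256 if raw >= 128 else raw  # convert back to a signed value


def rank_filters_by_l2(filters: Sequence[Sequence[int]]) -> List[int]:
    """Return filter indices sorted by descending L2 norm.

    Assumption: a larger norm is treated as a proxy for a more influential
    filter. This needs no training data, only the (recovered) weights.
    """
    norms = [sum(w * w for w in f) ** 0.5 for f in filters]
    return sorted(range(len(filters)), key=lambda i: norms[i], reverse=True)


if __name__ == "__main__":
    # Flipping the most significant bit changes the value by 128.
    print(flip_bit(5, 7))    # -123
    print(flip_bit(5, 0))    # 4
    # Three toy "filters", each flattened to a short list of int8 weights.
    toy_filters = [[1, 2], [10, -3], [0, 0]]
    print(rank_filters_by_l2(toy_filters))  # [1, 0, 2]
```

The sketch illustrates why very few flips can matter: toggling the most significant bit of a single int8 weight shifts it by 128, which is half of the representable range, while a magnitude-based ranking narrows the search to the filters where such a shift is likely to do the most damage.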
