
A Semi Black-Box Adversarial Bit-Flip Attack with Limited DNN Model Information

Behnam Ghavami, Mani Sadati, Mohammad Shahidzadeh, Lesley Shannon, Steve Wilton

TL;DR

This work introduces B3FA, a semi-black-box adversarial bit-flip attack that operates with limited model information by extracting architecture, partially recovering parameters, and using a magnitude-based vulnerable-bit search to guide precise memory faults. Central to the approach are FL2R for ranking vulnerable filters in fully recovered networks and CZR for reconstructing unrecovered weights in partially recovered models, enabling effective flips without training data access. Empirical results across MobileNetV2, VGG16, and ResNet50 on CIFAR-10/100 show that as few as 20 bit-flips can cause dramatic accuracy drops (e.g., from 69.84% to 9% on MobileNetV2 with 80% parameter recovery), outperforming traditional white-box methods in partially recovered scenarios. The paper also discusses defenses against B3FA, arguing for magnitude-based sensitivity measures and integrity protections to mitigate memory-level parameter tampering in quantized DNNs, with practical implications for secure edge deployments and MLaaS platforms.
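FL2R ranks vulnerable filters by weight magnitude, and Figure 4 plots per-filter L2-norms. The sketch below illustrates that kind of magnitude-based ranking only. It makes several assumptions that are not taken from the paper: layers are plain nested lists, filters are ranked globally across layers, and larger-norm filters are treated as more attack-worthy. The helper names (`filter_l2_norms`, `rank_filters`) are hypothetical, not the authors' code.

```python
import math
from typing import List, Tuple

# Hypothetical layout: a layer is a list of filters, each filter a flat list of weights.
Layer = List[List[float]]


def filter_l2_norms(layer: Layer) -> List[float]:
    """Return the L2-norm of every filter in one layer."""
    return [math.sqrt(sum(w * w for w in f)) for f in layer]


def rank_filters(layers: List[Layer], top_k: int) -> List[Tuple[int, int, float]]:
    """Return (layer_idx, filter_idx, norm) for the top_k filters by L2-norm.

    Assumption: filters are compared globally across layers and a larger
    norm is treated as a more attractive target.
    """
    scored = []
    for li, layer in enumerate(layers):
        for fi, norm in enumerate(filter_l2_norms(layer)):
            scored.append((li, fi, norm))
    scored.sort(key=lambda t: t[2], reverse=True)  # largest norm first
    return scored[:top_k]


if __name__ == "__main__":
    # Two tiny hypothetical layers with two 2-weight filters each.
    layers = [[[3.0, 4.0], [0.1, 0.2]], [[1.0, 0.0], [6.0, 8.0]]]
    print(rank_filters(layers, 2))  # [(1, 1, 10.0), (0, 0, 5.0)]
```

In an attack of this kind, the ranked list would narrow the bit search to a few filters rather than the whole model. Whether norms should instead be compared within each layer is a design choice this sketch does not settle.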

Abstract

Despite the rising prevalence of deep neural networks (DNNs) in cyber-physical systems, their vulnerability to adversarial bit-flip attacks (BFAs) is a noteworthy concern. This paper proposes B3FA, a semi-black-box BFA-based parameter attack on DNNs, assuming the adversary has limited knowledge about the model. We note that practical, real-world systems often impose a more restricted threat model, in contrast to typical BFA models that presuppose the adversary's full access to a network's inputs and parameters. The proposed bit-flip approach uses a magnitude-based ranking method and a statistical reconstruction technique to identify vulnerable bits. We demonstrate the effectiveness of B3FA on several DNN models in a semi-black-box setting. For example, B3FA drops the accuracy of MobileNetV2 from 69.84% to 9% with only 20 bit-flips in a real-world setting.
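The statistical reconstruction step fills in weights the attacker could not recover (CZR in the TL;DR). The following is one plausible way to do this, assumed here rather than taken from the paper. It fits a per-layer Gaussian to the recovered weights, in the spirit of the per-layer weight distributions in Figure 5, and samples the missing entries from it. The function `reconstruct_layer` and the convention of marking unrecovered weights as `None` are hypothetical. The paper's CZR may use a different estimator.

```python
import random
import statistics
from typing import List, Optional


def reconstruct_layer(weights: List[Optional[float]], seed: int = 0) -> List[float]:
    """Fill the unrecovered weights (marked None) of one layer.

    Assumption: missing weights follow a Gaussian whose mean and standard
    deviation are estimated from the recovered weights of the same layer.
    Recovered weights are returned unchanged.
    """
    known = [w for w in weights if w is not None]
    if not known:
        raise ValueError("layer has no recovered weights to fit")
    mu = statistics.fmean(known)
    sigma = statistics.pstdev(known)  # population std; 0.0 if only one value
    rng = random.Random(seed)         # seeded so the reconstruction is reproducible
    return [w if w is not None else rng.gauss(mu, sigma) for w in weights]


if __name__ == "__main__":
    # Hypothetical layer with 8 of 10 weights (80%) recovered.
    partial = [0.12, -0.05, None, 0.30, -0.21, 0.07, None, 0.02, -0.11, 0.04]
    print(reconstruct_layer(partial, seed=42))
```

Keeping recovered values fixed and sampling only the gaps means the reconstructed model stays as close as possible to the victim's where real information exists.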

Paper Structure

This paper contains 24 sections, 2 equations, 9 figures, 2 tables, and 1 algorithm.

Figures (9)

  • Figure 1: An overview of B3FA, in which the attacker and the victim client share the same service environment.
  • Figure 2: The four main stages of the B3FA framework. The attacker begins by extracting the architecture ① and partial parameters ② from the DRAM that holds the victim's model. She then identifies a list of candidate bits using the proposed blind vulnerable bit search ③, and flips them in the final step ④.
  • Figure 3: Examination of traditional BFA deployment on partially restored DNNs.
  • Figure 4: $\mathcal{L}_2$-norm of different filters across different layers of the VGG-16 trained on CIFAR-10.
  • Figure 5: Distribution of weights across different layers.
  • ...and 4 more figures
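Step ④ in Figure 2 flips the selected bits in memory. The sketch below shows why a single flip can matter so much. It assumes 8-bit two's-complement weight quantization with a per-tensor `scale`. Both the bit width and the scale are assumptions for illustration, not details given in this summary. The helper names are hypothetical.

```python
from typing import List


def flip_bit(q: int, bit: int) -> int:
    """Flip one bit of an int8 weight stored in two's complement.

    q   : quantized weight in [-128, 127]
    bit : 0 (least significant) .. 7 (most significant / sign bit)
    """
    if not (-128 <= q <= 127 and 0 <= bit <= 7):
        raise ValueError("expected an int8 value and a bit index in 0..7")
    raw = (q & 0xFF) ^ (1 << bit)            # flip the bit in the stored byte
    return raw - 256 if raw >= 128 else raw  # reinterpret the byte as signed


def perturbation_by_bit(q: int) -> List[int]:
    """Change in the integer weight caused by flipping each bit 0..7."""
    return [flip_bit(q, b) - q for b in range(8)]


def dequantize(q: int, scale: float) -> float:
    """Map an int8 weight back to a real value (assumed per-tensor scale)."""
    return q * scale


if __name__ == "__main__":
    q, scale = 3, 0.01               # hypothetical weight and scale
    print(perturbation_by_bit(q))    # [-1, -2, 4, 8, 16, 32, 64, -128]
    print(dequantize(q, scale), dequantize(flip_bit(q, 7), scale))
```

Flipping bit b always changes the stored integer by exactly ±2^b. A single sign-bit flip therefore produces the largest possible perturbation (±128 × `scale`), which helps explain how as few as 20 flips can collapse accuracy.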