Nearest is Not Dearest: Towards Practical Defense against Quantization-conditioned Backdoor Attacks

Boheng Li, Yishuo Cai, Haowei Li, Feng Xue, Zhifeng Li, Yiming Li

TL;DR

This work tackles quantization-conditioned backdoors (QCBs), which remain dormant in full-precision models and awaken after quantization, posing a practical risk to deployed DNNs. It introduces Error-guided Flipped Rounding with Activation Preservation (EFRAP), a defense that learns a non-nearest rounding strategy guided by neuron-wise errors and enforces activation preservation to maintain clean accuracy. The authors formulate a three-part objective, apply a continuous relaxation to optimize rounding decisions layer by layer, and demonstrate robust performance across multiple datasets, architectures, and QCB attacks, including resistance to adaptive attempts. The results show EFRAP can effectively suppress backdoor activation while preserving benign accuracy, offering a practical path to secure quantized models in real-world deployments.

Abstract

Model quantization is widely used to compress and accelerate deep neural networks. However, recent studies have revealed the feasibility of weaponizing model quantization by implanting quantization-conditioned backdoors (QCBs). These special backdoors stay dormant in released full-precision models but take effect after standard quantization. Due to the peculiarity of QCBs, existing defenses have only minor effects in reducing their threat or are even infeasible. In this paper, we conduct the first in-depth analysis of QCBs. We reveal that the activation of existing QCBs primarily stems from the nearest rounding operation and is closely related to the norms of neuron-wise truncation errors (i.e., the difference between the continuous full-precision weights and their quantized versions). Motivated by these insights, we propose Error-guided Flipped Rounding with Activation Preservation (EFRAP), an effective and practical defense against QCBs. Specifically, EFRAP learns a non-nearest rounding strategy with neuron-wise error norm and layer-wise activation preservation guidance, flipping the rounding strategies of neurons that are crucial for backdoor effects but have minimal impact on clean accuracy. Extensive evaluations on benchmark datasets demonstrate that EFRAP can defeat state-of-the-art QCB attacks under various settings. Code is available at https://github.com/AntigoneRandy/QuantBackdoor_EFRAP.
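
To make the terms in the abstract concrete, the following is a minimal sketch of nearest rounding, its "flipped" counterpart, and a neuron-wise error norm. It is not the authors' implementation and does not reproduce EFRAP's learned rounding objective or its activation-preservation term. The function names, the example weights, and the step size `scale` are all hypothetical, and a plain uniform quantizer is assumed.

```python
import math

def round_nearest(x: float) -> int:
    """Round to the nearest integer grid point (ties go up)."""
    return math.floor(x + 0.5)

def round_flipped(x: float) -> int:
    """Round to the neighbouring grid point that nearest rounding did not pick."""
    lo = math.floor(x)
    if x == lo:  # already on the grid: there is nothing to flip
        return lo
    return lo + 1 if round_nearest(x) == lo else lo

def quantize(weights, scale, rounder):
    """Uniform quantization of one neuron's weights: w -> scale * rounder(w / scale)."""
    return [scale * rounder(w / scale) for w in weights]

def error_norm(weights, quantized):
    """L2 norm of the neuron-wise error (full-precision minus quantized weights)."""
    return math.sqrt(sum((w - q) ** 2 for w, q in zip(weights, quantized)))

# Hypothetical layer with two neurons (one row each) and a hypothetical step size.
layer = [[0.26, -0.74, 0.49], [0.05, 0.98, -0.51]]
scale = 0.5

for i, neuron in enumerate(layer):
    q_near = quantize(neuron, scale, round_nearest)
    q_flip = quantize(neuron, scale, round_flipped)
    print(f"neuron {i}: nearest={q_near} err={error_norm(neuron, q_near):.4f} "
          f"flipped={q_flip} err={error_norm(neuron, q_flip):.4f}")
```

In this toy setting, flipping every weight always increases the error norm, which is why a practical defense would have to choose which roundings to flip; according to the abstract, EFRAP learns that choice under error-norm and activation-preservation guidance rather than flipping uniformly as done here.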

Paper Structure

This paper contains 25 sections, 8 equations, 9 figures, 10 tables, and 2 algorithms.

Figures (9)

  • Figure 1: Illustration of quantization-conditioned backdoor attacks. First, the attacker selects a trigger pattern and a target label, injects a quantization-conditioned backdoor into the model, and releases it to the victim (top panel). The conditioned backdoor remains silent in the full-precision model even in the presence of the trigger, helping it bypass SOTA detection methods (middle panel). Finally, the victim quantizes the released model with a standard quantization mechanism and deploys it, which activates the conditioned backdoor. The attacker can then exploit the backdoor using the trigger to cause targeted misclassification (bottom panel). As a defense, our proposed EFRAP aims to eliminate the backdoor effect during quantization and return a clean quantized model.
  • Figure 2: Results of the preliminary defense. The evaluated attack is PQBackdoor (Ma et al., 2023) on ResNet-18 and CIFAR-10. We report the results for three independently trained models.
  • Figure 3: Ablation study on weighting parameters. We repeat each experiment three times.
  • Figure 4: Visualization results. Grad-CAM (Selvaraju et al., 2017) highlights the image regions crucial to a DNN's decisions, and t-SNE (van der Maaten and Hinton, 2008) visualizes data in a DNN's low-dimensional feature space. The model is ResNet-18 and the dataset is CIFAR-10.
  • Figure 5: Ablation study on weighting parameters. We repeat each experiment three times.
  • ...and 4 more figures