Rounding-Guided Backdoor Injection in Deep Learning Model Quantization

Xiangxiang Chen, Peixin Zhang, Jun Sun, Wenhai Wang, Jingyi Wang

TL;DR

This work exposes a new supply-chain vulnerability in neural-network quantization by showing that backdoors can be implanted exclusively during the post-training quantization phase through rounding manipulation. The authors introduce QuRA, a training-agnostic attack that crafts a lightweight backdoor trigger and progressively biases the rounding of selected weights across layers while preserving overall accuracy. Extensive experiments across CV and NLP models demonstrate near-100% attack success rates with minimal clean accuracy loss, even under several defense mechanisms, highlighting a significant risk in deployment-time quantization workflows. The findings emphasize the need for robust verification of rounding behavior in quantization tools and caution against outsourcing deployment pipelines without security guarantees.

Abstract

Model quantization is a popular technique for deploying deep learning models in resource-constrained environments. However, it may also introduce previously overlooked security risks. In this work, we present QuRA, a novel backdoor attack that exploits model quantization to embed malicious behaviors. Unlike conventional backdoor attacks that rely on training data poisoning or manipulation of model training, QuRA works solely through the quantization operations. In particular, QuRA first employs a novel weight selection strategy to identify critical weights that influence the backdoor target while preserving the model's overall performance. Then, by optimizing the rounding direction of these weights, we amplify the backdoor effect across model layers without degrading accuracy. Extensive experiments demonstrate that QuRA achieves nearly 100% attack success rates in most cases, with negligible performance degradation. Furthermore, we show that QuRA can adapt to bypass existing backdoor defenses, underscoring its threat potential. Our findings highlight a critical vulnerability in the widely used model quantization process, emphasizing the need for more robust security measures. Our implementation is available at https://github.com/cxx122/QuRA.
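
To make the role of the rounding direction concrete, the following is a minimal sketch and not the QuRA implementation. It assumes plain uniform quantization with a single scale, and the helper names (`quantize`, `nearest_mask`, `max_error`) and the example weights are hypothetical. It illustrates that choosing floor or ceiling per weight keeps every dequantized weight within one quantization step of the original, so a manipulated rounding mask produces deviations of the same order as ordinary quantization error.

```python
import math

def quantize(weights, scale, round_up):
    """Map each weight to an integer level; round_up[i] == 1 rounds up, 0 rounds down."""
    return [math.floor(w / scale) + r for w, r in zip(weights, round_up)]

def dequantize(levels, scale):
    """Map integer levels back to real values."""
    return [q * scale for q in levels]

def nearest_mask(weights, scale):
    """Rounding bits that reproduce ordinary round-to-nearest."""
    return [1 if (w / scale - math.floor(w / scale)) >= 0.5 else 0 for w in weights]

def max_error(weights, scale, round_up):
    """Largest absolute deviation between original and dequantized weights."""
    deq = dequantize(quantize(weights, scale, round_up), scale)
    return max(abs(w - d) for w, d in zip(weights, deq))

# Hypothetical example values, chosen only for illustration.
weights = [0.26, -0.41, 0.73, 0.08]
scale = 0.1

benign = nearest_mask(weights, scale)
flipped = [1 - r for r in benign]  # extreme case: every rounding choice inverted

for name, mask in (("nearest", benign), ("flipped", flipped)):
    print(name, mask, round(max_error(weights, scale, mask), 3))
```

Even in the extreme case where every rounding choice is inverted, the per-weight error stays bounded by `scale`, which suggests why a per-weight check on quantization error alone would not flag a biased rounding mask.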

Paper Structure

This paper contains 25 sections, 20 equations, 10 figures, 14 tables, 2 algorithms.

Figures (10)

  • Figure 1: Traditional and quantization-conditioned backdoor attacks embed the backdoor during data preparation and training, activating it either during training or quantization. In contrast, our QuRA method embeds and activates the backdoor exclusively during the quantization phase.
  • Figure 2: The performance of the model under different values of $\alpha$ in the modified objective function. The bluish-gray and yellow bars represent the accuracy of clean data and modified data, respectively.
  • Figure 3: After submitting the trained model and calibration data to a third-party deployment platform or open-source quantization tool, the developer receives a quantized neural network and deploys it on their resource-constrained devices (e.g., edge devices or servers with limited resources). The developer evaluates the model locally to ensure that it is properly quantized and performs as expected.
  • Figure 4: QuRA embeds a generated trigger into the calibration dataset to create a backdoor dataset. The weights that affect the backdoor effect and original accuracy are shown in red and green, respectively, with the shade of the color indicating the degree of impact. During the quantization process, the weights with minimal impact on both objectives (blue) are frozen, along with a selected subset of weights (red) that have high impact on the backdoor objective but low impact on the accuracy objective. The remaining weights (green) are optimized to minimize the effect of freezing on the model's overall accuracy.
  • Figure 5: The calculation and amplification process of weights. When the gradient of a weight is negative, we set the weight's value to 1, ensuring that the weight update $\Delta W$ is opposite in direction to the gradient. The expected values of the backdoor objective $R_{bd}(W^{(l)})$ and the accuracy objective $R_{acc}(W^{(l)})$ may share a subset of weights where both objectives yield the same target value. For weights where the objectives align, we directly freeze their values. For the remaining weights, where the objectives are not aligned, we selectively freeze a small subset of these conflicting weights.
  • ...and 5 more figures
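
The selection rule in the Figure 5 caption can be sketched as follows. This is an illustrative reading, not the authors' code. It assumes that the "value" set to 1 is a binary rounding bit (1 = round up, 0 = round down) and that the gradients of the two objectives are available per weight. The function names, the `conflict_budget` parameter, and the ranking score used for conflicting weights are hypothetical.

```python
def rounding_targets(grads):
    """Rounding bit per weight: 1 (round up) if the gradient is negative, else 0.

    Rounding up gives a positive weight change, which opposes a negative gradient.
    """
    return [1 if g < 0 else 0 for g in grads]

def plan_freezing(grad_bd, grad_acc, conflict_budget):
    """Return (frozen, free): frozen maps weight index -> rounding bit.

    Weights whose backdoor and accuracy targets agree are frozen directly.
    Among conflicting weights, only `conflict_budget` are frozen to the
    backdoor target; the rest stay free for accuracy-oriented optimization.
    """
    r_bd = rounding_targets(grad_bd)
    r_acc = rounding_targets(grad_acc)

    frozen, conflicts = {}, []
    for i, (b, a) in enumerate(zip(r_bd, r_acc)):
        if b == a:
            frozen[i] = b          # objectives align: freeze the shared target
        else:
            conflicts.append(i)    # objectives disagree: decide below

    # Hypothetical ranking: prefer weights that matter a lot for the backdoor
    # objective and little for the accuracy objective.
    conflicts.sort(key=lambda i: abs(grad_bd[i]) - abs(grad_acc[i]), reverse=True)
    for i in conflicts[:conflict_budget]:
        frozen[i] = r_bd[i]

    free = sorted(set(range(len(grad_bd))) - set(frozen))
    return frozen, free

# Hypothetical per-weight gradients for one layer.
grad_bd = [-0.8, 0.5, -0.1, 0.9, -0.3]
grad_acc = [-0.2, 0.4, 0.6, -0.05, 0.7]

frozen, free = plan_freezing(grad_bd, grad_acc, conflict_budget=1)
print("frozen:", frozen)
print("free:", free)
```

In this toy layer, weights 0 and 1 are frozen because both objectives want the same rounding direction, weight 3 is the single conflicting weight frozen toward the backdoor target, and weights 2 and 4 remain free.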