
Evil from Within: Machine Learning Backdoors through Hardware Trojans

Alexander Warnecke, Julian Speith, Jan-Niklas Möller, Konrad Rieck, Christof Paar

TL;DR

This work identifies a novel threat in which a programmable hardware trojan embedded in a machine-learning accelerator implements a minimal backdoor that is activated only during inference, leaving the model and software outside the accelerator untouched. The authors formulate a four-stage attack (trojan insertion, backdoor compression, backdoor loading, backdoor execution) and restrict the backdoor to a small, sparse set of parameter changes so that it fits within accelerator memory and avoids detection. They demonstrate practicality on the Xilinx Vitis AI DPU with a traffic-sign recognition example. Through adaptive neuron selection, $L_0$/$L_1$/$L_2$ regularization, and pruning, the study achieves highly sparse backdoors (as few as 7–30 parameter changes) with robust but imperfect DSR after quantization and negligible hardware overhead ($<1\%$). The results imply that hardware supply-chain security is essential for trustworthy ML deployment: defenses that check the integrity of the software or model are ineffective against backdoors that live in hardware, which motivates trusted manufacturing and research on hardware-level verification.

Abstract

Backdoors pose a serious threat to machine learning, as they can compromise the integrity of security-critical systems, such as self-driving cars. While different defenses have been proposed to address this threat, they all rely on the assumption that the hardware on which the learning models are executed during inference is trusted. In this paper, we challenge this assumption and introduce a backdoor attack that completely resides within a common hardware accelerator for machine learning. Outside of the accelerator, neither the learning model nor the software is manipulated, so that current defenses fail. To make this attack practical, we overcome two challenges: First, as memory on a hardware accelerator is severely limited, we introduce the concept of a minimal backdoor that deviates as little as possible from the original model and is activated by replacing a few model parameters only. Second, we develop a configurable hardware trojan that can be provisioned with the backdoor and performs a replacement only when the specific target model is processed. We demonstrate the practical feasibility of our attack by implanting our hardware trojan into the Xilinx Vitis AI DPU, a commercial machine-learning accelerator. We configure the trojan with a minimal backdoor for a traffic-sign recognition system. The backdoor replaces only 30 (0.069%) model parameters, yet it reliably manipulates the recognition once the input contains a backdoor trigger. Our attack expands the hardware circuit of the accelerator by 0.24% and induces no run-time overhead, rendering a detection hardly possible. Given the complex and highly distributed manufacturing process of current hardware, our work points to a new threat in machine learning that is inaccessible to current security mechanisms and calls for hardware to be manufactured only in fully trusted environments.
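The idea of a minimal backdoor, activated by replacing only a few model parameters, can be illustrated with a small sketch. It assumes that a backdoored copy of the weights already exists and simply keeps the `k` parameters that differ most from the original, ordered from largest to smallest difference. The function names `sparse_replacements` and `apply_replacements`, the layer shape, and the random weights are hypothetical and chosen for illustration; this is not the authors' implementation, and it leaves out the regularization, pruning, and quantization steps mentioned in the TL;DR.

```python
import numpy as np


def sparse_replacements(original, backdoored, k):
    """Pick the k parameters with the largest absolute change.

    Returns flat indices (largest change first) and the replacement values.
    """
    diff = np.abs(backdoored - original).ravel()
    k = min(k, diff.size)
    if k <= 0:
        return np.array([], dtype=int), np.array([], dtype=backdoored.dtype)
    # argpartition finds the k largest entries without a full sort
    idx = np.argpartition(diff, -k)[-k:]
    # order the selected entries from largest to smallest difference
    idx = idx[np.argsort(diff[idx])[::-1]]
    return idx, backdoored.ravel()[idx]


def apply_replacements(original, idx, values):
    """Return a copy of `original` with only the selected parameters replaced."""
    patched = original.copy().ravel()
    patched[idx] = values
    return patched.reshape(original.shape)


# Illustrative data only: a hypothetical layer and a perturbed "backdoored" copy.
rng = np.random.default_rng(0)
w_orig = rng.normal(size=(43, 100))
w_bd = w_orig + rng.normal(scale=0.05, size=w_orig.shape)

idx, values = sparse_replacements(w_orig, w_bd, k=30)
w_patched = apply_replacements(w_orig, idx, values)

changed = int(np.count_nonzero(w_patched != w_orig))
print(f"replaced {changed} of {w_orig.size} parameters "
      f"({100.0 * changed / w_orig.size:.3f}%)")
```

Storing only `(index, value)` pairs is what keeps the footprint small in this sketch: the patch grows with the number of replaced parameters rather than with the size of the model.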

Paper Structure (17 sections, 7 equations, 10 figures, 2 tables)


Figures (10)

  • Figure 1: Overview of our hardware-based backdoor attack.
  • Figure 2: The four stages of our proposed trojan attack.
  • Figure 3: Left: Box-plot of the parameter distribution in the final layer before and after backdoor insertion. Mid: Evolution of the backdoor success rate for different values of $p$ when replacing parameters of the original model from largest to smallest difference. Right: Evolution of the backdoor success rate for $p=1$ and different regularization strengths $\lambda$.
  • Figure 4: Mean success rate (SR) of the backdoor after fine-tuning for $20$ epochs.
  • Figure 5: Top-level view of a DPU with two processing cores and its connectivity to the PS.
  • ...and 5 more figures