TrojanForge: Generating Adversarial Hardware Trojan Examples Using Reinforcement Learning

Amin Sarihi, Peter Jamieson, Ahmad Patooghy, Abdel-Hameed A. Badawy

TL;DR

TrojanForge tackles the HT detection arms race by embedding an RL-based HT generator in a GAN-like loop with HT detectors, enabling adversarial HT insertion that evades multiple detection strategies. It introduces rare-net pruning (functional and structural) and a Jaccard-based diversity mechanism to curate trigger candidates, and it trains an RL agent with Proximal Policy Optimization (PPO) using a detector-informed reward to maximize stealth. Experiments on the ISCAS-85 benchmarks show that TrojanForge achieves high attack success percentages on several circuits, and that trigger similarity, measured by the Jaccard similarity index (JSI), and payload choice strongly influence stealth. The work underscores the need for diverse benchmarks and defense strategies against AI-assisted HT attacks and provides a framework for iteratively building more robust HT detectors.
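To make the Jaccard-based diversity idea concrete, here is a minimal sketch. It assumes each trigger candidate is modeled as a set of rare-net names, that JSI is the standard Jaccard similarity between two such sets, and that a set of triggers is summarized by its mean pairwise JSI. The greedy threshold filter and all names (`select_diverse`, `max_jsi`, the nets `n1` to `n6`) are hypothetical illustrations, not the paper's exact curation procedure.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence

# Assumption: a trigger candidate is modeled as a set of rare-net names.
Trigger = FrozenSet[str]


def jaccard(a: Trigger, b: Trigger) -> float:
    """Jaccard similarity |A & B| / |A | B| (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_jsi(triggers: Sequence[Trigger]) -> float:
    """Average Jaccard similarity over all unordered pairs of triggers."""
    pairs = list(combinations(triggers, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def select_diverse(candidates: Sequence[Trigger], max_jsi: float) -> List[Trigger]:
    """Hypothetical greedy diversity filter: keep a candidate only if its
    similarity to every trigger kept so far is at most max_jsi."""
    kept: List[Trigger] = []
    for cand in candidates:
        if all(jaccard(cand, k) <= max_jsi for k in kept):
            kept.append(cand)
    return kept


candidates = [
    frozenset({"n1", "n2", "n3"}),
    frozenset({"n1", "n2", "n4"}),  # overlaps heavily with the first trigger
    frozenset({"n5", "n6"}),
    frozenset({"n1", "n5"}),
]
diverse = select_diverse(candidates, max_jsi=0.3)
print(len(diverse), mean_pairwise_jsi(candidates), mean_pairwise_jsi(diverse))
```

With `max_jsi=0.3`, the filter keeps only the first and third candidates. The second shares half its nets with the first, and the fourth is too similar to the third. As a result, the mean pairwise JSI of the kept set drops to zero. Lowering the threshold trades candidate count for diversity.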

Abstract

The Hardware Trojan (HT) problem can be thought of as a continuous game between attackers and defenders, each striving to outsmart the other by leveraging any available means for an advantage. Machine Learning (ML) has recently played a key role in advancing HT research. Various novel techniques, such as Reinforcement Learning (RL) and Graph Neural Networks (GNNs), have demonstrated HT insertion and detection capabilities. ML-based HT insertion, in particular, has seen a spike in research activity due to the shortcomings of conventional HT benchmarks and the human design bias inherent in creating them. This work continues this line of innovation by presenting TrojanForge, a tool that generates adversarial HT examples that defeat HT detectors, demonstrating the capabilities of GAN-like adversarial tools for automatic HT insertion. We introduce an RL environment in which the RL insertion agent interacts with HT detectors in an insertion-detection loop and collects rewards based on its success in bypassing them. Our results show that this process helps inserted HTs evade various HT detectors, achieving high attack success percentages. The tool also provides insight into why HT insertion fails in some instances and how this knowledge can be leveraged for defense.
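The insertion-detection loop can be sketched as follows, under heavy assumptions. An infected design is reduced to an abstract (trigger, payload) pair, each detector is a callable that returns `True` when it flags the design, and the reward averages a hypothetical +1 per detector bypassed and -1 per detector that flags it. A random placeholder policy stands in for the PPO-trained agent, and "attack success percentage" is taken to mean the share of designs that no detector flags. None of these names, values, or toy detectors come from the paper.

```python
import random
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class InfectedDesign:
    """Hypothetical, abstract stand-in for a netlist with an inserted HT."""
    trigger: FrozenSet[str]  # rare nets driving the trigger
    payload: str             # payload label, e.g. "flip"


# A detector returns True when it flags the design as infected.
Detector = Callable[[InfectedDesign], bool]


def detector_reward(design: InfectedDesign, detectors: Sequence[Detector],
                    evade: float = 1.0, caught: float = -1.0) -> float:
    """Mean per-detector reward: `evade` for each detector bypassed,
    `caught` for each detector that flags the design (illustrative values)."""
    scores = [caught if detect(design) else evade for detect in detectors]
    return sum(scores) / len(scores)


def attack_success_percentage(designs: Sequence[InfectedDesign],
                              detectors: Sequence[Detector]) -> float:
    """Share of designs, in percent, that no detector flags."""
    evaded = sum(not any(d(x) for d in detectors) for x in designs)
    return 100.0 * evaded / len(designs)


def run_loop(candidates: Sequence[FrozenSet[str]], payloads: Sequence[str],
             detectors: Sequence[Detector], episodes: int,
             seed: int = 0) -> List[Tuple[InfectedDesign, float]]:
    """Insertion-detection loop. The random choice is a placeholder policy;
    a trained RL agent (e.g. PPO) would pick the trigger and payload instead
    and update itself from the returned rewards."""
    rng = random.Random(seed)
    history = []
    for _ in range(episodes):
        design = InfectedDesign(rng.choice(candidates), rng.choice(payloads))
        history.append((design, detector_reward(design, detectors)))
    return history


# Toy detectors (hypothetical): one flags small triggers, one a payload type.
detectors = [lambda d: len(d.trigger) < 3, lambda d: d.payload == "stuck"]
candidates = [frozenset({"a", "b"}), frozenset({"a", "c", "d", "e"})]
history = run_loop(candidates, ["flip", "stuck"], detectors, episodes=20)
print(attack_success_percentage([d for d, _ in history], detectors))
```

Averaging over detectors keeps the reward in [-1, 1] regardless of how many detectors are in the loop, and it rewards partial evasion. A learning agent would therefore drift toward larger triggers and the "flip" payload against these toy detectors.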

Paper Structure

This paper contains 11 sections, 3 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Insertion Flow of TrojanForge.
  • Figure 2: Applying Functional Pruning to two rare paths. The final candidate net $K$ is selected to represent the rare nets.
  • Figure 3: $JSI$ of set $T$ for each ISCAS-85 circuit.
  • Figure 4: Average Episode Reward per Step of the RL agent when inserting against D1, D2, D3, Deterrent, and Random detectors for $c7552$.
  • Figure 5: TrojanForge attack success percentages for ISCAS-85 circuits.
  • ...and 1 more figure
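The Figure 2 caption describes functional pruning collapsing two rare paths onto a single candidate net $K$. The sketch below implements one plausible reading of that step, which is an assumption rather than the paper's stated criterion. A net's rare value is its less frequent logic value. A rare net is redundant if another rare net's rare value implies it, that is, if the other net's set of "rare hit" input vectors is a subset of its own. The toy netlist (`p`, `q`, `K`), the exhaustive simulation, and the 0.3 threshold are hypothetical. On real circuits the implication would only be estimated from sampled vectors. Structural pruning is not shown.

```python
from itertools import product
from typing import Dict, FrozenSet, List


# Hypothetical toy netlist: two rare paths (p and q) that converge at net K.
def evaluate(x1: int, x2: int, x3: int, x4: int) -> Dict[str, int]:
    p = x1 & x2
    q = x3 & x4
    return {"p": p, "q": q, "K": p & q}


def rare_hit_sets(threshold: float) -> Dict[str, FrozenSet[int]]:
    """Exhaustively simulate the toy circuit and, for every net whose rarer
    logic value occurs with probability below `threshold`, record the input
    vectors at which that rare value appears."""
    vectors = list(product((0, 1), repeat=4))
    values = [evaluate(*v) for v in vectors]
    all_idx = frozenset(range(len(vectors)))
    hits: Dict[str, FrozenSet[int]] = {}
    for net in values[0]:
        ones = frozenset(i for i, val in enumerate(values) if val[net] == 1)
        rare = ones if len(ones) <= len(vectors) / 2 else all_idx - ones
        if 0 < len(rare) / len(vectors) < threshold:
            hits[net] = rare
    return hits


def functional_prune(hits: Dict[str, FrozenSet[int]]) -> List[str]:
    """Drop a rare net if another rare net's rare value implies it, i.e. the
    other net's hit set is a subset of this net's. Among nets with identical
    hit sets, only the lexicographically smallest name is kept."""
    nets = sorted(n for n, h in hits.items() if h)
    kept = []
    for b in nets:
        dominated = any(
            hits[a] < hits[b] or (hits[a] == hits[b] and a < b)
            for a in nets if a != b
        )
        if not dominated:
            kept.append(b)
    return kept


hits = rare_hit_sets(threshold=0.3)
print(sorted(hits), functional_prune(hits))
```

All three nets are rare here (p and q are 1 in a quarter of the vectors, and K in one of sixteen). K is 1 only when both p and q are 1, so its rare value implies theirs and it alone survives as the representative candidate.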