Trojan Playground: A Reinforcement Learning Framework for Hardware Trojan Insertion and Detection
Amin Sarihi, Ahmad Patooghy, Peter Jamieson, Abdel-Hameed A. Badawy
TL;DR
This work addresses the bias and limited scope of traditional hardware Trojan (HT) benchmarks by introducing an automated reinforcement learning (RL) framework that performs both HT insertion and HT detection. The insertion component uses rare-net extraction and a PPO-trained RL agent to place HTs within ISCAS-85 circuits, while the detection component employs three tunable RL reward schemes (SSD, SAD, COD) to generate test vectors that reveal HTs. A generic HT-detection metric, the confidence value Conf.Val, enables fair comparison across detectors, with user-defined risk preferences expressed through the parameter α. Experimental results show robust detection performance (about 90.54% on average across benchmarks) and demonstrate the value of a multi-criteria detector over single-metric approaches. The data and tools are openly available to the research community.
Abstract
Current Hardware Trojan (HT) detection techniques are mostly developed against a limited set of HT benchmarks. Existing HT benchmark circuits have two main shortcomings: i) they are heavily biased by the mindset of the designers who created them, and ii) they are created through a one-dimensional lens, mainly the signal activity of nets. To address these shortcomings, we introduce the first automated Reinforcement Learning (RL) framework for HT insertion and detection. In the HT insertion phase, an RL agent explores the circuits and finds the locations best suited to keeping inserted HTs hidden. On the defense side, we introduce a multi-criteria RL-based HT detector that generates test vectors to reveal the presence of HTs. With the proposed framework, one can explore the HT insertion and detection design spaces free of the limitations of the human mindset and of existing benchmarks, ultimately leading toward the next generation of innovative detectors. We demonstrate the efficacy of our framework on ISCAS-85 benchmarks, report the attack and detection success rates, and define a methodology for comparing our techniques.
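The "signal activity of nets" view mentioned above can be made concrete with a small sketch. The code below estimates, by random simulation, how often each internal net of a gate-level circuit takes the value 1, and flags nets whose 0 or 1 value occurs with probability below a threshold. It is only an illustration of the general idea of rare-net extraction, not the framework's implementation: the toy netlist, the function names, the 0.1 threshold, and the number of random vectors are all assumptions made for this example.

```python
import random

# Hypothetical toy netlist (not an ISCAS-85 circuit).
# Each entry maps a net to (gate type, fan-in nets); entries are in topological order.
NETLIST = {
    "n1": ("AND", ["a", "b"]),
    "n2": ("AND", ["c", "d"]),
    "n3": ("AND", ["n1", "n2"]),
    "n4": ("OR", ["a", "n3"]),
    "out": ("XOR", ["n4", "d"]),
}
PRIMARY_INPUTS = ["a", "b", "c", "d"]

GATES = {
    "AND": lambda bits: int(all(bits)),
    "OR": lambda bits: int(any(bits)),
    "XOR": lambda bits: sum(bits) % 2,
    "NOT": lambda bits: 1 - bits[0],
}


def simulate(netlist, assignment):
    """Evaluate every net for one assignment of the primary inputs."""
    values = dict(assignment)
    for net, (gate, fanin) in netlist.items():
        values[net] = GATES[gate]([values[f] for f in fanin])
    return values


def estimate_one_probability(netlist, inputs, n_vectors=10000, seed=0):
    """Estimate P(net == 1) for each net from uniformly random input vectors."""
    rng = random.Random(seed)
    ones = {net: 0 for net in netlist}
    for _ in range(n_vectors):
        assignment = {pi: rng.randint(0, 1) for pi in inputs}
        values = simulate(netlist, assignment)
        for net in netlist:
            ones[net] += values[net]
    return {net: count / n_vectors for net, count in ones.items()}


def rare_nets(probabilities, threshold=0.1):
    """Return {net: (rare value, estimated probability of that value)}."""
    rare = {}
    for net, p1 in probabilities.items():
        if p1 < threshold:
            rare[net] = (1, p1)
        elif 1 - p1 < threshold:
            rare[net] = (0, 1 - p1)
    return rare


if __name__ == "__main__":
    probs = estimate_one_probability(NETLIST, PRIMARY_INPUTS)
    for net, (value, p) in rare_nets(probs).items():
        print(f"{net}: value {value} occurs with estimated probability {p:.4f}")
```

In this toy circuit, n3 is 1 only when all four inputs are 1 (probability 1/16), so it is the net expected to fall under the 0.1 threshold. A rarely activated net of this kind is a natural candidate for an HT trigger, which is why benchmarks built only around signal activity cover a narrow part of the design space.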
