AttackGNN: Red-Teaming GNNs in Hardware Security Using Reinforcement Learning
Vasudev Gohil, Satwik Patnaik, Dileep Kalathil, Jeyavijayan Rajendran
TL;DR
AttackGNN evaluates how well GNN-based hardware-security techniques hold up under adversarial manipulation. It formulates adversarial circuit perturbation as a reinforcement-learning task in which every perturbation preserves circuit functionality, using a novel set of synthesis-tool-agnostic actions and sparse rewards. A contextual MDP lets a single RL agent attack multiple GNNs spanning IP piracy detection, HT detection/localization, reverse engineering, and obfuscation. The agent causes misclassification in every tested case, including on real-world benchmarks. The work underscores the need for robust defenses in ML-based hardware security tools and demonstrates practical implications, such as false positives in IP piracy detectors and potential leakage of secret keys via HTs, while offering a scalable framework for red-teaming such systems.
Abstract
Machine learning has shown great promise in addressing several critical hardware security problems. In particular, researchers have developed novel graph neural network (GNN)-based techniques for detecting intellectual property (IP) piracy, detecting hardware Trojans (HTs), and reverse engineering circuits, to name a few. These techniques have demonstrated outstanding accuracy and have received much attention in the community. However, since these techniques are used for security applications, it is imperative to evaluate them thoroughly and ensure they are robust and do not compromise the security of integrated circuits. In this work, we propose AttackGNN, the first red-team attack on GNN-based techniques in hardware security. To this end, we devise a novel reinforcement learning (RL) agent that generates adversarial examples, i.e., circuits, against the GNN-based techniques. We overcome three challenges related to effectiveness, scalability, and generality to devise a potent RL agent. We target five GNN-based techniques for four crucial classes of problems in hardware security: IP piracy, detecting/localizing HTs, reverse engineering, and hardware obfuscation. Through our approach, we craft circuits that fool all GNNs considered in this work. For instance, to evade IP piracy detection, we generate adversarial pirated circuits that fool the GNN-based defense into classifying our crafted circuits as not pirated. To attack the HT localization GNN, we generate HT-infested circuits that fool the defense on all tested circuits. We obtain a similar 100% success rate against GNNs for all classes of problems.
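The formulation summarized above (functionality-preserving actions, a reward granted only at the end of an episode, and a context that selects which detector is under attack) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the netlist encoding, the two rewrite actions (`double_invert`, `de_morgan`), the two hand-written detectors standing in for GNNs, and the random policy standing in for a trained RL agent. It only shows the shape of the loop, and it checks by exhaustive simulation that the perturbed circuit still computes the same function.

```python
import itertools
import random

_fresh = itertools.count()

def fresh_wire():
    """Return a wire name that has not been used before."""
    return f"w{next(_fresh)}"

GATE_FUNCS = {
    "AND": lambda xs: all(xs),
    "OR": lambda xs: any(xs),
    "NOT": lambda xs: not xs[0],
    "NAND": lambda xs: not all(xs),
    "NOR": lambda xs: not any(xs),
}

def truth_table(netlist, input_names, output):
    """Exhaustively simulate a topologically ordered netlist."""
    table = []
    for bits in itertools.product([False, True], repeat=len(input_names)):
        values = dict(zip(input_names, bits))
        for out, kind, ins in netlist:
            values[out] = GATE_FUNCS[kind]([values[i] for i in ins])
        table.append(values[output])
    return table

def double_invert(netlist, idx):
    """Hypothetical action: route gate idx through two back-to-back inverters."""
    out, kind, ins = netlist[idx]
    a, b = fresh_wire(), fresh_wire()
    inserted = [(a, kind, ins), (b, "NOT", (a,)), (out, "NOT", (b,))]
    return netlist[:idx] + inserted + netlist[idx + 1:]

def de_morgan(netlist, idx):
    """Hypothetical action: rewrite AND/OR with De Morgan's law; else no-op."""
    out, kind, ins = netlist[idx]
    if kind not in ("AND", "OR"):
        return netlist
    inverted = [(fresh_wire(), "NOT", (i,)) for i in ins]
    dual = "NOR" if kind == "AND" else "NAND"
    new_gate = (out, dual, tuple(w for w, _, _ in inverted))
    return netlist[:idx] + inverted + [new_gate] + netlist[idx + 1:]

def and_or_ratio_detector(netlist):
    """Toy stand-in for a GNN: flags circuits dominated by AND/OR gates."""
    plain = sum(kind in ("AND", "OR") for _, kind, _ in netlist)
    return plain / len(netlist) >= 0.5

def size_detector(netlist):
    """Second toy stand-in: flags circuits with fewer than 6 gates."""
    return len(netlist) < 6

# The context selects which detector the episode is run against.
DETECTORS = {"ratio": and_or_ratio_detector, "size": size_detector}
ACTIONS = [double_invert, de_morgan]

def run_episode(netlist, context, steps, rng):
    """Apply `steps` random actions; the reward is computed only at the end."""
    for _ in range(steps):
        action = rng.choice(ACTIONS)
        netlist = action(netlist, rng.randrange(len(netlist)))
    reward = 0.0 if DETECTORS[context](netlist) else 1.0
    return netlist, reward

INPUTS = ["a", "b", "c", "d"]
# y = (a AND b) OR (c AND d)
ORIGINAL = [
    ("n1", "AND", ("a", "b")),
    ("n2", "AND", ("c", "d")),
    ("y", "OR", ("n1", "n2")),
]

if __name__ == "__main__":
    rng = random.Random(0)
    reference = truth_table(ORIGINAL, INPUTS, "y")
    for context in DETECTORS:
        adversarial, reward = run_episode(ORIGINAL, context, steps=4, rng=rng)
        same = truth_table(adversarial, INPUTS, "y") == reference
        print(context, "reward:", reward, "functionally equivalent:", same)
```

In this sketch the random policy is only a placeholder. A learned agent would instead condition its action choice on the current circuit and on the context, so that one policy can be reused across detectors.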
