NetDeTox: Adversarial and Efficient Evasion of Hardware-Security GNNs via RL-LLM Orchestration

Zeng Wang; Minghao Shao; Akashdeep Saha; Ramesh Karri; Johann Knechtel; Muhammad Shafique; Ozgur Sinanoglu

TL;DR

NetDeTox tackles the vulnerability of hardware-security GNNs to adversarial netlist rewrites by fusing RL-guided gate pooling with LLM-driven, context-aware subnetlist planning. The framework localizes transformations to high-impact gate pools, balancing evasion effectiveness with area overhead through an iterative feedback loop that considers detector performance. Across six LLM backends and multiple GNN tools, NetDeTox achieves substantial evasion gains while often reducing silicon area, outperforming prior approaches like AttackGNN and LLMPirate. Ablation studies confirm that the joint RL–LLM strategy yields faster convergence and lower costs than either component alone, highlighting the practicality and scalability of the approach for real-world hardware-security evasion and potential extension to broader security tasks.

Abstract

Graph neural networks (GNNs) have shown promise in hardware security by learning structural motifs from netlist graphs. However, this reliance on motifs makes GNNs vulnerable to adversarial netlist rewrites; even small-scale edits can mislead GNN predictions. Existing adversarial approaches, ranging from synthesis-recipe perturbations to gate transformations, come with high design overheads. We present NetDeTox, an automated end-to-end framework that systematically orchestrates large language models (LLMs) with reinforcement learning (RL), enabling focused local rewriting. The RL agent identifies netlist components critical for GNN-based reasoning, while the LLM devises functionality-preserving rewriting plans that diversify motifs. Iterative feedback between the RL and LLM stages refines the adversarial rewrites to limit overheads. Compared to the state-of-the-art work AttackGNN, NetDeTox successfully degrades the effectiveness of all security schemes with fewer rewrites and substantially lower area overheads (reductions of 54.50% for GNN-RE, 25.44% for GNN4IP, and 41.04% for OMLA). For GNN4IP, NetDeTox can even reduce the area of the original benchmarks, particularly for larger circuits, demonstrating its practicality and scalability.
