NetDeTox: Adversarial and Efficient Evasion of Hardware-Security GNNs via RL-LLM Orchestration

Zeng Wang, Minghao Shao, Akashdeep Saha, Ramesh Karri, Johann Knechtel, Muhammad Shafique, Ozgur Sinanoglu

TL;DR

NetDeTox exploits the vulnerability of hardware-security GNNs to adversarial netlist rewrites by fusing RL-guided gate pooling with LLM-driven, context-aware subnetlist planning. The framework localizes transformations to high-impact gate pools and balances evasion effectiveness against area overhead through an iterative feedback loop that takes detector performance into account. Across six LLM backends and multiple GNN tools, NetDeTox achieves substantial evasion gains while often reducing silicon area, outperforming prior approaches such as AttackGNN and LLMPirate. Ablation studies confirm that the joint RL–LLM strategy converges faster and at lower cost than either component alone, which supports the practicality and scalability of the approach for real-world hardware-security evasion and its potential extension to broader security tasks.

Abstract

Graph neural networks (GNNs) have shown promise in hardware security by learning structural motifs from netlist graphs. However, this reliance on motifs makes GNNs vulnerable to adversarial netlist rewrites; even small-scale edits can mislead GNN predictions. Existing adversarial approaches, ranging from synthesis-recipe perturbations to gate transformations, come with high design overheads. We present NetDeTox, an automated end-to-end framework that systematically orchestrates large language models (LLMs) with reinforcement learning (RL), enabling focused local rewriting. The RL agent identifies netlist components critical for GNN-based reasoning, while the LLM devises rewriting plans that diversify motifs while preserving functionality. Iterative feedback between the RL and LLM stages refines adversarial rewritings to limit overheads. Compared to the state-of-the-art AttackGNN, NetDeTox successfully degrades the effectiveness of all security schemes with fewer rewrites and substantially lower area overheads (reductions of 54.50% for GNN-RE, 25.44% for GNN4IP, and 41.04% for OMLA, respectively). For GNN4IP, NetDeTox can even reduce the area below that of the original benchmarks, particularly for larger circuits, demonstrating its practicality and scalability.
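The select, rewrite, and feedback cycle described in the abstract can be illustrated with a deliberately simplified toy sketch. Everything in it is an assumption made for illustration and none of it comes from the paper: the netlist is a flat list of gate-type labels without wiring, `detector_score` is a made-up stand-in for a GNN detector, `select_pool` is a greedy heuristic standing in for the RL agent, `propose_rewrite` is a fixed rule standing in for the LLM planner (it uses the identity that a NAND equals an AND followed by an inverter), and the unit areas are invented.

```python
from typing import List

Netlist = List[str]  # toy netlist: gate-type labels only, no connectivity

# Invented unit areas, for illustration only.
AREA = {"NAND": 1.0, "AND": 1.25, "INV": 0.75}


def detector_score(netlist: Netlist) -> float:
    """Toy stand-in for a GNN detector: fraction of gates that are NAND."""
    return sum(g == "NAND" for g in netlist) / len(netlist)


def area(netlist: Netlist) -> float:
    """Total area as the sum of the invented per-gate unit areas."""
    return sum(AREA[g] for g in netlist)


def select_pool(netlist: Netlist, size: int) -> int:
    """Stand-in for the RL agent: start index of the window with most NANDs."""
    return max(
        range(len(netlist) - size + 1),
        key=lambda i: sum(g == "NAND" for g in netlist[i:i + size]),
    )


def propose_rewrite(pool: Netlist) -> Netlist:
    """Stand-in for the LLM planner: replace the first NAND by AND + INV."""
    out = list(pool)
    if "NAND" in out:
        i = out.index("NAND")
        out[i:i + 1] = ["AND", "INV"]
    return out


def rewrite_loop(netlist: Netlist, target: float, max_area_overhead: float,
                 pool_size: int = 3, max_iters: int = 50) -> Netlist:
    """Rewrite local pools until the toy score reaches `target` or the budget is hit."""
    base_area = area(netlist)
    current = list(netlist)
    for _ in range(max_iters):
        if detector_score(current) <= target:
            break  # toy detector is evaded
        start = select_pool(current, pool_size)
        end = start + pool_size
        candidate = current[:start] + propose_rewrite(current[start:end]) + current[end:]
        # Feedback step: keep the rewrite only if it lowers the score
        # and stays within the relative area budget.
        if detector_score(candidate) >= detector_score(current):
            break
        if area(candidate) / base_area - 1.0 > max_area_overhead:
            break
        current = candidate
    return current


design = ["NAND"] * 4 + ["AND"] * 2
rewritten = rewrite_loop(design, target=0.3, max_area_overhead=1.0)
print(detector_score(design), detector_score(rewritten), area(rewritten))
```

Restricting each rewrite to a small window mirrors the idea of focused local rewriting, and the area check mirrors the role of feedback in limiting overhead. In this toy every rewrite adds area, so it does not reproduce the area reductions reported for GNN4IP.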

Paper Structure

This paper contains 23 sections, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Demonstration of NetDeTox for netlist rewriting. Directly rewriting a GNN-detected design A, without joint security–area awareness, produces a suboptimal design B. In contrast, NetDeTox employs context-aware iterative planning with coordinated gate selection, subnetlist mapping, and hop-size determination to achieve an efficient design C.
  • Figure 2: NetDeTox framework overview.
  • Figure 3: Security of NetDeTox netlists against OMLA. Original refers to the baseline design. Scores $\approx 50\%$ indicate successful evasion; the original's score $> 50\%$ shows high baseline vulnerability to OMLA.
  • Figure 4: Security of NetDeTox netlists against GNN4IP. Original refers to the baseline design. Scores $< 0$ indicate successful evasion; the original's score $\approx 1$ shows high baseline vulnerability to GNN4IP.
  • Figure 5: Security of NetDeTox netlists against GNN-RE. Original refers to the baseline design. Scores $\leq 25\%$ indicate successful evasion; the original's score $\approx 100\%$ shows high baseline vulnerability to GNN-RE.
  • ...and 4 more figures
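
The evasion criteria stated in the captions of Figures 3 to 5 can be collected into one small helper, sketched here under stated assumptions. The function name `is_evasion` and the `omla_tolerance` parameter are hypothetical: the caption only says that OMLA scores of approximately 50% count as evasion, so the width of the band around 50% is an assumed value, not one taken from the paper. OMLA and GNN-RE scores are read as percentages and GNN4IP scores as raw values, following the captions.

```python
def is_evasion(detector: str, score: float, omla_tolerance: float = 5.0) -> bool:
    """Return True if `score` meets the evasion criterion from the figure captions.

    `omla_tolerance` is an assumed band (in percentage points) around 50%;
    the captions only say "approximately 50%".
    """
    if detector == "OMLA":
        # Scores near 50% indicate successful evasion.
        return abs(score - 50.0) <= omla_tolerance
    if detector == "GNN4IP":
        # Scores below 0 indicate successful evasion.
        return score < 0.0
    if detector == "GNN-RE":
        # Scores of at most 25% indicate successful evasion.
        return score <= 25.0
    raise ValueError(f"unknown detector: {detector!r}")


for name, value in [("OMLA", 52.0), ("GNN4IP", 0.98), ("GNN-RE", 25.0)]:
    print(name, value, is_evasion(name, value))
```

Keeping the thresholds in one place makes it explicit that the three detectors use different score scales and different directions of success.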