Automated Hardware Trojan Insertion in Industrial-Scale Designs

Yaroslav Popryho, Debjit Pal, Inna Partin-Vaisband

TL;DR

Addresses the challenge of evaluating hardware Trojan detectors on industrial-scale SoCs by automating Trojan-like insertions that preserve I/O. Proposes a pipeline that builds connectivity graphs, mines rare regions with SCOAP metrics, and applies function-preserving graph rewrites guided by a learned policy and LLM-generated template library. Produces Trojanized netlists with per-net and per-cone labels for reproducible, large-scale detector evaluation under extreme class imbalance (Trojan-labeled nets typically $\ll 0.1\%$). Demonstrates that state-of-the-art graph-based detectors trained on small benchmarks struggle to detect unseen Trojans in big designs, highlighting a need for hierarchical, testability-aware models and broader template diversity. Provides a reproducible benchmark framework bridging the gap between academic benchmarks and production-scale SoCs.
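To see why the class imbalance noted above matters for evaluation, consider a minimal sketch with made-up counts: the net total and the 0.05% Trojan fraction below are illustrative assumptions, not figures from the paper. A trivial detector that labels every net benign scores near-perfect accuracy while finding no Trojan nets, which is why per-class metrics such as recall and precision are more informative here.

```python
# Illustrative only: N_NETS and N_TROJAN are hypothetical, not from the paper.
def metrics(labels, preds):
    """Return (accuracy, precision, recall) for binary labels (1 = Trojan)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

N_NETS = 100_000   # assumed design size
N_TROJAN = 50      # assumed 0.05% Trojan-labeled nets

labels = [1] * N_TROJAN + [0] * (N_NETS - N_TROJAN)
all_benign = [0] * N_NETS  # detector that never flags anything

acc, prec, rec = metrics(labels, all_benign)
print(f"accuracy={acc:.4f} precision={prec:.2f} recall={rec:.2f}")
```

In this toy setting the accuracy is high even though no Trojan net is detected, so accuracy alone says little about detector quality.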

Abstract

Industrial Systems-on-Chips (SoCs) often comprise hundreds of thousands to millions of nets and millions to tens of millions of connectivity edges, making empirical evaluation of hardware-Trojan (HT) detectors on realistic designs both necessary and difficult. Public benchmarks remain significantly smaller and hand-crafted, while releasing truly malicious RTL raises ethical and operational risks. This work presents an automated and scalable methodology for generating HT-like patterns in industry-scale netlists whose purpose is to stress-test detection tools without altering user-visible functionality. The pipeline (i) parses large gate-level designs into connectivity graphs, (ii) explores rare regions using SCOAP testability metrics, and (iii) applies parameterized, function-preserving graph transformations to synthesize trigger-payload pairs that mimic the statistical footprint of stealthy HTs. When evaluated on the benchmarks generated in this work, representative state-of-the-art graph-learning models fail to detect Trojans. The framework closes the evaluation gap between academic circuits and modern SoCs by providing reproducible challenge instances that advance security research without sharing step-by-step attack instructions.
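Step (ii) of the pipeline relies on SCOAP testability metrics. As a rough sketch of the idea, the code below computes the standard combinational SCOAP controllability (CC0, CC1) and observability (CO) values for a tiny hand-written circuit. The `Gate` type, the example netlist, and the `rarity` score are hypothetical stand-ins chosen for illustration; in particular, `rarity` is not the paper's rarity formula, only a simple placeholder that grows when a net is hard to control or hard to observe.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gate:
    kind: str      # "AND", "OR", or "NOT" (hypothetical minimal cell set)
    inputs: tuple  # names of input nets
    output: str    # name of the output net

def scoap(primary_inputs, primary_outputs, gates):
    """Combinational SCOAP; `gates` must be in topological order."""
    cc0 = {n: 1 for n in primary_inputs}
    cc1 = {n: 1 for n in primary_inputs}
    for g in gates:
        z0 = [cc0[n] for n in g.inputs]
        z1 = [cc1[n] for n in g.inputs]
        if g.kind == "AND":
            cc0[g.output], cc1[g.output] = min(z0) + 1, sum(z1) + 1
        elif g.kind == "OR":
            cc0[g.output], cc1[g.output] = sum(z0) + 1, min(z1) + 1
        elif g.kind == "NOT":
            cc0[g.output], cc1[g.output] = z1[0] + 1, z0[0] + 1
        else:
            raise ValueError(f"unsupported gate kind: {g.kind}")

    inf = float("inf")
    co = {n: 0 for n in primary_outputs}
    # Walk backwards so every consumer of a net is handled before its driver.
    for g in reversed(gates):
        out_co = co.get(g.output, inf)  # inf: no path to any primary output
        for k, net in enumerate(g.inputs):
            side = [n for m, n in enumerate(g.inputs) if m != k]
            if g.kind == "AND":    # side inputs must be held at 1
                cost = out_co + sum(cc1[n] for n in side) + 1
            elif g.kind == "OR":   # side inputs must be held at 0
                cost = out_co + sum(cc0[n] for n in side) + 1
            else:                  # NOT
                cost = out_co + 1
            co[net] = min(co.get(net, inf), cost)  # fanout stem: best branch
    return cc0, cc1, co

# Hypothetical 3-gate circuit: y = ((a AND b) AND c) OR d
gates = [
    Gate("AND", ("a", "b"), "n1"),
    Gate("AND", ("n1", "c"), "n2"),
    Gate("OR", ("n2", "d"), "y"),
]
cc0, cc1, co = scoap(["a", "b", "c", "d"], ["y"], gates)

def rarity(net, alpha=1.0):
    """Placeholder score (an assumption, not the paper's equation)."""
    return max(cc0[net], cc1[net]) + alpha * co[net]

for net in sorted(cc0):
    print(net, cc0[net], cc1[net], co[net], rarity(net))
```

On an industrial netlist the same bookkeeping would run over a graph with millions of edges, so a production version would need a proper netlist parser, sequential SCOAP terms, and a levelized traversal rather than plain Python lists.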

Paper Structure

This paper contains 12 sections, 2 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Example of hardware Trojan trigger (Palindrome Data).
  • Figure 2: Example of hardware Trojan payload (Reset Disable).
  • Figure 3: End-to-end pipeline. RTL and constraints are synthesized with Synopsys DC to a gate-level netlist (GLN). The GLN is converted to a graph with attributes; SCOAP yields testability scores used to identify rare nodes and realistic COIs. A function-preserving template library (populated via LLM-assisted trigger & payload generation under human review) is combined with an ML-based policy to select placements and parameters. Inserted patterns must preserve functionality, DFT/scan checks, and STA/area budgets; failures trigger parameter readjustment. Successful instances are exported with labels/metadata/splits for reproducible detector evaluation. No new PIs/POs/scan behaviors are introduced at any stage.
  • Figure 4: An example of a rarity-annotated subgraph. Connections indicate edges that reconverge toward a central high-$R$ node. Nodes are labeled by net name; color encodes rarity $R$ (blue $\rightarrow$ orange $\rightarrow$ red for low $\rightarrow$ medium $\rightarrow$ high). Example SCOAP values (calculated from the rarity formula with $\alpha = 1$) are shown for N680 and N401.
  • Figure 5: Node rarity heatmap for an SoC with many shallow logic COIs (the Ibex RISC-V CPU core). As expected, the set of rare nodes shifts to the right because deeper nets (higher CO) usually raise $R$. Yet extremely rare nets can be identified at modest depth (upper-left) when controllability is very poor.
  • ...and 1 more figures