TrojanLoC: LLM-based Framework for RTL Trojan Localization
Weihua Xiao, Zeng Wang, Minghao Shao, Raghu Vamshi Hemadri, Ozgur Sinanoglu, Muhammad Shafique, Johann Knechtel, Siddharth Garg, Ramesh Karri
TL;DR
This work targets hardware Trojans inserted at the register-transfer level (RTL) and addresses the shortcomings of graph-based embedding approaches, which lose RTL semantics and struggle with fine-grained localization. It introduces TrojanLoC, an RTL-centric framework that uses RTL-finetuned decoder-only LLMs to produce module- and line-level embeddings, followed by autoencoders and task-specific classifiers for module-level detection, Trojan-type prediction, and line-level localization. The TrojanInS dataset provides 16k+ Trojaned RTL designs with precise line-level annotations across four Trojan families, supporting both training and evaluation. Experiments show that TrojanLoC achieves near-perfect module-level Trojan detection (F1 ≈ 0.99) and strong line-level localization (macro-F1 ≈ 0.92), significantly outperforming GNN-based baselines and prompt-based LLM methods, while also ranking suspicious lines within the top few percent for efficient manual review.
Abstract
Hardware Trojans (HTs) are a persistent threat to integrated circuits, especially when inserted at the register-transfer level (RTL). Existing methods typically first convert the design into a graph, such as a gate-level netlist or an RTL-derived dataflow graph (DFG), and then use a graph neural network (GNN) to obtain an embedding of that graph. This approach (i) loses compact RTL semantics, (ii) relies on shallow GNNs with a limited receptive field, and (iii) is largely restricted to coarse, module-level binary HT detection. We propose TrojanLoC, an LLM-based framework for RTL-level HT localization. We use an RTL-finetuned LLM to derive module-level and line-level embeddings directly from RTL code, capturing both global design context and local semantics. Next, we train task-specific classifiers on these embeddings to perform module-level Trojan detection, type prediction, and fine-grained line-level localization. We also introduce TrojanInS, a large synthetic dataset of RTL designs with systematically injected Trojans from four effect-based categories, each accompanied by precise line-level annotations. Our experiments show that TrojanLoC achieves strong module-level performance, reaching 0.99 F1-score for Trojan detection, up to 0.68 higher than the baseline, and 0.84 macro-F1 for Trojan-type classification. At the line level, TrojanLoC further achieves up to 0.93 macro-F1, enabling fine-grained localization of Trojan-relevant RTL lines.
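To make the idea of module-level and line-level embeddings concrete, here is a minimal sketch of one way per-token vectors from an LLM could be pooled into a vector per RTL line and a vector per module. The abstract does not say how the embeddings are pooled or combined, so mean pooling, the concatenation step, and every name below (`mean_pool`, `pool_embeddings`, `line_features`) are assumptions made for illustration, not the authors' method.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("cannot pool an empty list of vectors")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def pool_embeddings(
    token_vecs: Sequence[Sequence[float]],
    token_lines: Sequence[int],
    num_lines: int,
) -> Tuple[Vector, List[Vector]]:
    """Return (module_embedding, line_embeddings).

    token_vecs[k] is the vector for token k (assumed to come from an LLM);
    token_lines[k] is the 0-based RTL line that token k belongs to.
    Lines with no tokens (e.g. blank lines) get a zero vector.
    """
    if len(token_vecs) != len(token_lines):
        raise ValueError("token_vecs and token_lines must have equal length")
    dim = len(token_vecs[0])
    buckets: List[List[Sequence[float]]] = [[] for _ in range(num_lines)]
    for vec, line in zip(token_vecs, token_lines):
        buckets[line].append(vec)
    line_embs = [mean_pool(b) if b else [0.0] * dim for b in buckets]
    module_emb = mean_pool(token_vecs)  # global context: all tokens
    return module_emb, line_embs


def line_features(module_emb: Vector, line_embs: List[Vector]) -> List[Vector]:
    """Concatenate each line vector with the module vector (an assumption),
    so a line-level classifier sees local and global context together."""
    return [list(line) + list(module_emb) for line in line_embs]


# Toy input: three tokens over a three-line module; line 1 is blank.
token_vecs = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
token_lines = [0, 0, 2]
module_emb, line_embs = pool_embeddings(token_vecs, token_lines, num_lines=3)
features = line_features(module_emb, line_embs)
```

In a setup like this, the module vector would feed the module-level detection and type classifiers, while the per-line feature vectors would feed the line-level classifier, whose scores can then be sorted to rank lines for manual review.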
