SALTY: Explainable Artificial Intelligence Guided Structural Analysis for Hardware Trojan Detection
Tanzim Mahfuz, Pravin Gaikwad, Tasneem Suha, Swarup Bhunia, Prabuddha Chakraborty
TL;DR
SALTY tackles hardware Trojan detection in distributed semiconductor supply chains by combining a Jumping-Knowledge-enabled Graph Attention Network (JK-GAT) with an explainable AI post-processing module. It constructs a wire-graph representation of the design, extracts local structural features, and uses JK-GAT to produce robust node embeddings that generalize to unseen designs. Explanations computed with Captum Integrated Gradients guide a dynamic post-processing step that reduces AI hallucinations, yielding high $TPR$ and $TNR$ (e.g., $TPR=98.47\%$, $TNR=98.14\%$) across more than 15 benchmarks and outperforming seven state-of-the-art methods. The approach also yields human-readable rules that expose the detection logic, improving trust and practical applicability in hardware security workflows.
Abstract
Hardware Trojans are malicious modifications in digital designs that can be inserted by untrusted supply chain entities. They can give rise to diverse attack vectors such as information leakage (e.g., the MOLES Trojan) and denial-of-service (e.g., a rarely triggered bit flip). Such an attack on critical systems (e.g., healthcare and aviation) can endanger human lives and lead to catastrophic financial loss. Several techniques have been developed to detect such malicious modifications in digital designs, particularly for designs sourced from third-party intellectual property (IP) vendors. However, most techniques have scalability concerns (due to unsound assumptions during evaluation) and lead to a large number of false-positive detections (false alerts). Our framework (SALTY) mitigates these concerns through the use of a novel Graph Neural Network architecture (using the Jumping-Knowledge mechanism) for generating initial predictions and an Explainable Artificial Intelligence (XAI) approach for fine-tuning the outcomes (post-processing). Experiments show a True Positive Rate (TPR) and a True Negative Rate (TNR) of 98%, significantly outperforming state-of-the-art techniques across a large set of standard benchmarks.
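For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how TPR and TNR are conventionally computed from binary predictions. It assumes labels where 1 marks a Trojan element and 0 a benign one; the function name `tpr_tnr` and the toy labels are hypothetical and are not taken from the SALTY implementation or its benchmarks.

```python
def tpr_tnr(y_true, y_pred):
    """Return (TPR, TNR) for binary labels, assuming 1 = Trojan and 0 = benign."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # Trojans correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # Trojans missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alerts
    # Guard against a class being absent from the evaluated sample.
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    tnr = tn / (tn + fp) if (tn + fp) else float("nan")
    return tpr, tnr


# Toy data for illustration only (not benchmark results).
y_true = [1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 0]
tpr, tnr = tpr_tnr(y_true, y_pred)
print(f"TPR = {tpr:.2%}, TNR = {tnr:.2%}")
```

A high TNR corresponds directly to few false alerts, since the false-positive rate is 1 - TNR.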
