HeatSense: Intelligent Thermal Anomaly Detection for Securing NoC-Enabled MPSoCs
Mahdi Hasanzadeh, Kasem Khalil, Cynthia Sturton, Ahmad Patooghy
TL;DR
This paper tackles the security risk of thermal-based hardware Trojans manipulating dynamic thermal management in NoC-enabled MPSoCs by proposing a lightweight, real-time anomaly-detection module embedded in NoC routers. The approach combines feature-driven selection, hardware-friendly thresholding with multi-tier sigma bounds, and a weighted moving average framework to detect and mitigate malicious temperature fluctuations with minimal hardware overhead. Experimental results from a two-stage CoMeT–AccessNoxim simulation show that high-signal feature sets achieve strong detection performance (up to ~82% accuracy) while reducing logic and specialized resources by up to 75% and 100%, respectively, compared with ML models like Random Forest. The work demonstrates a practical, low-cost security mechanism that preserves performance in resource-constrained NoC environments, enabling resilient thermal management in MPSoCs.
Abstract
Multi-Processor Systems-on-Chip (MPSoCs) are highly vulnerable to thermal attacks that manipulate dynamic thermal management systems. To counter this, we propose an adaptive real-time monitoring mechanism that detects abnormal thermal patterns in chip tiles. Through design space exploration, we identify the key thermal features for an efficient anomaly detection module implemented in the routers of Network-on-Chip (NoC) enabled MPSoCs. To minimize hardware overhead, we rely on weighted moving average (WMA) calculations and bit-shift operations, which keep the implementation lightweight yet effective. By defining a spectrum of abnormal behaviors, our system detects and mitigates malicious temperature fluctuations, reducing the most severe fluctuations from 3.00°C to 1.9°C. The anomaly detection module achieves up to 82% accuracy in detecting thermal attacks, which is only 10-15% lower than top-performing machine learning (ML) models such as Random Forest. In exchange, our approach reduces hardware usage by up to 75% for logic resources and 100% for specialized resources, making it significantly more efficient than ML-based solutions. This method provides a practical, low-cost solution for resource-constrained environments, ensuring resilience against thermal attacks while maintaining system performance.
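To make the WMA and bit-shift idea concrete, the following is a minimal software sketch, not the authors' router logic. It assumes an exponentially weighted form of the moving average (new sample weighted by 1/2^shift), fixed-point temperatures in units of 1/16 °C, and three deviation tiers at 1σ, 2σ, and 3σ. The names `wma_shift`, `classify_tier`, and `scan`, the shift amount, and the sigma value are all hypothetical choices for illustration; the abstract does not specify the actual weights, window, or bounds.

```python
from typing import Iterable, List


def wma_shift(prev_avg: int, sample: int, shift: int = 2) -> int:
    """Update a running average using only add, subtract, and shift.

    Equivalent to avg += (sample - avg) / 2**shift, so the new sample
    carries weight 1/2**shift (an assumed weighting, not the paper's).
    """
    return prev_avg + ((sample - prev_avg) >> shift)


def classify_tier(sample: int, avg: int, sigma: int) -> int:
    """Return 0 (normal) to 3 (severe) from multi-tier sigma bounds.

    The 2-sigma and 3-sigma bounds are built with a shift and an add,
    so no multiplier would be needed in hardware.
    """
    deviation = abs(sample - avg)
    two_sigma = sigma << 1
    three_sigma = two_sigma + sigma
    if deviation >= three_sigma:
        return 3
    if deviation >= two_sigma:
        return 2
    if deviation >= sigma:
        return 1
    return 0


def scan(samples: Iterable[int], init_avg: int, sigma: int, shift: int = 2) -> List[int]:
    """Classify each sample against the running average, then update it."""
    avg = init_avg
    tiers = []
    for s in samples:
        tiers.append(classify_tier(s, avg, sigma))
        avg = wma_shift(avg, s, shift)
    return tiers


if __name__ == "__main__":
    # Temperatures in 1/16 degC: 1600 is 100.0 degC, sigma = 16 is 1.0 degC.
    readings = [1600, 1604, 1598, 1660, 1602]
    print(scan(readings, init_avg=1600, sigma=16))
```

In this sketch each sample is classified against the average before the average is updated, so a sudden spike is compared with the recent history rather than with a value it has already pulled upward. A larger `shift` makes the average slower to follow an attacker-induced drift, at the cost of slower adaptation to legitimate workload changes.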
