THOR: A Non-Speculative Value Dependent Timing Side Channel Attack Exploiting Intel AMX
Farshad Dizani, Azam Ghanbari, Joshua Kalyanapu, Darsh Asher, Samira Mirbagher Ajorpaz
TL;DR
The paper identifies a non-speculative, value-dependent timing side channel in Intel AMX that leaks the sparsity of neural network weights without requiring access to model outputs or privileged state. It presents Thor, a proof-of-concept attack that reverse-engineers the Tile Matrix Multiply unit and uses threshold-based timing analysis with vector scoring to distinguish zero from non-zero weights. Thor recovers the sparsity of the weights assigned to $64$ input elements within $50$ minutes, a leakage rate of $R_{Thor}=76.8$ bits/hour. Thor outperforms prior microarchitectural side-channel attacks (e.g., Hertzbleed, Collide+Power) and remains effective even when DVFS is disabled or SGX is used, indicating a substantial ML privacy risk in accelerator-rich environments. The authors propose a microcode-level defense that keeps AMX in a Warm state, albeit with measurable power and performance penalties, underscoring the need for robust mitigations as AI workloads increasingly rely on on-chip accelerators.
Abstract
The rise of on-chip accelerators signifies a major shift in computing, driven by the growing demands of artificial intelligence (AI) and specialized applications. These accelerators have gained popularity because they can substantially boost performance, cut energy usage, lower total cost of ownership (TCO), and promote sustainability. Intel's Advanced Matrix Extensions (AMX) is one such on-chip accelerator, designed specifically for the large matrix multiplications common in machine learning (ML) models, image processing, and other computationally heavy workloads. In this paper, we introduce a novel value-dependent timing side-channel vulnerability in Intel AMX. By exploiting this weakness, we demonstrate a software-based timing side-channel attack that infers the sparsity of neural network weights without requiring any knowledge of the confidence score, privileged access, or physical proximity. Our attack can fully recover the sparsity of the weights assigned to 64 input elements within 50 minutes, a leakage rate 631% higher than the maximum achieved by the Hertzbleed attack.
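The reported leakage rate follows directly from the headline numbers. The short sketch below reproduces the arithmetic, assuming each of the 64 input elements contributes one recovered bit (zero vs. non-zero weight); the function name `leakage_rate_bits_per_hour` is hypothetical and used only for illustration.

```python
def leakage_rate_bits_per_hour(bits_recovered: int, minutes: float) -> float:
    """Convert 'bits recovered in a given number of minutes' to bits per hour."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return bits_recovered * 60.0 / minutes


# Assumption: one bit of sparsity information (zero vs. non-zero) per input element.
rate = leakage_rate_bits_per_hour(bits_recovered=64, minutes=50)
print(f"{rate:.1f} bits/hour")  # prints "76.8 bits/hour"
```

Under this one-bit-per-element assumption, the result matches the $R_{Thor}=76.8$ bits/hour figure quoted in the TL;DR.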
