Interpreting Emergent Features in Deep Learning-based Side-channel Analysis

Sengim Karayalçin, Marina Krček, Stjepan Picek

TL;DR

This work tackles the interpretability gap in deep learning-based side-channel analysis (DLSCA) by applying mechanistic interpretability (MI) to reveal how neural networks exploit leakage. It constructs a pipeline using logit and activation analyses, PCA, and activation patching to connect model behavior to physical leakage and recover secret shares $s_i$, even without access to masking randomness. Across CHES_CTF, ESHARD, and ASCAD datasets, the authors observe discrete phase-transition structures in the learned representations, provide evidence of weak universality in the underlying leakage circuits, and demonstrate partial to full mask recovery in realistic settings. The study offers a practical path for security evaluators to move from black-box to white-box assessments, aiding countermeasure design, with open-source code to enable reproducibility.
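The PCA and activation-patching steps named above can be illustrated with a minimal sketch. It is not the authors' implementation: the activations are synthetic, and the helper names `pca_fit` and `patch_pc` are hypothetical. It only shows the general idea of overwriting one principal-component score and mapping the result back to activation space, so that the downstream layers could be rerun on the patched activations.

```python
import numpy as np

def pca_fit(acts, k):
    """Return the mean and the top-k principal directions of `acts` (n_samples x n_units)."""
    mean = acts.mean(axis=0)
    # Rows of vt are orthonormal principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(acts - mean, full_matrices=False)
    return mean, vt[:k]

def patch_pc(acts, mean, components, pc_index, value):
    """Overwrite the score along one principal component and map back to activation space."""
    direction = components[pc_index]
    scores = (acts - mean) @ direction
    # Shift each sample along the direction so that its score becomes `value`.
    return acts + np.outer(value - scores, direction)

# Synthetic stand-in for hidden-layer activations (assumption: 200 traces, 16 units).
rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 16))

mean, comps = pca_fit(acts, k=2)
# Fix the first principal component to a constant; the value here is illustrative.
patched = patch_pc(acts, mean, comps, pc_index=0, value=-20.0)

# Scores of the patched activations on the first two components.
new_scores = (patched - mean) @ comps.T
```

Because the principal directions are orthonormal, only the patched component changes; the scores on all other components are preserved, which keeps the intervention targeted.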

Abstract

Side-channel analysis (SCA) poses a real-world threat by exploiting unintentional physical signals to extract secret information from secure devices. Evaluation labs also use the same techniques to certify device security. In recent years, deep learning has emerged as a prominent method for SCA, achieving state-of-the-art attack performance at the cost of interpretability. Understanding how neural networks extract secrets is crucial for security evaluators aiming to defend against such attacks, as only by understanding the attack can one propose better countermeasures. In this work, we apply mechanistic interpretability to neural networks trained for SCA, revealing how models exploit what leakage in side-channel traces. We focus on sudden jumps in performance to reverse engineer learned representations, ultimately recovering secret masks and moving the evaluation process from black-box to white-box. Our results show that mechanistic interpretability can scale to realistic SCA settings, even when relevant inputs are sparse, model accuracies are low, and side-channel protections prevent standard input interventions.

Paper Structure

This paper contains 22 sections, 18 figures, 1 table.

Figures (18)

  • Figure 1: The analysis approach used in this study broadly consists of three major steps. After the performance increases are located using the PI metric, we plot logits to extract relevant features. Using these features, we plot the PCs of the activations and find the structure related to the leakage. Finally, we apply activation patching to reverse-engineer the masks.
  • Figure 2: Example of a single trace captured during the execution of the Advanced Encryption Standard (AES) cipher [rijmen2001advanced].
  • Figure 3: Logit analysis (first column) and activation analysis (remaining columns) from models at epoch 50 (top) and epoch 100 (bottom) for CHES_CTF. Legends for activation analysis are shared within columns. The difference in the number of points between the last two columns is due to not plotting the points for classes (HWs) 3, 4, and 5.
  • Figure 4: Evolution of Perceived Information for training and test traces of the CHES_CTF dataset.
  • Figure 5: SNR plot and PC distributions for mask values using patching experiments for CHES_CTF. We set PC0 to -20 for both patching experiments, as that resulted in more apparent separation during manual testing.
  • ...and 13 more figures
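
The captions above refer to two quantities, the signal-to-noise ratio (SNR) of trace sample points and the Perceived Information (PI) of a model. The sketch below shows one common way to estimate each. It is an illustration under stated assumptions, not the paper's code: the traces are synthetic, the leaking sample index (17) and leakage strength are made up, and the function names `snr` and `perceived_information` are hypothetical.

```python
import numpy as np

def snr(traces, labels):
    """Per-sample SNR: variance of the class means over the mean of the class variances."""
    classes = np.unique(labels)
    means = np.array([traces[labels == c].mean(axis=0) for c in classes])
    variances = np.array([traces[labels == c].var(axis=0) for c in classes])
    return means.var(axis=0) / variances.mean(axis=0)

def perceived_information(probs, labels):
    """PI estimate in bits: label entropy plus the mean log2-probability of the true label."""
    n_classes = probs.shape[1]
    prior = np.bincount(labels, minlength=n_classes) / len(labels)
    nonzero = prior > 0
    entropy = -np.sum(prior[nonzero] * np.log2(prior[nonzero]))
    log_lik = np.log2(probs[np.arange(len(labels)), labels])
    return entropy + log_lik.mean()

# Synthetic traces: Gaussian noise with one sample point leaking the Hamming weight (HW).
rng = np.random.default_rng(1)
values = rng.integers(0, 256, size=5000)
hw = np.array([bin(v).count("1") for v in values])
traces = rng.normal(size=(5000, 50))
traces[:, 17] += 0.5 * hw  # hypothetical leaking sample point

scores = snr(traces, hw)
leak_point = int(np.argmax(scores))  # index of the sample point with the highest SNR

# A model that always outputs the empirical class prior extracts no information.
prior = np.bincount(hw, minlength=9) / len(hw)
baseline_pi = perceived_information(np.tile(prior, (len(hw), 1)), hw)
```

A model that only predicts the class prior yields a PI of zero, so a sudden rise of PI above zero during training marks the point at which the network has started to exploit leakage.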