Behind the Scenes: Mechanistic Interpretability of LoRA-adapted Whisper for Speech Emotion Recognition

Yujian Ma, Xikun Lu, Jinqiu Sang, Xianquan Jiang, Ruizhe Li

TL;DR

This work addresses resource-efficient adaptation of large speech encoders through a mechanistic interpretability study of LoRA applied to the Whisper encoder for Speech Emotion Recognition (SER). Using layer contribution probing, logit-lens inspection, and representational analyses based on SVD and CKA, it identifies two core mechanisms: a delayed specialization process that preserves general representations in early layers before consolidating task-specific information, and a forward alignment, backward differentiation dynamic between LoRA's matrices. These findings clarify how LoRA reshapes encoder hierarchies for SER and offer guidance for designing efficient fine-tuning strategies in large speech models. Experiments on IEMOCAP report, for example, UAR $0.774\pm0.026$ and WAR $0.768\pm0.035$ for Whisper-large-v2 with LoRA, and the code is publicly available.
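The summary does not spell out UAR and WAR. The sketch below assumes the definitions common in SER work: UAR as the unweighted mean of per-class recalls, and WAR as overall accuracy (recall weighted by class frequency). The function name `uar_war` and the toy labels are illustrative and not taken from the paper.

```python
import numpy as np

def uar_war(y_true, y_pred):
    """Return (UAR, WAR) under the assumed definitions:
    UAR = mean of per-class recalls, WAR = overall accuracy."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    # Recall for class c: fraction of samples with true label c predicted as c.
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    uar = float(np.mean(recalls))
    war = float(np.mean(y_pred == y_true))
    return uar, war

# Toy, imbalanced example (hypothetical labels): six samples of class 0, two of class 1.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0]
uar, war = uar_war(y_true, y_pred)
print(uar, war)
```

On imbalanced data the two metrics diverge, which is presumably why both are reported: WAR rewards getting the majority class right, while UAR weights every emotion class equally.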

Abstract

Large pre-trained speech models such as Whisper offer strong generalization but pose significant challenges for resource-efficient adaptation. Low-Rank Adaptation (LoRA) has become a popular parameter-efficient fine-tuning method, yet its underlying mechanisms in speech tasks remain poorly understood. In this work, we conduct the first systematic mechanistic interpretability study of LoRA within the Whisper encoder for speech emotion recognition (SER). Using a suite of analytical tools, including layer contribution probing, logit-lens inspection, and representational similarity via singular value decomposition (SVD) and centered kernel alignment (CKA), we reveal two key mechanisms: a delayed specialization process that preserves general features in early layers before consolidating task-specific information, and a forward alignment, backward differentiation dynamic between LoRA's matrices. Our findings clarify how LoRA reshapes encoder hierarchies, providing both empirical insights and a deeper mechanistic understanding for designing efficient and interpretable adaptation strategies in large speech models. Our code is available at https://github.com/harryporry77/Behind-the-Scenes.
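For readers unfamiliar with the two LoRA matrices mentioned in the abstract, here is a minimal NumPy sketch of the standard LoRA forward pass, in which a frozen weight $W_0$ is augmented by a low-rank product $BA$. The dimensions, the `alpha / rank` scaling, and the initialization (small Gaussian `A`, zero `B`) follow the usual LoRA convention and are assumptions here, not settings confirmed by this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, alpha = 16, 16, 4, 8.0  # hypothetical sizes, not the paper's

W0 = rng.normal(size=(d_out, d_in))            # frozen pre-trained weight
A = rng.normal(scale=0.01, size=(rank, d_in))  # trainable down-projection (LoRA A)
B = np.zeros((d_out, rank))                    # trainable up-projection (LoRA B), zero init

def lora_forward(x, W0, A, B, alpha, rank):
    """Compute W0 x + (alpha / rank) * B (A x) for a batch of row vectors x."""
    base = x @ W0.T               # frozen path
    update = (x @ A.T) @ B.T      # low-rank path: project down to `rank`, then back up
    return base + (alpha / rank) * update

x = rng.normal(size=(5, d_in))
h_init = lora_forward(x, W0, A, B, alpha, rank)

trainable = A.size + B.size  # parameters updated during fine-tuning
frozen = W0.size             # parameters left untouched
print(trainable, frozen)
```

With `B` initialized to zero, the adapted layer reproduces the frozen layer exactly at the start of training, so any later difference between the LoRA and frozen encoders comes from the learned `A` and `B`.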

Paper Structure

This paper contains 15 sections, 1 equation, 3 figures, 1 table.

Figures (3)

  • Figure 1: Layer-wise differences (LoRA minus frozen) on the encoder residual stream: (a) mean relative contribution and (b) cosine similarity for self-attention, MLP, and their sum.
  • Figure 2: Layer-wise analysis of LoRA's internal representations. (a) Logit-Lens analysis. (b) t-SNE visualization across different LoRA ranks.
  • Figure 3: Analysis of LoRA's internal dynamics. (a--d) SVD-based effective-rank curves for LoRA$_A$/LoRA$_B$ activations and their gradients, comparing trained LoRA (solid) with random initialization (dashed). The y-axis is cumulative energy ratio and the x-axis is the number of singular components. (e) CKA between LoRA$_A$ and LoRA$_B$ for activations and gradients.
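The two quantities named in the Figure 3 caption can be sketched as follows. This is an illustrative NumPy version, assuming that "cumulative energy ratio" means the running sum of squared singular values divided by their total, and that CKA is the linear variant computed on mean-centered features. The paper's exact definitions may differ, and the function names and random test matrices are hypothetical.

```python
import numpy as np

def cumulative_energy(X):
    """Cumulative energy ratio over singular components of X (samples x features).
    Assumes energy is the squared singular value."""
    s = np.linalg.svd(X, compute_uv=False)
    energy = s ** 2
    return np.cumsum(energy) / energy.sum()

def linear_cka(X, Y):
    """Linear CKA between two representations with the same number of rows."""
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return cross / (norm_x * norm_y)

rng = np.random.default_rng(0)

# A rank-2 matrix: all energy should be captured by the first two components.
low_rank = rng.normal(size=(64, 2)) @ rng.normal(size=(2, 16))
curve = cumulative_energy(low_rank)

# CKA is 1 for identical representations and unchanged by an orthogonal rotation.
X = rng.normal(size=(64, 16))
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))
cka_self = linear_cka(X, X)
cka_rotated = linear_cka(X, X @ Q)
print(curve[:3], cka_self, cka_rotated)
```

A curve that saturates after few components indicates a low effective rank, while a CKA value near 1 indicates that two sets of activations (or gradients) share the same representational geometry up to rotation and scaling.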