ACES: Accent Subspaces for Coupling, Explanations, and Stress-Testing in Automatic Speech Recognition

Swapnil Parekh

TL;DR

ACES, a representation-centric audit that extracts accent-discriminative subspaces and uses them to probe model fragility and disparity, suggests that accent-relevant features are deeply entangled with recognition-critical cues, positioning accent subspaces as vital diagnostic tools rather than simple "erasure" levers for fairness.

Abstract

ASR systems exhibit persistent performance disparities across accents, yet the internal mechanisms underlying these gaps remain poorly understood. We introduce ACES, a representation-centric audit that extracts accent-discriminative subspaces and uses them to probe model fragility and disparity. Analyzing Wav2Vec2-base with five English accents, we find that accent information concentrates in a low-dimensional early-layer subspace (layer 3, k=8). Projection magnitude correlates with per-utterance WER (r=0.26), and crucially, subspace-constrained perturbations yield stronger coupling between representation shift and degradation (r=0.32) than random-subspace controls (r=0.15). Finally, linear attenuation of this subspace does not reduce disparity; it slightly worsens it. Our findings suggest that accent-relevant features are deeply entangled with recognition-critical cues, positioning accent subspaces as vital diagnostic tools rather than simple "erasure" levers for fairness.
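The two linear operations named in the abstract, projection magnitude and attenuation of a subspace, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the basis `U` here is a random orthonormal stand-in (the abstract does not say how the accent subspace is extracted), the function names and the attenuation strength `alpha` are hypothetical, and the hidden size 768 is simply the Wav2Vec2-base width.

```python
import numpy as np

def random_orthonormal_basis(d, k, seed=0):
    """Stand-in basis: orthonormal columns from QR of a Gaussian matrix.
    ASSUMPTION: the real accent subspace would come from the paper's
    extraction step, which is not described in this section."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return q  # shape (d, k)

def projection_magnitude(h, U):
    """L2 norm of each row of h after projection onto span(U)."""
    return np.linalg.norm(h @ U, axis=-1)

def attenuate(h, U, alpha=1.0):
    """Linearly shrink the component of h inside span(U).
    alpha=1.0 removes it entirely (project-out); alpha=0.0 leaves h unchanged.
    ASSUMPTION: alpha is a hypothetical knob, not a parameter from the paper."""
    return h - alpha * (h @ U) @ U.T

d, k = 768, 8                      # Wav2Vec2-base hidden size, k from the abstract
U = random_orthonormal_basis(d, k)
h = np.random.default_rng(1).standard_normal((4, d))  # toy "representations"
h_out = attenuate(h, U, alpha=1.0)

print(projection_magnitude(h, U))      # nonzero for the toy inputs
print(projection_magnitude(h_out, U))  # numerically zero after full project-out
```

With an orthonormal `U`, full project-out leaves nothing inside the subspace while every orthogonal component is untouched, which is why such an edit is called linear attenuation.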

Paper Structure (22 sections, 2 equations, 3 figures, 2 tables)

Introduction
Related Work
Accent and performance disparity in ASR
Training-based mitigation and adaptation
Adversarial ASR and robustness
Representation intervention and interpretability
Mechanistic interpretability
Method
Subspace extraction and validation
Subspace-constrained attacks
Project-out intervention
Experiments
Setup
Experiment details
Results
...and 7 more sections

Figures (3)

Figure 1: ACES: subspace extraction, stress-test (waveform PGD, L2 $\varepsilon$), and project-out. Conceptually, the representation $\mathbf{h}$ is projected onto the plane spanned by $\mathbf{U}$; coupling $m(x)$ measures how much the attack moves $\mathbf{h}$ along that subspace.
Figure 2: Three-track diagnostic (1-column): probe accuracy, corr(projection, WER), stability (principal angle, °) vs. layer. Layer 3 maximizes probe accuracy while maintaining stability below 50°; $k{=}8$ (dashed).
Figure 3: Coupling $m(x)$ vs. $\Delta\text{WER}$ (attacked $-$ clean) at layer $\ell^*$ for accent-subspace ($r{=}0.32$) and random-subspace ($r{=}0.15$). Distinct markers and trendlines show the difference in slopes.
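The coupling analysis described in the captions for Figures 1 and 3 can be sketched as follows. The exact definition of $m(x)$ is not given in this section, so the form below (the L2 norm of the attack-induced representation shift projected onto the subspace) is an assumption, as are the function names. The numbers are toy values chosen for illustration, not results from the paper.

```python
import numpy as np

def coupling(h_clean, h_attacked, U):
    """ASSUMED form of m(x): norm of the representation shift inside span(U).
    U must have orthonormal columns, shape (d, k)."""
    return np.linalg.norm((h_attacked - h_clean) @ U, axis=-1)

def pearson_r(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Toy setup: d=4, subspace spanned by the first two coordinate axes.
U = np.eye(4)[:, :2]
h_clean = np.zeros((3, 4))
h_attacked = np.array([
    [3.0, 4.0, 0.0, 0.0],   # shift lies entirely inside the subspace
    [0.0, 0.0, 5.0, 0.0],   # shift lies entirely outside it
    [1.0, 0.0, 0.0, 2.0],   # mixed shift
])

m = coupling(h_clean, h_attacked, U)
print(m)  # [5. 0. 1.]

# Synthetic per-utterance WER changes, constructed to be linear in m,
# so the correlation is 1 up to rounding.
delta_wer = 0.1 * m
print(pearson_r(m, delta_wer))
```

Running the same computation with a random basis in place of `U` gives the kind of random-subspace control that Figure 3 compares against.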
