ACES: Accent Subspaces for Coupling, Explanations, and Stress-Testing in Automatic Speech Recognition

Swapnil Parekh

TL;DR

ACES, a representation-centric audit that extracts accent-discriminative subspaces and uses them to probe model fragility and disparity, suggests that accent-relevant features are deeply entangled with recognition-critical cues, positioning accent subspaces as vital diagnostic tools rather than simple "erasure" levers for fairness.

Abstract

ASR systems exhibit persistent performance disparities across accents, yet the internal mechanisms underlying these gaps remain poorly understood. We introduce ACES, a representation-centric audit that extracts accent-discriminative subspaces and uses them to probe model fragility and disparity. Analyzing Wav2Vec2-base with five English accents, we find that accent information concentrates in a low-dimensional early-layer subspace (layer 3, k=8). Projection magnitude correlates with per-utterance WER (r=0.26), and crucially, subspace-constrained perturbations yield stronger coupling between representation shift and degradation (r=0.32) than random-subspace controls (r=0.15). However, linearly attenuating this subspace does not reduce disparity; it slightly worsens it. Our findings suggest that accent-relevant features are deeply entangled with recognition-critical cues, positioning accent subspaces as vital diagnostic tools rather than simple "erasure" levers for fairness.
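
The two subspace operations named in the abstract, projection magnitude and linear attenuation, can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration rather than the paper's implementation: the function names `projection_magnitude` and `attenuate` are hypothetical, `H` stands for utterance-level layer representations (for example, mean-pooled over time), and the basis `U` is a random orthonormal placeholder, since this summary does not describe how the accent subspace is actually extracted.

```python
import numpy as np

def projection_magnitude(H, U):
    """L2 norm of each row of H after projection onto the subspace spanned by U.

    H: (n, d) utterance representations; U: (d, k) with orthonormal columns.
    """
    return np.linalg.norm(H @ U, axis=-1)

def attenuate(H, U, alpha=1.0):
    """Scale down the component of H lying in span(U).

    alpha=1.0 removes that component entirely (project-out);
    alpha=0.0 leaves H unchanged.
    """
    return H - alpha * (H @ U) @ U.T

# Illustrative shapes only: d=768 is the hidden size of Wav2Vec2-base and
# k=8 follows the abstract. U is a random placeholder, NOT an accent subspace.
rng = np.random.default_rng(0)
d, k = 768, 8
U, _ = np.linalg.qr(rng.standard_normal((d, k)))  # orthonormal columns
H = rng.standard_normal((5, d))                   # synthetic representations

mag = projection_magnitude(H, U)
H_out = attenuate(H, U, alpha=1.0)
```

Under these assumptions, `mag` is the per-utterance quantity one would correlate with WER, and `H_out` is the representation with the subspace component removed. Choosing `alpha` between 0 and 1 gives a partial attenuation.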

Paper Structure

This paper contains 22 sections, 2 equations, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: ACES: subspace extraction, stress-test (waveform PGD, L2 $\varepsilon$), and project-out. Conceptually, the representation $\mathbf{h}$ is projected onto the plane spanned by $\mathbf{U}$; coupling $m(x)$ measures how much the attack moves $\mathbf{h}$ along that subspace.
  • Figure 2: Three-track diagnostic vs. layer: probe accuracy, corr(projection, WER), and stability (principal angle, °). Layer 3 maximizes probe accuracy while maintaining stability below 50°; $k{=}8$ (dashed).
  • Figure 3: Coupling $m(x)$ vs. $\Delta\text{WER}$ (attacked $-$ clean) at layer $\ell^*$ for accent-subspace ($r{=}0.32$) and random-subspace ($r{=}0.15$). Distinct markers and trendlines show the difference in slopes.
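
The quantities in these captions can be sketched in a few lines of NumPy. This is a sketch under stated assumptions, not the paper's code: coupling is taken here to be $m(x) = \lVert \mathbf{U}^\top(\mathbf{h}_{\text{adv}} - \mathbf{h}_{\text{clean}}) \rVert_2$, which is one reading of "how much the attack moves $\mathbf{h}$ along that subspace", and the stability metric is assumed to be the principal angles between two subspace estimates. All function names are hypothetical, and all data below are synthetic placeholders.

```python
import numpy as np

def random_orthonormal_basis(d, k, rng):
    """Random k-dimensional subspace of R^d, returned as a (d, k) orthonormal basis."""
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return Q

def coupling(h_clean, h_adv, U):
    """Assumed form of m(x): norm of the representation shift projected onto span(U)."""
    return np.linalg.norm((h_adv - h_clean) @ U, axis=-1)

def principal_angles_deg(U1, U2):
    """Principal angles (degrees) between span(U1) and span(U2).

    For orthonormal bases, the singular values of U1^T U2 are the cosines
    of the principal angles.
    """
    s = np.linalg.svd(U1.T @ U2, compute_uv=False)
    return np.degrees(np.arccos(np.clip(s, 0.0, 1.0)))

def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic placeholders: none of these arrays come from the paper.
rng = np.random.default_rng(0)
d, k, n = 768, 8, 200
U_accent = random_orthonormal_basis(d, k, rng)   # stand-in for the accent subspace
U_random = random_orthonormal_basis(d, k, rng)   # random-subspace control
h_clean = rng.standard_normal((n, d))
h_adv = h_clean + 0.1 * rng.standard_normal((n, d))  # stand-in for attacked inputs
delta_wer = rng.random(n)                        # stand-in for attacked minus clean WER

r_accent = pearson_r(coupling(h_clean, h_adv, U_accent), delta_wer)
r_random = pearson_r(coupling(h_clean, h_adv, U_random), delta_wer)
angles = principal_angles_deg(U_accent, U_random)
```

With real data, `r_accent` and `r_random` would correspond to the two correlations compared in Figure 3, and `principal_angles_deg` applied to subspaces estimated from different data splits would give a stability measure of the kind plotted in Figure 2. On the synthetic arrays above the numbers carry no meaning.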