GLoRIA: Gated Low-Rank Interpretable Adaptation for Dialectal ASR

Pouya Mehralian, Melissa Farasyn, Anne Breitbarth, Anne-Sophie Ghyselen, Hugo Van hamme

TL;DR

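GLoRIA uses location metadata to gate the rank-1 components of LoRA updates in a pre-trained ASR encoder. On the GCND corpus it achieves state-of-the-art word error rates while updating under 10% of parameters, showing that metadata-gated low-rank adaptation is an effective, interpretable, and efficient solution for dialectal ASR.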

Abstract

Automatic Speech Recognition (ASR) in dialect-heavy settings remains challenging due to strong regional variation and limited labeled data. We propose GLoRIA, a parameter-efficient adaptation framework that leverages metadata (e.g., coordinates) to modulate low-rank updates in a pre-trained encoder. GLoRIA injects low-rank matrices into each feed-forward layer, with a gating MLP determining the non-negative contribution of each LoRA rank-1 component based on location metadata. On the GCND corpus, GLoRIA outperforms geo-conditioned full fine-tuning, LoRA, and both dialect-specific and unified full fine-tuning, achieving state-of-the-art word error rates while updating under 10% of parameters. GLoRIA also generalizes well to unseen dialects, including in extrapolation scenarios, and enables interpretable adaptation patterns that can be visualized geospatially. These results show metadata-gated low-rank adaptation is an effective, interpretable, and efficient solution for dialectal ASR.
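
To make the gating mechanism described in the abstract concrete, the following is a minimal NumPy sketch of a single linear layer with a metadata-gated low-rank update. It is an illustration under stated assumptions, not the authors' implementation: the class name `GatedLoRALinear`, the one-hidden-layer gating MLP, the softplus used to keep gates non-negative, the layer sizes, and the random initialisation are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def softplus(x):
    # Smooth non-negative activation; an assumed way to keep the gates >= 0.
    return np.logaddexp(0.0, x)


class GatedLoRALinear:
    """Frozen linear layer plus a metadata-gated low-rank update (illustrative)."""

    def __init__(self, d_in, d_out, rank, meta_dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        # Stand-in for a frozen pre-trained weight matrix.
        self.W = rng.normal(0.0, 0.02, (d_out, d_in))
        # Low-rank factors. Standard LoRA zero-initialises B; it is random
        # here only so that the demo output visibly depends on the gates.
        self.A = rng.normal(0.0, 0.02, (rank, d_in))
        self.B = rng.normal(0.0, 0.02, (d_out, rank))
        # Gating MLP: metadata -> one non-negative gate per rank-1 component.
        self.G1 = rng.normal(0.0, 0.5, (hidden, meta_dim))
        self.b1 = np.zeros(hidden)
        self.G2 = rng.normal(0.0, 0.5, (rank, hidden))
        self.b2 = np.zeros(rank)

    def gate(self, meta):
        # meta: (batch, meta_dim), e.g. normalised coordinates.
        h = np.tanh(meta @ self.G1.T + self.b1)
        return softplus(h @ self.G2.T + self.b2)  # (batch, rank)

    def __call__(self, x, meta):
        # x: (batch, time, d_in)
        g = self.gate(meta)                # (batch, rank)
        z = x @ self.A.T                   # project down: (batch, time, rank)
        z = z * g[:, None, :]              # scale each rank-1 component by its gate
        return x @ self.W.T + z @ self.B.T  # frozen path + gated low-rank path


# Two utterances from two hypothetical locations (coordinates scaled to [-1, 1]).
layer = GatedLoRALinear(d_in=8, d_out=8, rank=4, meta_dim=2)
x = np.ones((2, 5, 8))
meta = np.array([[0.2, -0.7], [0.9, 0.4]])
y = layer(x, meta)
gates = layer.gate(meta)
```

For each utterance the layer behaves like a linear map with effective weight `W + B @ diag(g) @ A`, where `g` is the gate vector computed from that utterance's metadata. Because each gate scales exactly one rank-1 component, the per-location gate values can be inspected directly, which is the kind of quantity the geospatial visualizations of adaptation patterns would draw on.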

Paper Structure (28 sections, 4 equations, 2 figures, 2 tables)

Figures (2)

  • Figure 1: Geographical distribution of activation for four NMF-derived adaptation components. Each point represents a location; color intensity reflects the degree of adaptation component usage at that location. The emergent spatial patterns closely correspond to known dialect regions. The patterns shown correspond to the Frans-Vlaams, Limburgs, Oost-Vlaams and Antwerp regions, from left to right.
  • Figure 2: Clustered heatmap of NMF component activations averaged by dialect region. Regions that are geographically and acoustically closer cluster together, indicating interpretable adaptation behavior.