Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices
Gwantae Kim, Bokyeung Lee, Donghyeon Kim, Hanseok Ko
TL;DR
This work tackles on-device, personalized code-switching ASR by keeping the large backbone weights separate from the small user-specific fine-tuning weights (a weight separation scheme), which makes parameter-efficient fine-tuning (PEFT) deployable on low-spec devices. It introduces GLoRA, a gated LoRA variant that adds GLU-based gating to the low-rank adaptation path, improving code-switching performance with minimal parameter growth. The approach builds on Whisper-tiny/small and Wav2Vec2-large (KO) backbones, extends their tokenizers to cover previously unseen code-switched tokens, and trains on the KECS Korean-English dataset using log-mel features and a CTC loss. Results show that weight separation with LoRA substantially reduces the number of on-device parameters while delivering competitive code-switching ASR performance. GLoRA Type1 often yields the best gains, and token normalization provides additional WER improvements. Overall, combining weight separation with GLoRA offers an efficient, privacy-conscious path to personalized, on-device code-switching ASR in real-world deployments.
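GLoRA is described here only at a high level ("GLU-based gating" inside the low-rank adaptation). As a rough illustration, the NumPy sketch below shows one plausible way to place a GLU gate inside a LoRA branch. The down-projection emits 2r features, and a GLU (value ⊙ sigmoid(gate)) reduces them to r. The usual up-projection then maps them back to the output dimension. This arrangement, the class name `GatedLoRALinear`, and the initialization scheme are hypothetical assumptions for illustration, not the paper's Type1 design.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


class GatedLoRALinear:
    """Frozen linear layer W0 plus a GLU-gated low-rank update.

    Hypothetical sketch: the gate placement is an assumption, not the paper's GLoRA.
    """

    def __init__(self, w0: np.ndarray, rank: int = 4, alpha: float = 8.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = w0.shape
        self.w0 = w0                  # shared, frozen backbone weight (d_out x d_in)
        self.rank = rank
        self.scale = alpha / rank     # standard LoRA scaling alpha / r
        # Down-projection emits 2*rank features: first half = value, second half = gate.
        self.a = rng.normal(0.0, 0.02, size=(2 * rank, d_in))
        # Up-projection starts at zero, so the adapted layer initially equals W0.
        self.b = np.zeros((d_out, rank))

    def forward(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out)
        z = x @ self.a.T                                  # (batch, 2r)
        value, gate = z[:, : self.rank], z[:, self.rank :]
        glu = value * sigmoid(gate)                       # GLU: value * sigmoid(gate)
        return x @ self.w0.T + self.scale * (glu @ self.b.T)

    def adapter_params(self) -> int:
        # Only these weights are user-specific; W0 is shared and frozen.
        return self.a.size + self.b.size


# Usage with toy shapes (hypothetical, not the paper's configuration).
w0 = np.random.default_rng(1).normal(size=(16, 32))
layer = GatedLoRALinear(w0, rank=4, alpha=8.0)
x = np.ones((2, 32))
y = layer.forward(x)
print(y.shape, layer.adapter_params())  # (2, 16) 320
```

Because the up-projection is zero-initialized, the adapted layer starts out identical to the frozen W0, so training begins from the pretrained behaviour as in standard LoRA. A plain rank-r LoRA has r·d_in + d_out·r parameters, and this arrangement adds r·d_in gate parameters on top of that. Because the gate is nonlinear, the update cannot be folded into W0 and must stay as a separate branch at inference time.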
Abstract
Recently, there has been growing interest in running personalized large models on low-spec devices, such as mobile phones and CPU-only machines. However, running a personalized large model on-device is inefficient and is sometimes infeasible because of its computational cost. To address this problem, this paper presents a weight separation method that minimizes on-device model weights by using parameter-efficient fine-tuning methods. Moreover, some people speak multiple languages within a single utterance, a phenomenon known as code-switching, and personalized ASR models are needed to handle such cases. However, current multilingual speech recognition models are limited to recognizing a single language per utterance. To address this, we propose code-switching speech recognition models built on fine-tuned monolingual and multilingual speech recognition models. Additionally, we introduce gated low-rank adaptation (GLoRA) for parameter-efficient fine-tuning with minimal performance degradation. Our experiments on Korean-English code-switching datasets show that fine-tuning speech recognition models for code-switching outperforms traditional code-switching speech recognition models trained from scratch. Furthermore, GLoRA improves parameter-efficient fine-tuning performance compared with conventional LoRA.
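To make the storage argument behind weight separation concrete, the sketch below counts parameters for N users under two schemes. Full fine-tuning keeps one complete model copy per user. Weight separation keeps one frozen backbone shared across users plus a small per-user LoRA adapter. The layer shapes (eight 384×384 projections), the rank, and the helper names `Linear` and `storage` are illustrative assumptions, not the paper's configuration, and biases are ignored.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Linear:
    d_in: int
    d_out: int

    def full_params(self) -> int:
        return self.d_in * self.d_out            # dense weight only (bias ignored)

    def lora_params(self, rank: int) -> int:
        return rank * (self.d_in + self.d_out)   # A: rank x d_in, B: d_out x rank


def storage(layers: list[Linear], num_users: int, rank: int) -> tuple[int, int]:
    """Return (full fine-tuning, weight separation) parameter counts for num_users."""
    backbone = sum(layer.full_params() for layer in layers)
    adapter = sum(layer.lora_params(rank) for layer in layers)
    full_ft = num_users * backbone               # one full model copy per user
    separated = backbone + num_users * adapter   # one shared backbone + per-user adapters
    return full_ft, separated


# Hypothetical toy shapes, not the paper's configuration.
layers = [Linear(384, 384) for _ in range(8)]
full_ft, separated = storage(layers, num_users=10, rank=8)
adapter_per_user = sum(layer.lora_params(8) for layer in layers)
print(full_ft, separated, adapter_per_user)  # 11796480 1671168 49152
```

With these toy shapes, each user-specific adapter is 1/24 the size of the backbone. Adding a user therefore costs 49,152 parameters instead of 1,179,648. Real savings depend on which layers are adapted and on the chosen rank.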
