Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices

Gwantae Kim, Bokyeung Lee, Donghyeon Kim, Hanseok Ko

TL;DR

This work tackles on-device personalized code-switching ASR by separating large model weights from user-specific fine-tuning weights via a weight separation scheme, enabling lightweight PEFT deployment on low-spec devices. It introduces GLoRA, a gated LoRA variant, which integrates GLU-based gating into the low-rank adaptation to improve code-switching performance with minimal parameter growth. The approach leverages Whisper-tiny/small and Wav2Vec2-large(KO) backbones, extending tokenizers to cover unseen code-switch tokens and using log-mel features with CTC loss for training on the KECS Korean-English dataset. Results show that weight separation with LoRA significantly reduces on-device parameters while delivering competitive code-switching ASR performance, and GLoRA Type1 often yields the best gains, with token normalization providing additional WER improvements. Overall, the combination of weight separation and GLoRA offers an efficient, privacy-conscious path to personalized, on-device code-switching ASR for real-world deployment.

Abstract

Interest has grown recently in running personalized large models on low-spec hardware such as mobile and CPU-only devices. However, running a personalized large model on-device is inefficient and is sometimes infeasible because of its computational cost. To address this, the paper presents a weight separation method that minimizes on-device model weights using parameter-efficient fine-tuning. In addition, some speakers mix multiple languages within a single utterance, a phenomenon known as code-switching, and a personalized ASR model must handle such cases. Current multilingual speech recognition models, however, are limited to recognizing a single language within each utterance. To tackle this problem, we propose code-switching speech recognition models that incorporate fine-tuned monolingual and multilingual speech recognition models. Additionally, we introduce gated low-rank adaptation (GLoRA) for parameter-efficient fine-tuning with minimal performance degradation. Our experiments, conducted on Korean-English code-switching datasets, demonstrate that fine-tuning speech recognition models for code-switching outperforms traditional code-switching speech recognition models trained from scratch. Furthermore, GLoRA improves parameter-efficient fine-tuning performance compared to conventional LoRA.
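
To make the idea of a gated low-rank adapter concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes the gate is a GLU applied to the low-rank features, i.e. the adapter output is `((x @ A) * sigmoid(x @ G)) @ B`; the class name `GatedLoRALinear`, the gate matrix `G`, and the sizes used are hypothetical. It also shows why only the small adapter tensors need to be stored per user while the frozen base weight `W` stays shared.

```python
import numpy as np


def sigmoid(x):
    """Logistic function used as the GLU gate nonlinearity."""
    return 1.0 / (1.0 + np.exp(-x))


class GatedLoRALinear:
    """Frozen base weight W plus a small trainable adapter (A, G, B).

    Assumption: the gate acts on the low-rank features (GLU style).
    This is an illustrative layout, not the paper's exact architecture.
    """

    def __init__(self, d_in, d_out, rank, seed=0):
        rng = np.random.default_rng(seed)
        # Shared backbone weight: frozen, never personalized.
        self.W = rng.standard_normal((d_in, d_out)) * 0.02
        # Per-user adapter: down-projection, gate projection, up-projection.
        self.A = rng.standard_normal((d_in, rank)) * 0.02
        self.G = rng.standard_normal((d_in, rank)) * 0.02  # hypothetical gate weights
        self.B = np.zeros((rank, d_out))  # zero init, as in standard LoRA

    def forward(self, x):
        low_rank = x @ self.A              # (batch, rank)
        gate = sigmoid(x @ self.G)         # (batch, rank), values in (0, 1)
        return x @ self.W + (low_rank * gate) @ self.B

    def adapter_params(self):
        """Number of parameters that must be stored per user."""
        return self.A.size + self.G.size + self.B.size

    def base_params(self):
        """Number of parameters in the shared, frozen weight."""
        return self.W.size


layer = GatedLoRALinear(d_in=384, d_out=384, rank=8)
x = np.ones((2, 384))
y = layer.forward(x)
ratio = layer.adapter_params() / layer.base_params()
print(y.shape, layer.adapter_params(), layer.base_params(), ratio)
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and the per-user state is only `A`, `G`, and `B`. In this toy configuration that is a small fraction of the base weight, which is the motivation for separating the two sets of weights.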

Paper Structure (13 sections, 3 equations, 4 figures, 3 tables)

This paper contains 13 sections, 3 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Schema for separating personalized weights from large model weights.
  • Figure 2: Differences between monolingual, multilingual, and code-switching ASR models.
  • Figure 3: Illustration of AuLoRA structures. Type 1: cross-attention. Type 2: self-attention. Type 3: GLU for hidden features. Type 4: GLU for low-rank features.
  • Figure 4: Illustration of Korean linguistic features. A Korean word can be decomposed at the word, character, and Jamo levels.
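
The word, character, and Jamo levels mentioned for Figure 4 can be illustrated with a short Python sketch. It uses the standard Unicode arithmetic for precomposed Hangul syllables (block starting at U+AC00, with 19 leading consonants, 21 vowels, and 28 trailing positions including "none"). The function names `syllable_to_jamo` and `word_to_levels` are hypothetical, and this is not necessarily how the paper builds its tokenizer.

```python
# Conjoining Jamo tables from the Unicode Hangul Jamo block.
CHOSEONG = [chr(0x1100 + i) for i in range(19)]          # leading consonants
JUNGSEONG = [chr(0x1161 + i) for i in range(21)]         # vowels
JONGSEONG = [""] + [chr(0x11A8 + i) for i in range(27)]  # trailing consonants ("" = none)

HANGUL_BASE = 0xAC00
HANGUL_COUNT = 19 * 21 * 28  # 11172 precomposed syllables


def syllable_to_jamo(ch):
    """Split one precomposed Hangul syllable into its Jamo.

    Characters outside the Hangul syllable block are returned unchanged,
    which keeps English letters intact in code-switched text.
    """
    index = ord(ch) - HANGUL_BASE
    if not 0 <= index < HANGUL_COUNT:
        return [ch]
    lead, rest = divmod(index, 21 * 28)
    vowel, tail = divmod(rest, 28)
    jamo = [CHOSEONG[lead], JUNGSEONG[vowel]]
    if tail:
        jamo.append(JONGSEONG[tail])
    return jamo


def word_to_levels(word):
    """Return the word, its characters, and the Jamo of each character."""
    characters = list(word)
    return {
        "word": word,
        "characters": characters,
        "jamo": [syllable_to_jamo(ch) for ch in characters],
    }


levels = word_to_levels("한국어")
print(levels["characters"])
print(levels["jamo"])
```

A Jamo-level vocabulary is much smaller than a character-level one, at the cost of longer output sequences, which is one reason the choice of unit matters for a Korean ASR tokenizer.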