Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition

Yu Yu, Chao-Han Huck Yang, Jari Kolehmainen, Prashanth G. Shivakumar, Yile Gu, Sungho Ryu, Roger Ren, Qi Luo, Aditya Gourav, I-Fan Chen, Yi-Chieh Liu, Tuan Dinh, Ankur Gandhe, Denis Filimonov, Shalini Ghosh, Andreas Stolcke, Ariya Rastow, Ivan Bulyko

TL;DR

This work addresses the practical challenge of deploying large language model rescoring for ASR by introducing LoRB, a low-rank adaptation (LoRA) framework that freezes the pretrained model and trains only small low-rank update matrices. By coupling a discriminative MWER objective with a correlation-based regularization, LoRB maintains or improves word error rate while cutting the trainable parameters to 0.08% of the pretrained parameters and achieving substantial training-time and memory savings. Evaluations on LibriSpeech and internal domains show that LoRB matches or exceeds full fine-tuning and other parameter-efficient methods, with robustness to domain shifts and favorable scaling behavior as model size and data increase. The results imply that large pretrained LMs can be effectively specialized for rescoring in ASR at a fraction of the computational cost, enabling scalable, production-friendly second-pass rescoring.

Abstract

We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring. Although pretrained language models (LMs) like BERT have shown superior performance in second-pass rescoring, the high computational cost of scaling up the pretraining stage and adapting the pretrained models to specific domains limits their practical use in rescoring. Here we present a method based on low-rank decomposition to train a rescoring BERT model and adapt it to new domains using only a fraction (0.08%) of the pretrained parameters. The inserted low-rank matrices are optimized through a discriminative training objective along with a correlation-based regularization loss. The proposed low-rank adaptation Rescore-BERT (LoRB) architecture is evaluated on LibriSpeech and internal datasets, with training times decreased by factors between 3.6 and 5.4.
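
The low-rank decomposition mentioned in the abstract can be illustrated with a minimal NumPy sketch of a generic LoRA-style layer. It is not the authors' implementation: the class name `LoRALinear`, the layer width (768), the rank (8), the scaling factor `alpha`, and the initialization are all assumptions chosen for illustration, and the page does not say which BERT weights LoRB adapts. The sketch only shows the general idea of keeping a pretrained weight frozen and training two small inserted matrices.

```python
import numpy as np


class LoRALinear:
    """Generic LoRA-style layer: frozen weight W plus a low-rank update B @ A.

    Hypothetical illustration only; sizes and initialization are assumptions.
    """

    def __init__(self, d_in, d_out, rank, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Stand-in for a pretrained weight; it is never updated.
        self.W = rng.normal(0.0, 0.02, size=(d_out, d_in))
        # Inserted low-rank factors; these would be the only trained parameters.
        self.A = rng.normal(0.0, 0.02, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))  # zero init: the update starts at 0
        self.scale = alpha / rank

    def forward(self, x):
        # x has shape (batch, d_in); the output has shape (batch, d_out).
        frozen = x @ self.W.T
        update = (x @ self.A.T) @ self.B.T
        return frozen + self.scale * update

    def trainable_ratio(self):
        # Trainable (inserted) parameters relative to the frozen ones.
        return (self.A.size + self.B.size) / self.W.size


layer = LoRALinear(d_in=768, d_out=768, rank=8)
x = np.ones((2, 768))
y = layer.forward(x)
print(y.shape)
print(f"{layer.trainable_ratio():.4f}")
```

Because `B` starts at zero, the layer initially reproduces the frozen model's output exactly, so adaptation begins from the pretrained behavior. For this single 768-by-768 layer with rank 8 the ratio is 1/48, roughly 2.1%. The 0.08% figure reported for LoRB refers to the whole model and depends on which weights are adapted and at what rank, details this page does not give.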

Paper Structure (23 sections, 9 equations, 5 figures, 4 tables)

This paper contains 23 sections, 9 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Illustration of the Low-Rank adaptation based Rescoring BERT (LoRB).
  • Figure 2: Wall-clock training time of LoRB, LoRB+$\mathcal{L}_{cor}$ and Fine-Tuning (FT) when training on messaging data.
  • Figure 3: WER on a conversational test set evaluated by RescoreBERT of size 5M, 170M and 1B, fine-tuned with "conversational domain" data using FT and LoRA.
  • Figure 4: WER evaluated by 1B RescoreBERT, fine-tuned with various sizes of "conversational domain" data using FT and LoRA.
  • Figure 5: WER as a function of data size, evaluated by 1B RescoreBERT, fine-tuned with FT and various ranks of LoRA.