ProGRes: Prompted Generative Rescoring on ASR n-Best

Ada Defne Tur; Adel Moumen; Mirco Ravanelli

TL;DR

The paper proposes a novel method that uses instruction-tuned LLMs to dynamically expand ASR n-best hypotheses with new hypotheses generated by appropriately prompted LLMs. The method combines confidence scores, LLM sequence scoring, and prompt-based hypothesis generation.

Abstract

Large Language Models (LLMs) have been shown to improve the performance of speech recognizers by rescoring the n-best hypotheses generated during beam search. However, the best way to exploit recent generative, instruction-tuned LLMs for hypothesis rescoring remains unclear. This paper proposes a novel method that uses instruction-tuned LLMs to dynamically expand the n-best speech recognition hypotheses with new hypotheses generated by appropriately prompted LLMs. Specifically, we introduce a new zero-shot method for ASR n-best rescoring that combines confidence scores, LLM sequence scoring, and prompt-based hypothesis generation. We compare Llama-3-Instruct, GPT-3.5 Turbo, and GPT-4 Turbo as prompt-based generators, with Llama-3 as the sequence-scoring LLM. We evaluate our approach with different speech recognizers and observe significant relative improvements in word error rate (WER), ranging from 5% to 25%.
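The three ingredients named in the abstract (prompt-based hypothesis generation, LLM sequence scoring, and ASR confidence scores) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation. The prompt wording, the function names `build_prompt` and `progres_rescore`, and the linear interpolation `(1 - alpha) * asr + alpha * llm` are hypothetical. The callables `generate`, `asr_score`, and `llm_score` stand in for an instruction-tuned LLM, an ASR model assumed to be able to score an arbitrary transcription (needed for the LLM-generated hypothesis), and a sequence-scoring LLM such as Llama-3.

```python
from typing import Callable, List


def build_prompt(nbest: List[str]) -> str:
    """Turn an ASR n-best list into an instruction for a generative LLM.

    The wording is a hypothetical example, not the prompt used in the paper.
    """
    numbered = "\n".join(f"{i + 1}. {hyp}" for i, hyp in enumerate(nbest))
    return (
        "Below are candidate transcriptions of the same utterance, "
        "produced by a speech recognizer:\n"
        f"{numbered}\n"
        "Reply with only the most likely correct transcription."
    )


def progres_rescore(
    nbest: List[str],
    generate: Callable[[str], str],     # instruction-tuned LLM (assumed interface)
    asr_score: Callable[[str], float],  # ASR log-score of any transcription (assumed)
    llm_score: Callable[[str], float],  # LLM sequence log-probability (assumed)
    alpha: float,                       # language model weight in [0, 1]
) -> str:
    # Step 1: ask the LLM for a new hypothesis based on the n-best list.
    suggestion = generate(build_prompt(nbest)).strip()

    # Step 2: extend the n-best list with the suggestion, skipping duplicates.
    candidates = list(nbest)
    if suggestion and suggestion not in candidates:
        candidates.append(suggestion)

    # Step 3: rescore every candidate with an assumed linear interpolation
    # of ASR and LLM log-scores, and return the highest-scoring one.
    def combined(hyp: str) -> float:
        return (1.0 - alpha) * asr_score(hyp) + alpha * llm_score(hyp)

    return max(candidates, key=combined)


if __name__ == "__main__":
    # Toy stand-ins for real models; all scores are made up for illustration.
    nbest = ["the whether is nice today", "the weather is nice to day"]
    asr = {"the whether is nice today": -1.0,
           "the weather is nice to day": -1.4,
           "the weather is nice today": -1.2}
    llm = {"the whether is nice today": -12.0,
           "the weather is nice to day": -13.0,
           "the weather is nice today": -8.0}

    def fake_generate(prompt: str) -> str:
        return "the weather is nice today"

    for a in (0.0, 0.5):
        print(a, progres_rescore(nbest, fake_generate, asr.__getitem__, llm.__getitem__, a))
```

With these toy scores, `alpha = 0.0` keeps the ASR's own top hypothesis, while `alpha = 0.5` lets the LLM-generated suggestion win. Because ASR and LLM log-scores live on different scales, the weight would in practice be tuned on held-out data rather than fixed.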

Paper Structure (20 sections, 3 equations, 3 figures, 1 table)

Introduction
Proposed Method
Prompted Generation
LLM score
ASR score
Interpolation
Related Work
Experimental Setup
ASRs
LLMs
Evaluation Dataset
Libraries
Results
Baselines
Prompted Generative Rescoring
...and 5 more sections

Figures (3)

Figure 1: Example of prompted generation from an ASR $n$-best list.
Figure 2: Overview of the ProGRes pipeline: (1) An ASR generates $n$-best hypotheses. (2) The hypotheses are extended with a suggestion from an LLM. (3) The entire set is rescored to produce the final transcription.
Figure 3: WER results for different language model weights $\alpha$. The left panel shows results for ASR$_1$, and the right panel shows results for ASR$_2$. Non-prompted results use only the original ASR $n$-best hypotheses, without LLM-generated additions.
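Figure 3 plots WER against the language model weight $\alpha$. A sweep of that kind could be reproduced roughly as below: a word-level Levenshtein WER plus a loop over candidate $\alpha$ values. This is a sketch under assumptions. The names `word_edit_distance`, `corpus_wer`, and `sweep_alpha`, the tuple layout `(text, asr_score, llm_score)`, and the linear interpolation `(1 - alpha) * asr + alpha * llm` are ours, not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

# One rescoring candidate: (text, asr_score, llm_score). Layout is hypothetical.
Candidate = Tuple[str, float, float]


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over words (substitutions + insertions + deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = cur
    return prev[-1]


def corpus_wer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Total word errors divided by total reference words."""
    errors = sum(word_edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    return errors / sum(len(r.split()) for r in refs)


def sweep_alpha(
    candidates: Sequence[Sequence[Candidate]],
    refs: Sequence[str],
    alphas: Sequence[float],
) -> Dict[float, float]:
    """Return corpus WER for each alpha under an assumed linear interpolation."""
    results = {}
    for alpha in alphas:
        # Pick the best-scoring candidate text for every utterance.
        picks = [
            max(utt, key=lambda c: (1.0 - alpha) * c[1] + alpha * c[2])[0]
            for utt in candidates
        ]
        results[alpha] = corpus_wer(refs, picks)
    return results
```

Running the sweep once on the original n-best lists and once with the LLM suggestion added to each list would give the two kinds of curves Figure 3 compares (non-prompted vs. prompted).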
