Towards interfacing large language models with ASR systems using confidence measures and prompting

Maryam Naderi, Enno Hermann, Alexandre Nanchen, Sevada Hovsepyan, Mathew Magimai.-Doss

TL;DR

This work investigates post-hoc correction of ASR transcripts with LLMs and proposes a range of confidence-based filtering methods that can improve the performance of less competitive ASR systems.

Abstract

As large language models (LLMs) grow in parameter size and capabilities, such as interaction through prompting, they open up new ways of interfacing with automatic speech recognition (ASR) systems beyond rescoring n-best lists. This work investigates post-hoc correction of ASR transcripts with LLMs. To avoid introducing errors into likely accurate transcripts, we propose a range of confidence-based filtering methods. Our results indicate that this can improve the performance of less competitive ASR systems.
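
The filtering idea can be illustrated with a minimal sketch: pass a transcript to the LLM only when its confidence falls below a threshold, and keep it unchanged otherwise. This is not the authors' implementation. The function names, the use of the mean as the sentence-level score, the threshold of 0.9, and the `corrector` callable (a stand-in for an LLM prompt) are all assumptions made for illustration. The two scoring functions mirror the sentence-level and lowest-word thresholds named in the Figure 2 caption.

```python
from statistics import mean
from typing import Callable, Sequence


def sentence_confidence(word_confidences: Sequence[float]) -> float:
    """Sentence-level score; averaging word confidences is an assumption."""
    return mean(word_confidences)


def lowest_word_confidence(word_confidences: Sequence[float]) -> float:
    """Score a sentence by its least confident word."""
    return min(word_confidences)


def maybe_correct(
    transcript: str,
    word_confidences: Sequence[float],
    corrector: Callable[[str], str],  # hypothetical wrapper around an LLM call
    threshold: float = 0.9,           # illustrative value, not from the paper
    score: Callable[[Sequence[float]], float] = sentence_confidence,
) -> str:
    """Return the transcript unchanged if it looks reliable, else correct it."""
    # Without confidences there is nothing to filter on; keep the ASR output.
    if not word_confidences:
        return transcript
    # Confident transcripts bypass the LLM so it cannot introduce new errors.
    if score(word_confidences) >= threshold:
        return transcript
    return corrector(transcript)
```

For example, a transcript with word confidences `[0.99, 0.85]` is left alone under the sentence-level score (mean 0.92) but sent for correction under the lowest-word score (minimum 0.85), which shows why the two criteria can select different subsets of utterances.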

Paper Structure (16 sections, 3 figures, 5 tables)


Figures (3)

  • Figure 1: Proposed approach (left) and speech processing in the brain (right).
  • Figure 2: WER at various sentence-level (left) and lowest-word (right) confidence thresholds for the Tiny, Medium, and Large V3 Whisper models, evaluated on the dev-clean dataset with gpt-3.5-turbo-1106.
  • Figure 3: WER at various thresholds for specific low-confidence words with the Tiny Whisper model, evaluated on the dev-clean dataset with gpt-3.5-turbo-1106.