Evolutionary Prompt Design for LLM-Based Post-ASR Error Correction

Rithik Sachdev; Zhong-Qiu Wang; Chao-Han Huck Yang

TL;DR

The paper improves post-ASR error correction by optimizing the prompts given to an LLM with an evolutionary search. It introduces a Baseline prompt plus five Alternative Prompts and applies the GA-based EvoPrompt framework to mutate and cross over prompts for $T=3$ iterations with $N=5$ candidates, evaluating on the CHiME-4 subset of HyPoradise for GenSEC Task 1. The alternative prompts outperform the Baseline, and EvoPrompt reaches a best word error rate (WER) of $4.88\%$ on CHiME-4; the analysis also gives insights into prompt design and the role of demonstrations and domain cues. Cross-domain tests on Common Voice and WSJ show mixed generalization, and a cost analysis for Claude Sonnet 3.5 covers the practical resource requirements of a prompt-optimization pipeline. Overall, the work presents evolutionary prompt design as a viable strategy for LLM-based post-ASR error correction and points toward further domain-adaptive prompting and potential LLM fine-tuning.
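The evolutionary search summarized above can be sketched as a small genetic-algorithm loop. This is a minimal illustration and not the authors' implementation: the function names, the roulette-wheel parent selection, and the toy scoring, cross-over, and mutation functions are all assumptions. In the paper's setting, cross-over and mutation would be carried out by an LLM, and the score would be the WER measured on held-out data.

```python
import random
from typing import Callable, List, Tuple


def evolve_prompts(
    prompts: List[str],
    wer_fn: Callable[[str], float],
    crossover_fn: Callable[[str, str], str],
    mutate_fn: Callable[[str], str],
    iterations: int = 3,  # T in the text
    seed: int = 0,
) -> List[Tuple[str, float]]:
    """Return the final population as (prompt, score) pairs, best first."""
    rng = random.Random(seed)
    n = len(prompts)  # N in the text: population size stays fixed
    population = [(p, wer_fn(p)) for p in prompts]
    for _ in range(iterations):
        # Lower WER is better, so selection weights are inverse scores (assumed scheme).
        weights = [1.0 / (score + 1e-6) for _, score in population]
        children = []
        for _ in range(n):
            # Sampling is with replacement, so both parents may be the same prompt.
            (parent_a, _), (parent_b, _) = rng.choices(population, weights=weights, k=2)
            child = mutate_fn(crossover_fn(parent_a, parent_b))
            children.append((child, wer_fn(child)))
        # Keep the N best prompts among parents and children.
        population = sorted(population + children, key=lambda item: item[1])[:n]
    return population


# Toy stand-ins (hypothetical) so the sketch runs without an LLM or ASR data.
def toy_wer(prompt: str) -> float:
    # Arbitrary proxy score: more distinct words gives a lower "WER".
    return 10.0 / (1 + len(set(prompt.split())))


def toy_crossover(a: str, b: str) -> str:
    # First half of prompt a joined with the second half of prompt b.
    wa, wb = a.split(), b.split()
    return " ".join(wa[: len(wa) // 2] + wb[len(wb) // 2:])


def toy_mutate(prompt: str) -> str:
    return prompt + " carefully"


SEED_PROMPTS = [
    "Correct the transcript.",
    "Pick the best hypothesis from the list.",
    "Fix recognition errors in noisy speech.",
    "Rewrite the hypotheses into one clean sentence.",
    "Report the true transcription.",
]

final_population = evolve_prompts(SEED_PROMPTS, toy_wer, toy_crossover, toy_mutate)
```

Because parents compete with their children for the $N$ slots, the best score in the population can never get worse from one iteration to the next.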

Abstract

Building upon the strength of modern large language models (LLMs), generative error correction (GEC) has emerged as a promising paradigm that can elevate the performance of modern automatic speech recognition (ASR) systems. One representative approach is to leverage in-context learning to prompt LLMs so that a better hypothesis can be generated by the LLMs based on a carefully designed prompt and an $N$-best list of hypotheses produced by ASR systems. However, it remains unknown whether the existing prompts are the most effective ones for the task of post-ASR error correction. In this context, this paper first explores alternative prompts to identify an initial set of effective prompts, and then proposes to employ an evolutionary prompt optimization algorithm to refine the initial prompts. Evaluation results on the CHiME-4 subset of Task 1 of the SLT 2024 GenSEC challenge show the effectiveness and potential of the proposed algorithms.
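The summary on this page reports results as word error rate (WER). For reference, a minimal sketch of the standard definition is given below: the word-level edit distance between a reference and a hypothesis, divided by the number of reference words. The function name and whitespace tokenization are assumptions, and the challenge's own scoring may apply text normalization that is not shown here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

In the example, one substitution ("sat" to "sit") and one deletion ("the") against six reference words give a WER of 2/6. Corpus-level WER is normally computed by summing edits and reference words over all utterances rather than averaging per-utterance values.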

Paper Structure (13 sections, 3 figures, 4 tables)

Introduction
Proposed Algorithms
Alternative Prompt Design
Employing EvoPrompt for Prompt Optimization
Experimental Setup
Evaluation Results
Results of Alternative Prompts
Results of EvoPrompt for Prompt Optimization
Analysis of Optimized Prompt
Examples of Corrected Errors
Generalizability of Optimized Prompts to Unseen Domains
Cost of Proposed Algorithms
Conclusions

Figures (3)

Figure 1: Approach overview, where an $N$-best list of hypotheses and a trainable prompt instruction are fed to a pre-trained LLM for error correction. Details of prompt design processes are shown in Fig. 3.
Figure 2: Example of text prompt optimization processes through (i) cross-over and (ii) mutation performed by LLM-operators.
Figure 3: Example input for in-context learning with one demonstration, where the first paragraph is the prompt, the second and third paragraphs form one demonstration example, and the fourth and fifth paragraphs ask the LLM to correct errors.
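The five-paragraph input layout described in the Figure 3 caption can be assembled as in the sketch below. Only the paragraph order comes from the caption; the function names, the "Hypotheses:" and "Corrected transcription:" labels, and the example sentences are hypothetical placeholders, not the wording used in the paper.

```python
from typing import List


def format_hypotheses(hypotheses: List[str]) -> str:
    """Render an N-best list as numbered lines (label wording is assumed)."""
    lines = [f"{rank}. {text}" for rank, text in enumerate(hypotheses, start=1)]
    return "Hypotheses:\n" + "\n".join(lines)


def build_llm_input(
    instruction: str,
    demo_hypotheses: List[str],
    demo_transcript: str,
    test_hypotheses: List[str],
) -> str:
    paragraphs = [
        instruction,                                      # 1: the prompt
        format_hypotheses(demo_hypotheses),               # 2: demonstration input
        "Corrected transcription: " + demo_transcript,    # 3: demonstration answer
        format_hypotheses(test_hypotheses),               # 4: utterance to correct
        "Corrected transcription:",                       # 5: slot the LLM completes
    ]
    return "\n\n".join(paragraphs)


llm_input = build_llm_input(
    instruction="Report the true transcription from the hypotheses below.",
    demo_hypotheses=["the whether is nice", "the weather is nice"],
    demo_transcript="the weather is nice",
    test_hypotheses=["please recognise speech", "please wreck a nice beach"],
)
```

Keeping the instruction as a separate first paragraph means only that string has to change when a different candidate prompt is evaluated; the demonstration and the test utterance stay fixed.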
