Evolutionary Prompt Design for LLM-Based Post-ASR Error Correction
Rithik Sachdev, Zhong-Qiu Wang, Chao-Han Huck Yang
TL;DR
The paper aims to improve post-ASR error correction by designing LLM prompts and optimizing them with an evolutionary search. It introduces a Baseline prompt plus five Alternative Prompts and applies the GA-based EvoPrompt framework to mutate and cross over prompts over $T=3$ iterations with $N=5$ candidates, evaluating on the CHiME-4 subset of HyPoradise for GenSEC Task 1. Results show that the alternative prompts outperform the Baseline and that EvoPrompt reaches a best WER of $4.88\%$ on CHiME-4, with insights into prompt design and the role of demonstrations and domain cues. Cross-domain tests on Common Voice and WSJ show mixed generalization, and a cost analysis for Claude Sonnet 3.5 illustrates the practical resource considerations of prompt-optimization pipelines. Overall, the work presents evolutionary prompt design as a viable strategy for LLM-based post-ASR error correction and points toward further domain-adaptive prompting and possible LLM fine-tuning.
Abstract
Building upon the strength of modern large language models (LLMs), generative error correction (GEC) has emerged as a promising paradigm that can elevate the performance of modern automatic speech recognition (ASR) systems. One representative approach leverages in-context learning: the LLM is given a carefully designed prompt together with an $N$-best list of hypotheses produced by the ASR system, and generates a better hypothesis from them. However, it remains unknown whether the existing prompts are the most effective ones for post-ASR error correction. In this context, this paper first explores alternative prompts to identify an initial set of effective prompts, and then proposes to employ an evolutionary prompt optimization algorithm to refine the initial prompts. Evaluation results on the CHiME-4 subset of Task 1 of the SLT 2024 GenSEC challenge show the effectiveness and potential of the proposed algorithms.
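To make the evolutionary refinement concrete, the following is a minimal sketch of a genetic-algorithm loop over prompts with word error rate (WER) as the fitness, under several assumptions. The names `word_error_rate`, `evolve_prompts`, `fitness`, and `combine` are hypothetical and do not come from the paper or from the EvoPrompt code. `combine` stands in for an LLM call that crosses over and mutates two parent prompts, and `fitness` stands in for scoring a prompt by the WER its corrections obtain on a development set. Reading $N=5$ as the population size and drawing parents uniformly at random are simplifications made here for illustration.

```python
import random
from typing import Callable, Sequence, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the processed ref words and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / max(len(ref), 1)


def evolve_prompts(
    initial_prompts: Sequence[str],
    fitness: Callable[[str], float],     # lower is better, e.g. dev-set WER
    combine: Callable[[str, str], str],  # hypothetical LLM crossover + mutation
    iterations: int = 3,                 # T in the text
    population_size: int = 5,            # N in the text (assumed meaning)
    seed: int = 0,
) -> Tuple[str, float]:
    """Return the best prompt found and its fitness score."""
    if len(initial_prompts) < 2:
        raise ValueError("need at least two initial prompts to form parent pairs")
    rng = random.Random(seed)
    # Score the starting prompts and keep the best `population_size` of them.
    scored = sorted((fitness(p), p) for p in initial_prompts)[:population_size]
    for _ in range(iterations):
        parents = [prompt for _, prompt in scored]
        children = []
        for _ in range(population_size):
            # Assumption: uniform parent sampling instead of fitness-weighted selection.
            parent_a, parent_b = rng.sample(parents, 2)
            child = combine(parent_a, parent_b)
            children.append((fitness(child), child))
        # Survivor selection: keep the lowest-scoring prompts overall.
        scored = sorted(scored + children)[:population_size]
    best_score, best_prompt = scored[0]
    return best_prompt, best_score
```

In practice, `fitness` would run the LLM with the candidate prompt over each $N$-best list in a held-out set and average `word_error_rate` against the reference transcripts, so each iteration costs one LLM pass over that set per child prompt.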
