Table of Contents
Fetching ...

Chain of Correction for Full-text Speech Recognition with Large Language Models

Zhiyuan Tang, Dong Wang, Zhikai Zhou, Yong Liu, Shen Huang, Shidong Shang

TL;DR

This paper proposes the Chain of Correction (CoC), which uses a multi-turn chat format to correct errors segment by segment, guided by pre-recognized text and full-text context for better semantic understanding, and investigates using other types of information to guide error correction.

Abstract

Full-text error correction with Large Language Models (LLMs) for Automatic Speech Recognition (ASR) is attracting increased attention for its ability to address a wide range of error types, such as punctuation restoration and inverse text normalization, across long context. However, challenges remain regarding stability, controllability, completeness, and fluency. To mitigate these issues, this paper proposes the Chain of Correction (CoC), which uses a multi-turn chat format to correct errors segment by segment, guided by pre-recognized text and full-text context for better semantic understanding. Utilizing the open-sourced ChFT dataset, we fine-tune a pre-trained LLM to evaluate CoC's performance. Experiments show that CoC significantly outperforms baseline and benchmark systems in correcting full-text ASR outputs. We also analyze correction thresholds to balance under-correction and over-rephrasing, extrapolate CoC on extra-long ASR outputs, and explore using other types of information to guide error correction.

Chain of Correction for Full-text Speech Recognition with Large Language Models

TL;DR

This paper proposes the Chain of Correction (CoC), which uses a multi-turn chat format to correct errors segment by segment, guided by pre-recognized text and full-text context for better semantic understanding, and investigates using other types of information to guide error correction.

Abstract

Full-text error correction with Large Language Models (LLMs) for Automatic Speech Recognition (ASR) is attracting increased attention for its ability to address a wide range of error types, such as punctuation restoration and inverse text normalization, across long context. However, challenges remain regarding stability, controllability, completeness, and fluency. To mitigate these issues, this paper proposes the Chain of Correction (CoC), which uses a multi-turn chat format to correct errors segment by segment, guided by pre-recognized text and full-text context for better semantic understanding. Utilizing the open-sourced ChFT dataset, we fine-tune a pre-trained LLM to evaluate CoC's performance. Experiments show that CoC significantly outperforms baseline and benchmark systems in correcting full-text ASR outputs. We also analyze correction thresholds to balance under-correction and over-rephrasing, extrapolate CoC on extra-long ASR outputs, and explore using other types of information to guide error correction.

Paper Structure

This paper contains 11 sections, 2 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: The Chain of Correction (CoC) paradigm for full-text ASR error correction with LLMs. All pre-recognized segments constitute the full text.
  • Figure 2: Message template for Chain of Correction. The gray part is for translation only. The blue block represents the pre-recognized full text as context. The yellow and green blocks are the pre-recognized segments to be corrected and the corrected ones, respectively.
  • Figure 3: The trend of Mandarin ERR and correction ratio with different Correction Threshold values. The solid line represents the ERR, and the dashed line indicates the correction ratio.