Robust ASR Error Correction with Conservative Data Filtering

Takuma Udagawa; Masayuki Suzuki; Masayasu Muraoka; Gakuto Kurata

TL;DR

This work proposes two fundamental criteria that EC training data should satisfy: EC targets should improve linguistic acceptability over sources and be inferable from the available context (e.g. source phonemes). Filtering the training data by these criteria can significantly reduce overcorrection in out-of-domain (OOD) settings.

Abstract

Error correction (EC) based on large language models is an emerging technology to enhance the performance of automatic speech recognition (ASR) systems. Generally, training data for EC are collected by automatically pairing a large set of ASR hypotheses (as sources) and their gold references (as targets). However, the quality of such pairs is not guaranteed, and we observed various types of noise that can make the EC models brittle, e.g. by inducing overcorrection in out-of-domain (OOD) settings. In this work, we propose two fundamental criteria that EC training data should satisfy: namely, EC targets should (1) improve linguistic acceptability over sources and (2) be inferable from the available context (e.g. source phonemes). Through these criteria, we identify low-quality EC pairs and train the models not to make any correction in such cases, a process we refer to as conservative data filtering. In our experiments, we focus on Japanese ASR using a strong Conformer-CTC as the baseline and finetune Japanese LLMs for EC. Through our evaluation on a suite of 21 internal benchmarks, we demonstrate that our approach can significantly reduce overcorrection and improve both the accuracy and quality of ASR results in the challenging OOD settings.
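
The filtering step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `ECPair`, `conservative_filter`, `lm_logprob`, and `ec_logprob` are hypothetical, and the sketch assumes each criterion is applied as a likelihood ratio compared against a threshold (c1 for acceptability, c2 for inferability, both defaulting to 1). The two scorers are passed in as plain callables, so any language model or EC model could stand behind them.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ECPair:
    source: str  # ASR hypothesis
    target: str  # gold reference


def conservative_filter(
    pairs: List[ECPair],
    lm_logprob: Callable[[str], float],       # hypothetical: log p(text) under an LM
    ec_logprob: Callable[[str, str], float],  # hypothetical: log p_EC(candidate | context of source)
    c1: float = 1.0,
    c2: float = 1.0,
) -> List[ECPair]:
    """Keep a pair only if it passes both criteria; otherwise train on source -> source."""
    filtered = []
    for pair in pairs:
        # Criterion 1 (assumed form): p(target) / p(source) > c1
        c1_ok = lm_logprob(pair.target) - lm_logprob(pair.source) > math.log(c1)
        # Criterion 2 (assumed form): p_EC(target | ctx) / p_EC(source | ctx) > c2
        c2_ok = (
            ec_logprob(pair.target, pair.source)
            - ec_logprob(pair.source, pair.source)
            > math.log(c2)
        )
        if c1_ok and c2_ok:
            filtered.append(pair)
        else:
            # Low-quality pair: the model is taught to make no correction.
            filtered.append(ECPair(pair.source, pair.source))
    return filtered
```

Note that failing pairs are not dropped: replacing the target with the source keeps the example in the training set as an explicit "do not correct" case, which is what makes the filtering conservative.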

Paper Structure (14 sections, 2 equations, 3 figures, 7 tables)

This paper contains 14 sections, 2 equations, 3 figures, 7 tables.

Introduction
Related Work
Methods
Criterion 1: EC targets should improve linguistic acceptability over sources.
Criterion 2: EC targets should be inferable from the available context.
Experimental Setup
ASR System
EC Model
Evaluation
Results and Discussion
Conclusion
Additional Data Examples
Benchmark Details
Experiments based on Sarashina-2 7B

Figures (3)

Figure 1: An illustration of our conservative data filtering. Precise details and terminologies are explained in the Methods section.
Figure 2: Log-likelihood ratios for the two criteria, i.e. $\log \frac{p(W^T)}{p(W^S)}$ for C1 and $\log \frac{p_{\mathrm{EC}}(W^T \,|\, \overline{W}^S)}{p_{\mathrm{EC}}(W^S \,|\, \overline{W}^S)}$ for C2. Red line shows the default threshold ($c_1 = c_2 = 1$).
Figure 3: Log-likelihood ratio for the two criteria using Sarashina-2 7B. Red line shows the default threshold ($c_1 = c_2 = 1$).
