Beyond Levenshtein: Leveraging Multiple Algorithms for Robust Word Error Rate Computations And Granular Error Classifications

Korbinian Kuhn, Verena Kersken, Gottfried Zimmermann

TL;DR

This work presents a non-destructive, token-based approach using an extended Levenshtein distance algorithm to compute a robust WER and additional orthographic metrics for Automatic Speech Recognition (ASR).

Abstract

The Word Error Rate (WER) is the common measure of accuracy for Automatic Speech Recognition (ASR). Transcripts are usually pre-processed by substituting specific characters to account for non-semantic differences. As a result of this normalisation, information on the accuracy of punctuation or capitalisation is lost. We present a non-destructive, token-based approach using an extended Levenshtein distance algorithm to compute a robust WER and additional orthographic metrics. Transcription errors are also classified more granularly using existing string similarity and phonetic algorithms. An evaluation on several datasets demonstrates that our approach is practically equivalent to common WER computations. We also provide an example analysis of derived use cases, such as a punctuation error rate, and a web application for interactive use and visualisation of our implementation. The code is available as open source.

Paper Structure

This paper contains 14 sections, 5 equations, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: Processing pipeline: The lexer transforms the input texts into a list of tokens, which are further normalised by several text pre-processors. An extended variant of the Levenshtein distance algorithm with compound word detection and variable edit costs determines the shortest route of modifications. Substitutions are further classified as punctuation, capitalisation, or word errors (e.g. suffix or homophone). The route is used for calculating metrics like WER, categorising types of errors, and visualising text differences.
  • Figure 2: Backtrace Matrix: The shortest route is found by traversing the operations starting from the bottom right of the matrix. The operations and movements are: $O$=OK (up left), $I$=Insertion (left), $D$=Deletion (up), $S$=Substitution (up left), $C_H$=Compound hypothesis (left), $C_R$=Compound reference (up), $C_E$=Compound end (up left).
  • Figure 3: An interactive web application visualises text differences, error types, and normalisations, and calculates several error metrics such as WER, SER, and F1-scores.
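
The backtrace described in the caption of Figure 2 can be illustrated with a minimal sketch. It assumes a plain word-level Levenshtein alignment with unit edit costs and only the four basic operations ($O$, $I$, $D$, $S$). The compound operations ($C_H$, $C_R$, $C_E$), the variable edit costs, and the lexer of the paper are not reproduced here, and the function names `align` and `wer` are hypothetical, not taken from the authors' code.

```python
from typing import List, Tuple

Step = Tuple[str, str, str]  # (operation, reference token, hypothesis token)


def align(reference: List[str], hypothesis: List[str]) -> List[Step]:
    """Align two token lists and return the edit route from start to end."""
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    cost = [[0] * cols for _ in range(rows)]
    back = [[""] * cols for _ in range(rows)]

    # First column: only deletions are possible; first row: only insertions.
    for i in range(1, rows):
        cost[i][0], back[i][0] = i, "D"
    for j in range(1, cols):
        cost[0][j], back[0][j] = j, "I"

    for i in range(1, rows):
        for j in range(1, cols):
            same = reference[i - 1] == hypothesis[j - 1]
            candidates = [
                (cost[i - 1][j - 1] + (0 if same else 1), "O" if same else "S"),
                (cost[i - 1][j] + 1, "D"),  # reference token is missing
                (cost[i][j - 1] + 1, "I"),  # extra hypothesis token
            ]
            cost[i][j], back[i][j] = min(candidates, key=lambda c: c[0])

    # Traverse from the bottom right: O/S move up left, D moves up, I moves left.
    route: List[Step] = []
    i, j = len(reference), len(hypothesis)
    while i > 0 or j > 0:
        op = back[i][j]
        if op in ("O", "S"):
            route.append((op, reference[i - 1], hypothesis[j - 1]))
            i, j = i - 1, j - 1
        elif op == "D":
            route.append((op, reference[i - 1], ""))
            i -= 1
        else:
            route.append((op, "", hypothesis[j - 1]))
            j -= 1
    route.reverse()
    return route


def wer(reference: List[str], hypothesis: List[str]) -> float:
    """Word error rate: (S + D + I) divided by the number of reference tokens."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    errors = sum(1 for op, _, _ in align(reference, hypothesis) if op != "O")
    return errors / len(reference)


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyp = "the cat sit on mat".split()
    for step in align(ref, hyp):
        print(step)
    print(wer(ref, hyp))
```

Keeping the route rather than only the final distance is what allows each substitution to be inspected afterwards, for example to decide whether it is a punctuation, capitalisation, or word error as outlined in Figure 1.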