
IsoChronoMeter: A simple and effective isochronic translation evaluation metric

Nikolai Rozanov, Vikentiy Pankov, Dmitrii Mukhutdinov, Dima Vypirailenko

TL;DR

This work motivates the importance of isochronic translation, especially in the context of automatic dubbing, and introduces 'IsoChronoMeter' (ICM), a simple yet effective metric that measures the isochrony of translations in a scalable, resource-efficient way without requiring gold data.

Abstract

Machine translation (MT) has come a long way and is readily employed in production systems that serve millions of users daily. With recent advances in generative AI, a new form of translation is becoming possible: video dubbing. This work motivates the importance of isochronic translation, especially in the context of automatic dubbing, and introduces 'IsoChronoMeter' (ICM). ICM is a simple yet effective metric, built on state-of-the-art text-to-speech (TTS) duration predictors, that measures the isochrony of translations in a scalable, resource-efficient way without requiring gold data. We motivate IsoChronoMeter and demonstrate its effectiveness. Using ICM, we expose the shortcomings of state-of-the-art translation systems and show the need for new methods. We release the code at https://github.com/braskai/isochronometer.
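This summary does not give ICM's exact formula. As a hedged illustration only, the sketch below assumes an isochrony score that feeds the source sentence and its translation to a duration predictor and reports the relative gap between the two predicted speaking times. The names `char_rate_predictor` and `isochrony_gap` are hypothetical, and the toy character-rate predictor merely stands in for the TTS duration predictor the abstract refers to.

```python
from typing import Callable

# A duration predictor maps text to a predicted speech duration in seconds.
DurationPredictor = Callable[[str], float]


def char_rate_predictor(chars_per_second: float = 15.0) -> DurationPredictor:
    """Hypothetical toy predictor: duration grows linearly with character count.

    It only stands in for the duration module of a real TTS model.
    """
    def predict(text: str) -> float:
        return len(text) / chars_per_second
    return predict


def isochrony_gap(source: str, translation: str,
                  predict_src: DurationPredictor,
                  predict_tgt: DurationPredictor) -> float:
    """Relative gap between predicted source and translation durations.

    0.0 means both are predicted to take equally long to speak; larger values
    mean the translation fits the original timing less well.
    """
    src_seconds = predict_src(source)
    tgt_seconds = predict_tgt(translation)
    if src_seconds <= 0:
        raise ValueError("predicted source duration must be positive")
    return abs(tgt_seconds - src_seconds) / src_seconds


if __name__ == "__main__":
    toy = char_rate_predictor()
    gap = isochrony_gap("Good morning, everyone.",
                        "Guten Morgen, alle zusammen.", toy, toy)
    print(f"isochrony gap: {gap:.3f}")
```

Keeping the predictor as a plain callable means separate source-language and target-language duration models can be swapped in without changing the scoring function; no gold (human-dubbed) audio is needed, since only predicted durations are compared.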

Paper Structure

This paper contains 21 sections, 3 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Results on English data. The y-axis shows the relative absolute error between the duration of an original TTS-generated audio sample and the associated prediction; the x-axis shows the total number of words in the audio sample / prediction. The three curves correspond to a second TTS-generated audio sample (which, interestingly, shows a large error for samples with only a few words), a fine-tuned duration predictor, and the original duration predictor.
  • Figure 2: Histogram of sentence lengths. The x-axis is the number of tokens in a sentence; the y-axis is the number of sentences with that many tokens.
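Figure 1 plots a relative absolute error against the number of words. Assuming the standard definition |predicted - reference| / reference applied to durations (the summary does not state the formula), a curve like those in Figure 1 could be computed roughly as sketched below. The helper names are hypothetical, words are counted by whitespace splitting as a simplifying assumption, and the demo durations are made up.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def relative_absolute_error(reference_s: float, predicted_s: float) -> float:
    """|predicted - reference| / reference, both durations in seconds."""
    if reference_s <= 0:
        raise ValueError("reference duration must be positive")
    return abs(predicted_s - reference_s) / reference_s


def error_by_word_count(
    samples: Iterable[Tuple[str, float, float]],
) -> Dict[int, float]:
    """Mean relative absolute error, grouped by whitespace word count.

    Each sample is (text, reference_seconds, predicted_seconds), where the
    reference is the duration of a TTS-generated audio clip for `text`.
    """
    errors: Dict[int, List[float]] = defaultdict(list)
    for text, reference_s, predicted_s in samples:
        errors[len(text.split())].append(
            relative_absolute_error(reference_s, predicted_s))
    # Sort by word count so the result can be plotted left to right.
    return {n: mean(errs) for n, errs in sorted(errors.items())}


if __name__ == "__main__":
    # Made-up durations, for illustration only.
    demo = [("hello there", 1.0, 1.2), ("hi", 0.5, 0.4), ("good morning", 2.0, 1.8)]
    for words, err in error_by_word_count(demo).items():
        print(f"{words} word(s): mean relative error {err:.2f}")
```

Normalizing by the reference duration explains why short inputs can look bad: a fixed absolute error of a fraction of a second is a large relative error when the whole clip lasts only a second or two.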