Interplay of Machine Translation, Diacritics, and Diacritization

Wei-Rui Chen; Ife Adebara; Muhammad Abdul-Mageed

TL;DR

This work systematically analyzes how diacritics and diacritization interact with machine translation (MT) across 55 languages, in both high-resource and low-resource scenarios. It uses four model types (OnlyMTdia, OnlyMTundia, OnlyDia, DiaMT) to probe the MT–diacritization and MT–diacritics interactions across varied train sizes, with datasets derived from the Bible and Europarl corpora. Diacritization boosts MT in low-resource settings but can degrade it in high-resource settings, while MT largely harms diacritization except for some languages when training data is plentiful. Keeping rather than removing diacritics has little effect on MT performance. The authors also introduce two classes of language-agnostic metrics for the complexity of a diacritical system (ratio-based and entropy-based) that correlate positively with diacritization performance, which can help guide the design of diacritization and MT systems in different resource regimes. Overall, the paper offers practical guidance on choosing between multi-task and single-task training for MT and diacritization, and its findings may generalize beyond the 55 languages tested.
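The summary above names ratio-based and entropy-based complexity metrics but does not define them. The sketch below is only one hypothetical reading, not the authors' definitions: the ratio metric is taken to be the share of letters carrying at least one combining mark, and the entropy metric the Shannon entropy (in bits) of the distribution of mark sequences over letters. The function names `letters_with_marks`, `diacritic_ratio`, and `diacritic_entropy` are illustrative assumptions.

```python
import math
import unicodedata
from collections import Counter


def letters_with_marks(text):
    """Yield (base_letter, marks) pairs, grouping each letter with the
    combining marks (Unicode category Mn) that follow it after NFD."""
    base, marks = None, ""
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.category(ch) == "Mn":
            if base is not None:
                marks += ch
        else:
            if base is not None and base.isalpha():
                yield base, marks
            base, marks = ch, ""
    if base is not None and base.isalpha():
        yield base, marks


def diacritic_ratio(text):
    """ASSUMED ratio metric: fraction of letters that carry a diacritic."""
    pairs = list(letters_with_marks(text))
    if not pairs:
        return 0.0
    return sum(1 for _, marks in pairs if marks) / len(pairs)


def diacritic_entropy(text):
    """ASSUMED entropy metric: Shannon entropy (bits) of the distribution
    of mark sequences (including 'no mark') over all letters."""
    counts = Counter(marks for _, marks in letters_with_marks(text))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())
```

Under this reading, a text with no diacritics scores zero on both metrics, and both rise as marks become more frequent and more varied. Letters whose diacritic is not a separable combining mark under NFD (for example `ø`) are counted as unmarked here.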

Abstract

We investigate two research questions: (1) how machine translation (MT) and diacritization influence each other's performance in a multi-task learning setting, and (2) how keeping (vs. removing) diacritics affects MT performance. We examine these two questions in both high-resource (HR) and low-resource (LR) settings across 55 different languages (36 African languages and 19 European languages). For (1), results show that diacritization significantly benefits MT in the LR scenario, doubling or even tripling performance for some languages, but harms MT in the HR scenario. We also find that MT harms diacritization in the LR scenario but, for some languages, benefits it significantly in the HR scenario. For (2), MT performance is similar whether diacritics are kept or removed. In addition, we propose two classes of metrics to measure the complexity of a diacritical system, and find that these metrics correlate positively with the performance of our diacritization models. Overall, our work provides insights for developing MT and diacritization systems under different data size conditions and may have implications that generalize beyond the 55 languages we investigate.
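To make the keep-vs-remove contrast in question (2) concrete, the sketch below strips diacritics with Python's standard `unicodedata` module. It is an assumption about how an undiacritized source could be produced, not the authors' preprocessing, and the helper name `strip_diacritics` is hypothetical.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks after canonical decomposition (NFD).

    Assumption: a 'diacritic' is any character of Unicode category Mn
    (nonspacing mark). Letters with no canonical decomposition, such as
    'ø', are left unchanged by this approach.
    """
    decomposed = unicodedata.normalize("NFD", text)
    kept = (ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose so that any remaining sequences are returned in NFC form.
    return unicodedata.normalize("NFC", "".join(kept))


diacritized = "tack så mycket."
undiacritized = strip_diacritics(diacritized)  # "tack sa mycket."
```

In this framing, a translation model can be fed either `diacritized` or `undiacritized` as its source, and a diacritization model would learn the reverse mapping from `undiacritized` back to `diacritized`.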

Paper Structure (22 sections, 12 figures, 14 tables)

Introduction
Related Work
Experiments
Setup
Evaluation Metrics
Models & Training
Data
Data Sources
Train Sizes
Data Processing
Post-processing Predictions
Complexity Metrics
Results and Analyses
Findings to Research Questions
Function of Diacritics and MT Performance
...and 7 more sections

Figures (12)

Figure 1: Illustration of our experimental setup, taking the Swedish datapoint 'tack så mycket.' (thank you very much.) as an example. To answer our research questions (RQs), we develop four types of models: three single-task models, OnlyMTdia (trained to translate with diacritized source), OnlyMTundia (trained to translate with undiacritized source), and OnlyDia (trained to diacritize); and one multi-task model, DiaMT (trained to translate and diacritize simultaneously).
Figure 2: Percentage change of the BLEU/DER/WER averages across languages at each train size. $pc(m1, m2)$ is the percentage change of the metric value produced by model 1 ($m1$) over that of model 2 ($m2$), with $pc(m1, m2) = (m1-m2)/m1$. The legends indicate which research question each line addresses. Left column: African languages. Right column: European languages. Top row: BLEU scores. Bottom row: DER and WER.
Figure 3: A guideline for training strategies under different data size conditions for diacritization (Dia) and/or machine translation (MT) derived by approaching RQ1a and RQ1b under different train sizes.
Figure 4: Differences of average BLEU scores between OnlyMTdia and OnlyMTundia for three different groups of diacritical functions (lex only, gra only and lex+gra) for European languages at different train sizes.
Figure C.1: BLEU comparison between DiaMT and OnlyMTundia for 36 African languages to English pairs.
...and 7 more figures
