SiniticMTError: A Machine Translation Dataset with Error Annotations for Sinitic Languages

Hannah Liu, Junghyun Min, En-Shiun Annie Lee, Ethan Yue Heng Cheung, Shou-Yi Hung, Elsie Chan, Shiyao Qian, Runtong Liang, Kimlan Huynh, Wing Yu Yip, York Hay Ng, TSZ Fung Yau, Ka Ieng Charlotte Lo, You-Wei Wu, Richard Tzong-Han Tsai

Abstract

Despite major advances in machine translation (MT) in recent years, progress remains limited for many low-resource languages that lack large-scale training data and linguistic resources. In this paper, we introduce SiniticMTError, a novel fine-grained dataset that builds on existing parallel corpora to provide error span, error type, and error severity annotations for machine-translated examples from English to Mandarin, Cantonese, and Wu Chinese, along with a Mandarin-Hokkien component derived from a non-parallel source. Our dataset serves as a resource for the MT community to fine-tune models with error detection capabilities, supporting research on translation quality estimation, error-aware generation, and low-resource language evaluation. We also establish baseline results using language models to benchmark translation error detection performance. Specifically, we evaluate multiple open-source and closed-source LLMs using span-level and correlation-based MQM metrics; their limited precision underscores the need for our dataset. Finally, we report our rigorous annotation process by native speakers, with analyses of pilot studies, iterative feedback, insights, and patterns in error type and severity.

Paper Structure

This paper contains 43 sections, 1 equation, 2 figures, 11 tables.

Figures (2)

  • Figure 1: Sample Mandarin entry. The MT output looks fluent but contains subtle semantic errors: an unwarranted subjective phrase (Addition) and a lexical mistranslation (Mistranslation). While 漂亮 (piao4liang) directly translates to "beautiful", it usually describes people or objects; 好 ("good") is more natural for describing the weather.
  • Figure 2: Overview of our annotation pipeline. English sentences from FLORES+ are passed to an MT system (e.g., NLLB) to generate MT outputs in Sinitic languages (Mandarin, Cantonese, Wu).
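
To make the annotation scheme and the span-level evaluation mentioned in the abstract more concrete, the following is a minimal Python sketch. It is not the paper's released format or its scoring script: the `ErrorSpan` fields, the severity label strings, the character-offset convention, and the example sentence are all assumptions made for illustration. It represents each annotation as a span with an error type and severity, then computes character-overlap precision, recall, and F1 between gold and predicted spans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorSpan:
    """One annotated error in an MT output (hypothetical schema)."""
    start: int        # inclusive character offset into the MT output
    end: int          # exclusive character offset
    error_type: str   # e.g. "Addition", "Mistranslation"
    severity: str     # label names here are assumed, e.g. "minor", "major"


def covered_chars(spans):
    """Return the set of character positions covered by any span."""
    chars = set()
    for span in spans:
        chars.update(range(span.start, span.end))
    return chars


def span_prf(gold, pred):
    """Character-overlap precision, recall, and F1 (one common convention)."""
    gold_chars = covered_chars(gold)
    pred_chars = covered_chars(pred)
    overlap = len(gold_chars & pred_chars)
    precision = overlap / len(pred_chars) if pred_chars else 0.0
    recall = overlap / len(gold_chars) if gold_chars else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented example sentence and spans, not taken from the dataset.
mt_output = "今天天气真的很漂亮"
gold = [
    ErrorSpan(4, 6, "Addition", "minor"),        # covers "真的"
    ErrorSpan(7, 9, "Mistranslation", "minor"),  # covers "漂亮"
]
pred = [
    ErrorSpan(7, 9, "Mistranslation", "minor"),  # correct detection
    ErrorSpan(0, 2, "Mistranslation", "major"),  # spurious detection
]

precision, recall, f1 = span_prf(gold, pred)
print(precision, recall, f1)
```

In this sketch a detector that flags one correct span and one spurious span of equal length scores 0.5 on precision, which illustrates how over-flagging lowers precision even when real errors are found. Error type and severity are carried on each span but deliberately ignored by `span_prf`; a type-aware or severity-weighted variant would need additional assumptions.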