ViBidirectionMT-Eval: Machine Translation for Vietnamese-Chinese and Vietnamese-Lao language pair
Hong-Viet Tran, Minh-Quy Nguyen, Van-Vinh Nguyen
TL;DR
This paper reports on the VLSP 2022-2023 machine translation shared tasks, which cover the Vietnamese-Chinese and Vietnamese-Lao language pairs. It surveys data preparation, evaluation protocols, and the system submissions, describing the three leading approaches for each task; the techniques used include back-translation, data synthesis, and phrase-aware Transformer adaptations. Systems are ranked using both automatic metrics (BLEU, SacreBLEU, NIST, TER) and human post-editing evaluation, and the findings show consistent gains from data augmentation and pretrained-model adaptation in low-resource settings. The work underscores the practical impact of robust evaluation infrastructure and diverse modeling strategies on translation quality for closely related, low-resource language pairs, and looks ahead to broader multilingual support via Vietnamese LLMs and additional languages.
Abstract
This paper presents the results of the VLSP 2022-2023 Machine Translation Shared Tasks, which focus on Vietnamese-Chinese and Vietnamese-Lao machine translation. The tasks were organized as part of the 9th and 10th annual workshops on Vietnamese Language and Speech Processing (VLSP 2022 and VLSP 2023). The objective of the shared tasks was to build machine translation systems for the Vietnamese-Chinese and Vietnamese-Lao language pairs, covering four translation directions in total. Submissions were evaluated on 1,000 test pairs (news and general domains) using established metrics such as BLEU [11] and SacreBLEU [12]. In addition, system outputs were assessed by human experts in Chinese and Lao. These human assessments played a crucial role in ranking the machine translation systems and ensured a more comprehensive evaluation.
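To illustrate the kind of automatic scoring mentioned above, the following is a minimal sketch of corpus-level BLEU with a single reference per hypothesis. It is not the SacreBLEU implementation and not the shared task's scoring script. The function names `ngrams` and `corpus_bleu`, the whitespace tokenization, and the absence of smoothing are all assumptions made for illustration; Chinese and Lao output would in practice need language-specific segmentation before scoring.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale.

    Assumptions (illustrative only): one reference per hypothesis,
    whitespace tokenization, and no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())

    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for hypotheses shorter than their references.
    bp = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the shared-task test set.
    system_output = ["the cat sat on the mat today"]
    reference = ["the cat sat on the mat yesterday"]
    print(f"BLEU = {corpus_bleu(system_output, reference):.2f}")
```

Scores are accumulated over the whole test set before the precisions are computed, which is why the metric is described as corpus-level rather than an average of per-sentence scores.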
