ViBidirectionMT-Eval: Machine Translation for Vietnamese-Chinese and Vietnamese-Lao language pair

Hong-Viet Tran; Minh-Quy Nguyen; Van-Vinh Nguyen

TL;DR

This paper reports on the VLSP 2022-2023 machine translation shared tasks, which cover the Vietnamese-Chinese and Vietnamese-Lao language pairs. It surveys data preparation, evaluation protocols, and the three leading system submissions for each task, whose techniques include back-translation, data synthesis, and phrase-aware Transformer adaptations. Systems are ranked using both automatic metrics (BLEU, SacreBLEU, NIST, TER) and human post-editing evaluation, and the findings show consistent gains from data augmentation and pretrained-model adaptation in low-resource settings. The work underscores the practical impact of robust evaluation infrastructure and diverse modeling strategies on translation quality for low-resource language pairs, and looks ahead to broader multilingual support via Vietnamese LLMs and additional languages.

Abstract

This paper presents the results of the VLSP 2022-2023 Machine Translation Shared Tasks, focusing on Vietnamese-Chinese and Vietnamese-Lao machine translation. The tasks were organized as part of the 9th and 10th annual workshops on Vietnamese Language and Speech Processing (VLSP 2022, VLSP 2023). The objective of the shared tasks was to build machine translation systems for Vietnamese-Chinese and Vietnamese-Lao translation, corresponding to 4 translation directions. Submissions were evaluated on a test set of 1,000 sentence pairs (news and general domains) using established metrics such as BLEU [11] and SacreBLEU [12]. In addition, system outputs were evaluated through human judgment provided by experts in the Chinese and Lao languages. These human assessments played a crucial role in ranking the machine translation models, ensuring a more comprehensive evaluation.
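As an illustration of the automatic metric named in the abstract, the following is a minimal sketch of corpus-level BLEU in plain Python. It is not the organizers' evaluation script and not the SacreBLEU implementation: the function names `ngrams` and `corpus_bleu` are hypothetical, the sketch assumes one reference per hypothesis and pre-tokenized input, and it applies no smoothing.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100), assuming one reference per hypothesis.

    Both arguments are lists of token lists. No smoothing is applied, so the
    score is 0.0 whenever some n-gram order has no match at all.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipped counts: a hypothesis n-gram is credited at most as
            # often as it occurs in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    hyp = ["the cat sat on the mat".split()]
    ref = ["the cat sat on a mat".split()]
    print(round(corpus_bleu(hyp, ref), 2))
```

Scores from a sketch like this depend heavily on tokenization, which is the problem SacreBLEU addresses by standardizing it. For Chinese output, one option is to treat each character as a token; that choice is an assumption here, not a description of the shared-task setup.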

Paper Structure (15 sections, 7 figures, 9 tables)

Introduction
Training & Test Data
Evaluation
System submissions
Vietnamese-Chinese Machine Translation
An Efficient Approach for Machine Translation on Low-resource Languages
VBD-MT Vietnamese-Chinese Bidirectional Translation System
An Effective Method using Phrase Mechanism in Neural Machine Translation
Vietnamese-Lao Machine Translation
A Transformer-Based Model for Lao-Vietnamese Machine Translation
Vietnamese-Lao Bidirectional Translation System
A Sequence-to-Sequence Model for Lao-Vietnamese Machine Translation
Experimental Results
Human Evaluation
Conclusions

Figures (7)

Figure 1: Flow of data processing and model training
Figure 2: System flow machine translation
Figure 3: Overview of PhraseTransformer (CrossH) using $n$-gram LSTM in MultiHead layer. In this case, the phrase representations are built with gram_size = {2, 3}, 2-gram, 3-gram models apply to all 8 heads.
Figure 4: Training phases of mBART and Transformer WMT models.
Figure 5: Training phases of mT5_small and m2m_100-418M models.
...and 2 more figures
