The Role of Handling Attributive Nouns in Improving Chinese-To-English Machine Translation
Lisa Wang, Adam Meyers, John E. Ortega, Rodolfo Zevallos
TL;DR
This study targets a common error source in Chinese-to-English machine translation: attributive nouns, which frequently cause ambiguities in the English output. The authors manually insert the omitted particle 的 ('DE') into news article titles from the Penn Chinese Discourse Treebank and use the resulting dataset to fine-tune Hugging Face Chinese-to-English translation models, aiming to improve how this function word is handled. The approach is presented as a practical complement to the broader strategies proposed in earlier work.
Abstract
Translating between languages with drastically different grammatical conventions poses challenges, not just for human interpreters but also for machine translation systems. In this work, we target the translation challenges posed by attributive nouns in Chinese, which frequently cause ambiguities in English translation. By manually inserting the omitted particle 的 ('DE') in news article titles from the Penn Chinese Discourse Treebank, we developed a targeted dataset to fine-tune Hugging Face Chinese-to-English translation models, with the goal of improving how this critical function word is handled. This focused approach complements the broader strategies suggested by previous studies and offers a practical enhancement by addressing a common error type in Chinese-English translation.
