Prediction of Translation Techniques for the Translation Process
Fan Zhou, Vincent Vandeghinste
TL;DR
This work investigates predicting human translation techniques to guide machine translation under two workflows: from-scratch translation and post-editing. It builds an English-Chinese dataset of over 100,000 aligned units labeled with translation techniques and tests four encoder-based architectures built on cross-lingual models (mBERT, mBART, mT5) to predict the most suitable techniques. Predictive accuracy reaches 82% for from-scratch translation and 93% for post-editing, indicating strong potential for pre-translation guidance and for prompting large language models. Limitations include the focus on pre-translation phases, the lack of direct integration of techniques into the decoder, and challenges in data availability and in automating sub-sentence alignment. Future work aims to incorporate techniques into the decoder and to automate alignment so that the approach can scale.
Abstract
Machine translation (MT) encompasses a variety of methodologies aimed at enhancing the accuracy of translations. Human translation, in contrast, relies on a wide range of translation techniques, which are crucial for ensuring linguistic adequacy and fluency. This study proposes that these techniques could further improve machine translation if they are identified automatically in advance and then used to guide the translation process. The study distinguishes two scenarios of the translation process: from-scratch translation and post-editing. For each scenario, a specific set of experiments is designed to predict the most appropriate translation techniques. The findings indicate that predictive accuracy reaches 82% for from-scratch translation, while post-editing shows even greater potential, with an accuracy of 93%.
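
To make the idea of pre-translation guidance concrete, the following minimal Python sketch shows one way predicted techniques could be folded into a prompt for a large language model. It is an illustration under stated assumptions, not the authors' implementation: the function name `build_prompt`, the prompt wording, and the technique labels `literal` and `equivalence` are all hypothetical, and the study's actual label set and prompting format are not specified here.

```python
from typing import List


def build_prompt(
    source_units: List[str],
    predicted_techniques: List[str],
    src_lang: str = "English",
    tgt_lang: str = "Chinese",
) -> str:
    """Combine source units with their predicted techniques into one prompt.

    Hypothetical helper: the prompt format is an assumption made for
    illustration, not a format taken from the study.
    """
    if len(source_units) != len(predicted_techniques):
        raise ValueError("each source unit needs exactly one predicted technique")

    lines = [
        f"Translate the following {src_lang} text into {tgt_lang}.",
        "For each numbered unit, apply the suggested translation technique given in brackets.",
    ]
    # One line per aligned unit, tagged with the technique a classifier predicted.
    for index, (unit, technique) in enumerate(
        zip(source_units, predicted_techniques), start=1
    ):
        lines.append(f"{index}. [{technique}] {unit}")
    return "\n".join(lines)


if __name__ == "__main__":
    # The labels below are placeholders, not the study's label inventory.
    units = ["the weather is nice", "he kicked the bucket"]
    techniques = ["literal", "equivalence"]
    print(build_prompt(units, techniques))
```

In such a setup, the technique labels would come from a classifier like the ones evaluated in the study, and the resulting prompt would be passed to whichever translation model is in use.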
