The Comparison of Translationese in Machine Translation and Human Translation in terms of Translation Relations

Fan Zhou

TL;DR

This work investigates translationese in neural machine translation (NMT) versus human translation (HT) by quantifying translation relations across two English-Chinese parallel corpora. Using a 14-category taxonomy of translation techniques and token-level annotations, the study contrasts NMT and HT on overall translation relations, use of non-literal techniques, and the factors driving technique choice. The main finding is that the NMT system (GNMT) exhibits a stronger literal bias (about 77% vs. 64% for HT). It performs comparably to HT on certain syntactic non-literal techniques but lags on semantic-level techniques such as particularization, equivalence, and generalization. These results point to areas where NMT can be improved to reduce translationese and approach human parity, with implications for targeted linguistic enhancements and future multi-system evaluations.

Abstract

This study explores the distinctions between neural machine translation (NMT) and human translation (HT) through the lens of translation relations. It uses HT as a benchmark to assess the translation techniques produced by an NMT system and addresses three research questions: how overall translation relations differ between NMT and HT, how each uses non-literal translation techniques, and how the factors influencing their use of specific non-literal techniques differ. The research employs two parallel corpora built on the same source texts spanning nine genres, one translated by NMT and the other by humans. Translation relations in these corpora are manually annotated on aligned pairs, enabling a comparative analysis that draws on linguistic features, both semantic and syntactic, such as hypernyms and changes in part-of-speech tags. The results indicate that NMT relies on literal translation significantly more than HT across genres. While NMT performs comparably to HT in employing syntactic non-literal translation techniques, it falls behind on semantic-level techniques.
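As a rough illustration of the comparison described above, the sketch below computes the share of each translation relation among annotated aligned pairs and the gap in literal share between the two systems. The label lists are made-up toy data, and the function names `relation_shares` and `literal_gap` are hypothetical; this is not the study's annotation format or analysis code.

```python
from collections import Counter

# Toy annotations: one relation label per aligned source-target pair.
# These values are illustrative assumptions, not data from the study.
mt_labels = ["literal", "literal", "literal", "equivalence", "literal", "generalization"]
ht_labels = ["literal", "equivalence", "literal", "particularization", "generalization", "literal"]


def relation_shares(labels):
    """Return each relation's share of all annotated pairs."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {relation: count / total for relation, count in counts.items()}


def literal_gap(mt, ht, literal="literal"):
    """Literal share of MT minus literal share of HT."""
    return relation_shares(mt).get(literal, 0.0) - relation_shares(ht).get(literal, 0.0)


mt_shares = relation_shares(mt_labels)
ht_shares = relation_shares(ht_labels)
print(f"MT literal share: {mt_shares['literal']:.2f}")
print(f"HT literal share: {ht_shares['literal']:.2f}")
print(f"Gap (MT - HT): {literal_gap(mt_labels, ht_labels):+.2f}")
```

A positive gap means the MT output is annotated as literal more often than the human output for the same source texts, which is the direction of the difference the study reports.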

Paper Structure (46 sections, 14 figures)

Figures (14)

  • Figure 1: The number of tokens in each sentence of each genre in three corpora
  • Figure 2: Hierarchy of translation relations (Chuquet et al., 1989; Zhai et al., 2018)
  • Figure 3: Input of source.txt file and target.txt file (Example from Liu (2015))
  • Figure 4: Output of token index alignment (Example from Liu (2015))
  • Figure 5: The operation web page of YAWAT
  • ...and 9 more figures