The Comparison of Translationese in Machine Translation and Human Translation in terms of Translation Relations

Fan Zhou

TL;DR

This work investigates translationese in neural machine translation (NMT) versus human translation (HT) by quantifying translation relations across two English-Chinese parallel corpora. Using a 14-category taxonomy of translation techniques and token-level annotations, the study contrasts MT and HT on overall translation relations, non-literal techniques, and the factors driving technique choice. The main finding is that MT (GNMT) exhibits a stronger literal bias (about 77% vs 64% for HT). MT performs comparably to HT on certain syntactic non-literal techniques but lags on semantic-level techniques such as particularization, equivalence, and generalization. These results highlight areas where NMT can be improved to reduce translationese and approach human parity, with implications for targeted linguistic enhancements and future multi-system evaluations.

Abstract

This study explores the distinctions between neural machine translation (NMT) and human translation (HT) through the lens of translation relations. It uses HT as a benchmark to assess the translation techniques produced by an NMT system and addresses three research questions: how overall translation relations differ between NMT and HT, how each uses non-literal translation techniques, and how the factors influencing their use of specific non-literal techniques vary. The research employs two parallel corpora that share the same source texts across nine genres, one translated by NMT and the other by humans. Translation relations in these corpora are manually annotated on aligned pairs, enabling a comparative analysis that draws on linguistic insights, including semantic and syntactic nuances such as hypernyms and alterations in part-of-speech tagging. The results indicate that NMT relies on literal translation significantly more than HT across genres. While NMT performs comparably to HT in employing syntactic non-literal translation techniques, it falls behind in semantic-level performance.
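The comparison of literal versus non-literal relations described above can be illustrated with a minimal sketch. It assumes each aligned pair carries a single relation label; the function names, the label strings, and the toy annotations are hypothetical and are not taken from the study's corpora or tooling.

```python
from collections import Counter

# Hypothetical label string for literal translation; the paper's full
# 14-category taxonomy is not reproduced here.
LITERAL = "literal"


def literal_share(labels):
    """Return the fraction of annotated aligned pairs labelled as literal."""
    if not labels:
        raise ValueError("no annotations provided")
    return Counter(labels)[LITERAL] / len(labels)


def relation_distribution(labels):
    """Return the relative frequency of every relation label."""
    if not labels:
        raise ValueError("no annotations provided")
    total = len(labels)
    return {label: n / total for label, n in Counter(labels).items()}


# Toy annotations invented for illustration only (not data from the study).
mt_labels = ["literal", "literal", "literal", "equivalence"]
ht_labels = ["literal", "literal", "generalization", "particularization"]

print("MT literal share:", literal_share(mt_labels))
print("HT literal share:", literal_share(ht_labels))
print("HT distribution:", relation_distribution(ht_labels))
```

Applied per corpus or per genre, the same ratio would correspond to the kind of literal-share figures reported in the TL;DR, although the study's actual counting procedure may differ.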

Paper Structure (46 sections, 14 figures)

Introduction
Choice of NMT system and its human parity
NMT translationese
Translation relations
Summary
Related work
Translationese
Translationese definition
Translationese features
Machine translation translationese
Machine translationese vs. human translationese
Metrics for translation quality
Automatic evaluation of machine translation
Traditional metrics
Translation relations
...and 31 more sections

Figures (14)

Figure 1: The number of tokens in each sentence of each genre in three corpora
Figure 2: Hierarchy of translation relations (Chuquet et al., 1989; Zhai et al., 2018)
Figure 3: Input of source.txt file and target.txt file (Example from Liu (2015))
Figure 4: Output of token index alignment (Example from Liu (2015))
Figure 5: The operation web page of YAWAT
...and 9 more figures
