
To Diverge or Not to Diverge: A Morphosyntactic Perspective on Machine Translation vs Human Translation

Jiaming Luo, Colin Cherry, George Foster

TL;DR

This work examines how MT and HT differ in morphosyntactic structure by analyzing divergences across three language pairs using Universal Dependencies (UD) annotations. It shows that MT is more conservative than HT, with lower morphosyntactic diversity and more one-to-one alignments, and that beam search strongly biases MT toward convergent patterns, especially when the convergent pattern occurs about 50% of the time in the training data. For most divergences, their presence in HT correlates with lower MT quality, though not universally, which points to nuanced interactions among structure, data frequency, and decoding. The findings reveal a fundamental bias in current MT decoding toward literal, translationese-like output and provide a fine-grained framework for diagnosing and potentially mitigating this bias, with LLM-based MT left for future exploration.
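The decoding claim can be made concrete with a deliberately simplified toy. This is an illustration under stated assumptions, not the authors' experiment. Assume a model assigns the convergent target pattern probability p, equal to its training frequency. Unbiased sampling then reproduces p on average, whereas a purely mode-seeking decoder (an exact argmax here, standing in for beam search, which searches for high-probability outputs) always emits the more probable pattern. The function names are hypothetical.

```python
import random


def sampled_rate(p: float, n: int, seed: int = 0) -> float:
    """Fraction of n sampled decodes that pick the convergent pattern,
    assuming (toy) the model gives it probability p."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.random() < p)
    return hits / n


def mode_seeking_rate(p: float) -> float:
    """Idealized mode-seeking decoder: always emits the more probable of
    the two patterns. Ties (p == 0.5) go to the divergent pattern."""
    return 1.0 if p > 0.5 else 0.0


for p in (0.3, 0.55, 0.7, 0.9):
    s = sampled_rate(p, 10_000)
    m = mode_seeking_rate(p)
    # gap = how far mode-seeking pushes the convergent rate away from p
    print(f"p={p:.2f}  sampling~{s:.3f}  mode-seeking={m:.1f}  gap={m - p:+.2f}")
```

In this toy, mode-seeking turns any majority into 100%, so the overshoot 1 − p is largest when p is just above 0.5. It only favors convergent patterns when they are the majority, and real per-sentence probabilities vary with context, so the toy conveys the direction of the effect rather than its measured shape.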

Abstract

We conduct a large-scale, fine-grained comparative analysis of machine translations (MT) against human translations (HT) through the lens of morphosyntactic divergence. Across three language pairs and two types of divergence, defined as structural differences between the source and the target, MT is consistently more conservative than HT, with less morphosyntactic diversity, more convergent patterns, and more one-to-one alignments. Through an analysis of different decoding algorithms, we attribute this discrepancy to beam search, which biases MT towards more convergent patterns. This bias is most amplified when the convergent pattern appears around 50% of the time in the training data. Lastly, we show that for a majority of morphosyntactic divergences, their presence in HT is correlated with decreased MT performance, presenting a greater challenge for MT systems.
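The three measures named in the abstract (diversity, convergence, one-to-one alignment) can be computed per source pattern from aligned instances. The sketch below rests on assumptions not stated in this summary: diversity is Shannon entropy in bits over target patterns, with unaligned cases counted as a "null" outcome, and an instance counts as convergent when it is aligned one-to-one to a target pattern carrying the same label as the source pattern. AlignedInstance, pattern_stats, the label "nmod NOUN case", and the counts are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlignedInstance:
    """One occurrence of a source pattern and what it maps to in the target.
    target is None when no target word is aligned (the 'null' case)."""
    target: Optional[str]
    one_to_one: bool


def pattern_stats(source: str, instances: list[AlignedInstance]) -> dict[str, float]:
    n = len(instances)
    if n == 0:
        raise ValueError("need at least one aligned instance")
    # Distribution over target patterns; unaligned cases form a 'null' outcome.
    counts = Counter(inst.target or "null" for inst in instances)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # Assumption: convergent = one-to-one alignment to an identical pattern label.
    convergent = sum(1 for i in instances if i.one_to_one and i.target == source)
    o2o = sum(1 for i in instances if i.one_to_one)
    return {
        "entropy_bits": entropy,
        "convergence_rate": convergent / n,
        "one_to_one_rate": o2o / n,
    }


# Invented counts for illustration only (not data from the paper).
src = "amod ADJ leaf"
ht = ([AlignedInstance(src, True)] * 6
      + [AlignedInstance("nmod NOUN case", True)] * 2
      + [AlignedInstance(None, False)] * 2)
mt = [AlignedInstance(src, True)] * 9 + [AlignedInstance("nmod NOUN case", True)]
print(pattern_stats(src, ht))
print(pattern_stats(src, mt))
```

With these invented counts, the MT-like list has lower entropy, a higher convergence rate, and a higher one-to-one rate, which is the direction the abstract reports for MT relative to HT.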

Paper Structure

This paper contains 34 sections, 1 equation, 10 figures, and 6 tables.

Figures (10)

  • Figure 1: Top table: Examples of divergences in HT for En→Fr WMT15 training data (Bojar et al., 2015), with relevant fragments of the source and target shown in the first and second rows, respectively. The English control constructions are bolded, including both the finite root verb and the controlled word, while the French phrases of interest are underlined. Bottom figure: Percentages of target patterns for HT and MT, with obligatory control finite verbs as the source pattern. o2o:conv: one-to-one convergent patterns, where the target phrase uses a control construction similar to the source; o2o:div: one-to-one divergent patterns, where the target differs structurally from the source; null: no target word is aligned; others: other less frequent patterns (e.g., one-to-many alignments). The percentages of the four categories sum to 100%.
  • Figure 2: An illustration of the two types of morphosyntactic divergence. See the Experimental Setup section for details.
  • Figure 3: Plot of convergence rate vs entropy for the most frequent word-based source patterns in En→Fr human translations, three of which are highlighted in black: (1) amod ADJ leaf (high convergence rate, low entropy): the most common cases of adjectival modifiers; (2) acl VERB nsubj (low convergence rate, high entropy): object relative clauses without a relative pronoun, or subject relative clauses; the high entropy reflects a major difference between English and French, where the relative pronoun que is obligatory in French but not in English; (3) amod PROPN leaf (low convergence rate, low entropy): adjectives that are part of proper nouns. Adjectives in the names of official institutions and titles are typically capitalized and annotated as PROPN in English (e.g., Secretary General) but lowercased and annotated as ADJ in French (e.g., secrétaire général).
  • Figure 4: Stacked histogram of the relative differences in source pattern-specific diversity score.
  • Figure 5: Stacked histogram of the absolute differences in source pattern-specific convergence rate.
  • ...and 5 more figures
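Figures 4 and 5 histogram per-pattern differences between MT and HT. This summary does not give the exact formulas, so the sketch below assumes relative diversity difference = (MT − HT) / HT and absolute convergence difference = MT − HT, and reports the share of patterns on the "more conservative" side of each. PatternScores, summarize, the pattern names, and all numbers are invented for illustration and are not the paper's results; the variable by which the histograms are stacked is not specified here, so it is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatternScores:
    """Per-source-pattern scores for HT and MT (hypothetical container)."""
    ht_entropy: float
    mt_entropy: float
    ht_convergence: float
    mt_convergence: float


def relative_diversity_diff(s: PatternScores) -> float:
    # Assumed definition: (MT - HT) / HT; undefined when HT entropy is zero.
    if s.ht_entropy == 0:
        raise ValueError("HT entropy is zero; relative difference undefined")
    return (s.mt_entropy - s.ht_entropy) / s.ht_entropy


def absolute_convergence_diff(s: PatternScores) -> float:
    # Assumed definition: MT - HT, both rates already on a 0-1 scale.
    return s.mt_convergence - s.ht_convergence


def summarize(scores: dict[str, PatternScores]) -> dict[str, float]:
    """Share of patterns where MT is less diverse / more convergent than HT."""
    rel = [relative_diversity_diff(s) for s in scores.values()]
    ab = [absolute_convergence_diff(s) for s in scores.values()]
    n = len(scores)
    return {
        "share_less_diverse": sum(d < 0 for d in rel) / n,
        "share_more_convergent": sum(d > 0 for d in ab) / n,
    }


# Invented toy scores, not taken from the paper.
toy = {
    "pattern_a": PatternScores(1.2, 0.9, 0.70, 0.80),
    "pattern_b": PatternScores(2.5, 2.0, 0.20, 0.35),
    "pattern_c": PatternScores(0.8, 1.0, 0.10, 0.05),
}
print(summarize(toy))
```

A negative relative diversity difference and a positive absolute convergence difference both indicate that MT is more conservative than HT for that pattern. A relative difference plausibly suits entropy, whose scale varies widely across patterns, while convergence rates already share a 0-1 scale.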