Towards Tailored Recovery of Lexical Diversity in Literary Machine Translation

Esther Ploeger, Huiyuan Lai, Rik van Noord, Antonio Toral

TL;DR

This work investigates the loss of lexical diversity in literary machine translation (MT) and argues for book-tailored recovery rather than a uniform increase. It introduces a two-stage approach: generate multiple translation candidates with a vanilla MT system, then rerank them with a classifier that estimates how likely each candidate is to be originally written Dutch, with the lexical diversity of the source book guiding which rank is selected. Across 31 English-to-Dutch novels, the tailored reranking approach brings lexical diversity closer to that of human translations for several books while maintaining translation quality. The method is model-agnostic and adjustable at inference time, and it highlights the value of aligning MT output with the diversity profile of the source text to preserve literary style and voice.

Abstract

Machine translations are found to be lexically poorer than human translations. The loss of lexical diversity through MT poses an issue in the automatic translation of literature, where it matters not only what is written, but also how it is written. Current methods for increasing lexical diversity in MT are rigid. Yet, as we demonstrate, the degree of lexical diversity can vary considerably across different novels. Thus, rather than aiming for the rigid increase of lexical diversity, we reframe the task as recovering what is lost in the machine translation process. We propose a novel approach that consists of reranking translation candidates with a classifier that distinguishes between original and translated text. We evaluate our approach on 31 English-to-Dutch book translations, and find that, for certain books, our approach retrieves lexical diversity scores that are close to human translation.
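
The reranking idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy classifier scores, and in particular the linear mapping from a book's lexical diversity score to a rank are hypothetical, chosen only to show how a book-specific rank could be picked from candidates sorted by a classifier's "original text" probability.

```python
from typing import Callable, Sequence


def rerank(candidates: Sequence[str], p_original: Callable[[str], float]) -> list[str]:
    """Sort candidates from most to least likely to be original target-language text."""
    return sorted(candidates, key=p_original, reverse=True)


def rank_for_book(book_score: float, lo: float, hi: float, n: int) -> int:
    """Map a book's lexical diversity score to a rank in [0, n - 1].

    Hypothetical rule (an assumption, not taken from the paper): the most
    diverse book in the range [lo, hi] gets rank 0, the least diverse gets
    rank n - 1, with linear interpolation in between.
    """
    if hi <= lo:
        return 0
    frac = (book_score - lo) / (hi - lo)
    frac = min(1.0, max(0.0, frac))  # clamp scores outside the observed range
    return round((1.0 - frac) * (n - 1))


def pick_translation(
    candidates: Sequence[str],
    p_original: Callable[[str], float],
    book_score: float,
    lo: float,
    hi: float,
) -> str:
    """Rerank the candidates, then return the one at the book-specific rank."""
    ranked = rerank(candidates, p_original)
    return ranked[rank_for_book(book_score, lo, hi, len(ranked))]


# Toy stand-in for a trained classifier: a lookup table of made-up probabilities.
toy_scores = {"cand_a": 0.2, "cand_b": 0.9, "cand_c": 0.5}
chosen = pick_translation(list(toy_scores), toy_scores.get, book_score=75.0, lo=50.0, hi=100.0)
```

In this toy setup a book in the middle of the diversity range selects the middle-ranked candidate, which mirrors the point that the chosen rank need not be the most lexically diverse option.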

Paper Structure (38 sections, 1 equation, 8 figures, 5 tables)

This paper contains 38 sections, 1 equation, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Reranking translation hypotheses based on the probability they are originally written in the target language, where the chosen rank is based on the lexical diversity score of the original book, and could be lower than the most lexically diverse option.
  • Figure 2: Range and spread of lexical diversity metrics for HT (left, yellow) and original English (right, blue).
  • Figure 3: Per-book comparison of MTLD between the (rigid) tagging baseline and (tailored) reranking method, where green dotted lines are HT scores, and red dotted lines represent vanilla MT.
  • Figure 4: Change in MTLD for choosing different ranks, where beam size is 20 and $n=20$.
  • Figure 5: MTLD for highest (green), lowest (red) and tailored (yellow) original-text rank.
  • ...and 3 more figures
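
Several of the captions above report MTLD (measure of textual lexical diversity). As a rough illustration of what that score captures, here is a minimal Python sketch of the usual bidirectional MTLD computation. The 0.72 type-token-ratio threshold is the conventional default, while the pre-tokenized input and the fallback for texts that never complete a factor are assumptions of this sketch; the paper's exact implementation may differ.

```python
from typing import Sequence


def _mtld_pass(tokens: Sequence[str], threshold: float) -> float:
    """One directional pass: count how many 'factors' the text contains."""
    factors = 0.0
    types: set[str] = set()
    count = 0
    for tok in tokens:
        count += 1
        types.add(tok)
        # A factor is complete once the running type-token ratio hits the threshold.
        if len(types) / count <= threshold:
            factors += 1.0
            types.clear()
            count = 0
    if count > 0:
        # Credit the unfinished segment with a partial factor.
        ttr = len(types) / count
        factors += (1.0 - ttr) / (1.0 - threshold)
    if factors == 0:
        # Assumption of this sketch: no repetition at all, fall back to text length.
        return float(len(tokens))
    return len(tokens) / factors


def mtld(tokens: Sequence[str], threshold: float = 0.72) -> float:
    """Mean of the forward and backward passes; higher means more diverse."""
    if not tokens:
        return 0.0
    forward = _mtld_pass(tokens, threshold)
    backward = _mtld_pass(tokens[::-1], threshold)
    return (forward + backward) / 2.0


repetitive = ["a"] * 10
varied = "the cat sat on the mat and the dog ran".split()
```

A highly repetitive token sequence completes factors quickly and scores low, whereas a varied one takes longer to reach the threshold and scores higher.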