Characterizing the Effects of Translation on Intertextuality using Multilingual Embedding Spaces
Hope McGovern, Hale Sirin, Tom Lippincott
TL;DR
The paper examines how translation affects intertextual references in Biblical texts, using multilingual embeddings to quantify intertextuality at the corpus level. It introduces a metric based on cosine similarity, computed against ground-truth cross-references with uncertainty estimated by bootstrapping, to compare original, human-translated, and machine-translated texts across languages with varying levels of resources. The study finds that human translations often amplify intertextuality relative to machine translations, which tend to provide a neutral baseline; English translations show the strongest preservation and Marathi the weakest. These results offer a practical framework for evaluating translation-driven shifts in literary devices, with implications for translation studies and multilingual NLP in resource-diverse contexts.
Abstract
Rhetorical devices are difficult to translate, but they are crucial to the translation of literary documents. We investigate the use of multilingual embedding spaces to characterize the preservation of intertextuality, one common rhetorical device, across human and machine translation. To do so, we use Biblical texts, which are both rich in intertextual references and widely translated. We propose a metric to characterize intertextuality at the corpus level and present a quantitative analysis of the preservation of this rhetorical device across extant human translations and machine-generated counterparts. We then provide a qualitative analysis of cases in which human translations over- or underemphasize the intertextuality present in the text, whereas machine translations provide a neutral baseline. These findings support established scholarship proposing that human translators have a propensity to amplify certain literary characteristics of the original manuscripts.
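
To make the idea of a corpus-level, similarity-based measure concrete, the sketch below shows one possible way to score a set of cross-referenced verse pairs and attach a bootstrap interval. It is an illustration under stated assumptions, not the paper's actual metric: the function names, the toy embeddings, the use of a plain mean over pairs, and the percentile bootstrap are all hypothetical choices made here for the example.

```python
import numpy as np

def cosine_similarities(embeddings, pairs):
    """Cosine similarity for each (i, j) index pair of verse embeddings."""
    emb = np.asarray(embeddings, dtype=float)
    # Normalize rows to unit length so a dot product equals cosine similarity.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    idx = np.asarray(pairs)
    return np.sum(unit[idx[:, 0]] * unit[idx[:, 1]], axis=1)

def bootstrap_mean(values, n_resamples=1000, alpha=0.05, seed=0):
    """Mean of `values` with a percentile bootstrap confidence interval."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample the pair scores with replacement and record each resample's mean.
    samples = rng.choice(values, size=(n_resamples, values.size), replace=True)
    means = samples.mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return values.mean(), lo, hi

# Toy stand-ins (assumptions): four "verse" embeddings and three
# ground-truth cross-reference pairs given as row indices.
embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
cross_refs = [(0, 2), (1, 2), (0, 1)]

sims = cosine_similarities(embeddings, cross_refs)
mean, lo, hi = bootstrap_mean(sims)
print(f"mean similarity = {mean:.3f}, 95% interval = [{lo:.3f}, {hi:.3f}]")
```

In a real setting, the embeddings would come from a multilingual encoder applied to each version of the text (original, human translation, machine translation), and the same cross-reference pairs would be scored in each version so that the resulting intervals can be compared.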
