Towards Cross-Cultural Machine Translation with Retrieval-Augmented Generation from Multilingual Knowledge Graphs

Simone Conia, Daniel Lee, Min Li, Umar Farooq Minhas, Saloni Potdar, Yunyao Li

TL;DR

This paper introduces XC-Translate, the first large-scale, manually-created benchmark for machine translation that focuses on text that contains potentially culturally-nuanced entity names, and proposes KG-MT, a novel end-to-end method to integrate information from a multilingual knowledge graph into a neural machine translation model by leveraging a dense retrieval mechanism.

Abstract

Translating text that contains entity names is a challenging task, as cultural-related references can vary significantly across languages. These variations may also be caused by transcreation, an adaptation process that entails more than transliteration and word-for-word translation. In this paper, we address the problem of cross-cultural translation on two fronts: (i) we introduce XC-Translate, the first large-scale, manually-created benchmark for machine translation that focuses on text that contains potentially culturally-nuanced entity names, and (ii) we propose KG-MT, a novel end-to-end method to integrate information from a multilingual knowledge graph into a neural machine translation model by leveraging a dense retrieval mechanism. Our experiments and analyses show that current machine translation systems and large language models still struggle to translate texts containing entity names, whereas KG-MT outperforms state-of-the-art approaches by a large margin, obtaining a 129% and 62% relative improvement compared to NLLB-200 and GPT-4, respectively.
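The retrieval-then-augment idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy knowledge graph, the hand-written three-dimensional vectors, the `[KG]` separator, and the function names `retrieve` and `add_explicit_knowledge` are all assumptions made for illustration. A real system would obtain embeddings from a trained dense retriever and pass the augmented text to a neural MT model.

```python
import math

# Toy multilingual knowledge graph (hypothetical structure).
# The vectors are made up; a real system would use a trained encoder.
KG = {
    "The Catcher in the Rye": {
        "vec": [0.9, 0.1, 0.0],
        "names": {"it": "Il giovane Holden"},
    },
    "Home Alone": {
        "vec": [0.1, 0.9, 0.0],
        "names": {"it": "Mamma, ho perso l'aereo"},
    },
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, kg, k=1):
    """Return the k entities whose embeddings are closest to the query."""
    ranked = sorted(kg, key=lambda e: cosine(query_vec, kg[e]["vec"]), reverse=True)
    return ranked[:k]


def add_explicit_knowledge(source, entities, kg, target_lang):
    """Append target-language entity names to the source text (assumed format)."""
    hints = [
        f"{e} = {kg[e]['names'][target_lang]}"
        for e in entities
        if target_lang in kg[e]["names"]
    ]
    if not hints:
        return source
    return f"{source} [KG] " + " ; ".join(hints)


# Hypothetical embedding of the source sentence.
query_vec = [0.8, 0.2, 0.1]
source = "Who wrote The Catcher in the Rye?"

entities = retrieve(query_vec, KG, k=1)
augmented = add_explicit_knowledge(source, entities, KG, "it")
print(augmented)
```

The example uses a title whose Italian name is not a literal translation, which is the kind of case where supplying the target-language name to the translation model is most useful.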

Paper Structure

This paper contains 43 sections, 7 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Overview of KG-MT, which uses a knowledge retriever, i.e., a dense retrieval mechanism that retrieves the most relevant entities from a multilingual knowledge graph (see Section \ref{subsec:retriever}), to improve the translation. The retrieved entities are then integrated into the MT system in two ways: explicit knowledge integration, where the entity names are explicitly added to the source text (see Section \ref{subsec:generator}), and implicit knowledge integration, where the entity embeddings are fused with the encoder hidden states (see Section \ref{subsec:fusion}).
  • Figure 2: Results of KG-MT when using explicit or implicit knowledge integration, or both.
  • Figure 3: Results of KG-MT when using gold knowledge instead of the knowledge from the retriever.
  • Figure 4: UI used for the annotation task: the annotators could familiarize themselves with the task through an outline of the task instructions (detailed guidelines could be read on a separate page) and the information about the entity, including its names in English and its Wikipedia pages in English and the target language (Italian in this case).
  • Figure 5: UI used for the annotation task: the annotator was tasked with translating the English question into the target language in a free-form text box, and was provided with relevant details such as (1) the English question, (2) the English entity, (3) the entity names in the target language, and (4) a possible translation template in the target language.
  • ...and 2 more figures