Graph-Assisted Culturally Adaptable Idiomatic Translation for Indic Languages

Pratik Rakesh Singh, Kritarth Prasad, Mohammadi Zaki, Pankaj Wasnik

TL;DR

This work tackles idiomatic translation across Indic languages, where cultural nuance and one-to-many mappings hinder accurate rendering. It introduces IdiomCE, an inductive graph neural network that uses cultural elements as features to model mappings between English idioms and Indic idioms, generalizing to unseen items and supporting pivot-based cross-language translation. The approach includes a data-creation pipeline for cultural features, an inductive GNN with link prediction, a node-duplication strategy to alleviate cold-start, and inter-Indic translation via English as a pivot. Empirical results show notable improvements over static knowledge-graph and prompting baselines across multiple language pairs, including human evaluations that corroborate the quality of idiomatic translations, with particular gains for smaller models in resource-constrained settings.

Abstract

Translating multi-word expressions (MWEs) and idioms requires a deep understanding of the cultural nuances of both the source and target languages. The challenge is amplified by the one-to-many nature of idiomatic translation: a single source idiom can have multiple target-language equivalents, depending on cultural references and context. Traditional static knowledge graphs (KGs) and prompt-based approaches struggle to capture these relationships, often leading to suboptimal translations. To address this, we propose IdiomCE, an adaptive graph neural network (GNN)-based method that learns mappings between idiomatic expressions and generalizes to nodes both seen and unseen during training. The method improves translation quality even in resource-constrained settings, enabling better idiomatic translation with smaller models. We evaluate our approach on multiple idiomatic translation datasets using reference-less metrics, demonstrating significant improvements in translating idioms from English into various Indian languages.
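
The one-to-many mapping described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each idiom already has an embedding vector (in the paper these would come from a trained GNN), and it reduces link prediction to ranking target idioms by cosine similarity. The idiom names, the 3-dimensional vectors, and the function `rank_candidates` are all hypothetical placeholders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(source_vec, target_vecs, k=2):
    """Return the k target idioms whose vectors score highest against the source.

    Assumption: a higher similarity stands in for a more likely
    source-target link; a real link predictor would be learned.
    """
    scored = [(cosine(source_vec, vec), idiom) for idiom, vec in target_vecs.items()]
    scored.sort(reverse=True)
    return [idiom for _, idiom in scored[:k]]


# Toy data (hypothetical): three target idioms and one source idiom
# that was not part of any training graph.
targets = {
    "hi_idiom_A": [0.9, 0.1, 0.0],
    "hi_idiom_B": [0.8, 0.3, 0.1],
    "hi_idiom_C": [0.0, 0.2, 0.9],
}
unseen_source = [1.0, 0.2, 0.0]

top = rank_candidates(unseen_source, targets, k=2)
print(top)
```

Returning several candidates rather than a single best match reflects the one-to-many setting: a later selection step can then choose among them based on context.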

Paper Structure

This paper contains 26 sections, 6 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: An example of a culturally enhanced graph with different cultural elements (Concepts, Values, and Historical/Situational Context), showing how relationships between source and target nodes can be created from their cultural elements.
  • Figure 2: Overall training process of IdiomCE: (a) GNN training – illustrating the creation of a Knowledge Graph using source and target idioms, specifically for en-hi, leveraging LaBSE embeddings and training a GNN for the Link Prediction (LP) task; (b) Node Duplication – demonstrating how we address the cold start issue by duplicating target nodes; and (c) Contrastive Training – showing the training through positive and negative samples and the process of mapping unseen nodes to relevant target idioms.
  • Figure 3: Inference strategy: (a) Unseen & Seen Node Translation – a BERT-trained GNN adapts to unseen nodes, with the Selection and Translation (ST) block selecting idioms via an LLM before translation; (b) Inter-Indic Translation – using English as a pivot between $xx_1$ and $xx_2$.
  • Figure 4: Comparison of the models' average LLM scores on (a) seen and (b) unseen idiom nodes across en-xx directions.
  • Figure :
  • ...and 4 more figures
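
The pivot strategy named in the Figure 3(b) caption, translating between two Indic languages via English, can be sketched as the composition of two one-to-many mappings. This is a simplified illustration under stated assumptions, not the paper's pipeline: the dictionaries stand in for learned idiom links, and every idiom name and the function `pivot_candidates` are hypothetical.

```python
def pivot_candidates(idiom, to_english, from_english):
    """Collect target-language candidates reachable through English.

    to_english:   maps a source-language idiom to its English idioms.
    from_english: maps an English idiom to its target-language idioms.
    Candidates are returned in first-seen order without duplicates.
    """
    candidates = []
    for english_idiom in to_english.get(idiom, []):
        for target_idiom in from_english.get(english_idiom, []):
            if target_idiom not in candidates:
                candidates.append(target_idiom)
    return candidates


# Toy mappings (hypothetical placeholders, not real idioms).
xx1_to_en = {"xx1_idiom_1": ["en_idiom_1", "en_idiom_2"]}
en_to_xx2 = {
    "en_idiom_1": ["xx2_idiom_1", "xx2_idiom_2"],
    "en_idiom_2": ["xx2_idiom_2", "xx2_idiom_3"],
}

result = pivot_candidates("xx1_idiom_1", xx1_to_en, en_to_xx2)
print(result)
```

Because both hops are one-to-many, the candidate set can grow at each step, so a selection step over the final candidates remains necessary.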