Semantic Label Drift in Cross-Cultural Translation

Mohsinul Kabir, Tasnim Ahmed, Md Mezbaur Rahman, Polydoros Giannouris, Sophia Ananiadou

TL;DR

The paper investigates semantic label drift in cross-cultural translation, focusing on how cultural alignment between source and target languages affects label fidelity in machine translation (MT). It compares statistical MT (SMT) with modern LLM-based translation in two culturally sensitive domains (mental health and irony), using Bengali and Greek as target languages. Six models are evaluated under literal and anthropological prompting, and the translations are annotated by LLMs with validation by native speakers. The study finds that label drift is domain-sensitive: mild/moderate labels and context-dependent irony are the most susceptible, while extreme labels are better preserved. Cultural prompting can sometimes worsen label fidelity, and cultural similarity between languages mitigates drift in certain domains. These results underscore the importance of culturally validating translated datasets before downstream NLP use, to avoid misinterpretation and potential cultural conflict in applications.

Abstract

Machine Translation (MT) is widely employed to address resource scarcity in low-resource languages by generating synthetic data from high-resource counterparts. While sentiment preservation in translation has long been studied, a critical but underexplored factor is the role of cultural alignment between source and target languages. In this paper, we hypothesize that semantic labels drift or are altered during MT due to cultural divergence. Through a series of experiments across culturally sensitive and neutral domains, we establish three key findings: (1) MT systems, including modern Large Language Models (LLMs), induce label drift during translation, particularly in culturally sensitive domains; (2) unlike earlier statistical MT tools, LLMs encode cultural knowledge, and leveraging this knowledge can amplify label drift; and (3) cultural similarity or dissimilarity between source and target languages is a crucial determinant of label preservation. Our findings highlight that neglecting cultural factors in MT not only undermines label fidelity but also risks misinterpretation and cultural conflict in downstream applications.

Paper Structure

This paper contains 13 sections, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Evaluating label shifts in cross-cultural translation. We select two culturally sensitive datasets constructed in Western contexts, translate subsets of randomly sampled data into Bengali and Greek, and annotate them with verification by native speakers. By comparing translated annotations with the original labels, we assess the extent of label preservation during translation.
  • Figure 2: Translation with Anthropological Prompting.
  • Figure 3: Matthews Correlation Coefficient (MCC) scores comparing the performance of six translation/language models. The grouped bar charts display literal prompting (L) results (darker bars) for all models, while cultural prompting (C) results (lighter bars) are shown only for the four LLM-based models. MCC scores are interpreted as follows: $0.0$–$0.3$: weak; $0.31$–$0.5$: weak-moderate; $0.51$–$0.7$: moderate-strong; $>0.7$: strong performance.
  • Figure 4: Kullback–Leibler (KL) divergence between original and translated label distributions across six MT models under different prompting techniques. The SemEval-2018 T3 dataset exhibits greater divergence from the original labels than DEPTWEET.
  • Figure 5: Agreement scores for automatic (solid) and human (dashed) annotations, ranging from fair to substantial agreement across both datasets.
  • ...and 1 more figure
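
The two metrics named in the captions of Figures 3 and 4 can be illustrated with a small, dependency-free sketch. This is not the authors' evaluation code. It assumes binary labels for MCC (the paper's label sets may be multiclass), computes KL divergence in the direction original-to-translated (the direction is not stated in the captions), and uses hypothetical helper names (`mcc_binary`, `label_distribution`, `kl_divergence`) and made-up toy labels.

```python
import math
from collections import Counter


def mcc_binary(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels (0/1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def label_distribution(labels, classes, eps=1e-9):
    """Empirical label distribution over `classes`, lightly smoothed
    so that the KL divergence below stays finite for unseen classes."""
    counts = Counter(labels)
    total = len(labels) + eps * len(classes)
    return [(counts[c] + eps) / total for c in classes]


def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy data (hypothetical): original labels vs. labels re-annotated
# after translation.
original = [1, 1, 1, 0, 0, 0]
translated = [1, 1, 0, 0, 0, 0]

classes = [0, 1]
p = label_distribution(original, classes)
q = label_distribution(translated, classes)

print("MCC:", mcc_binary(original, translated))
print("KL(original || translated):", kl_divergence(p, q))
```

MCC here measures per-example agreement between original and post-translation labels, whereas KL divergence only compares the two aggregate label distributions, so a translation can score a low KL divergence while still flipping individual labels.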