Rethinking Cross-lingual Alignment: Balancing Transfer and Cultural Erasure in Multilingual LLMs

HyoJung Han, Sweta Agrawal, Eleftheria Briakou

TL;DR

This work reframes cross-lingual alignment (CLA) as a trade-off between universal knowledge transfer and culturally localized responses. It introduces the transfer-localization plane to quantify gains in language-agnostic knowledge and losses in culturally specific localization, and empirically shows that existing CLA methods improve transfer at the cost of localization across six languages. The authors find that transfer concentrates in the middle layers of the model while localization resides in deeper layers, which motivates layer-aware interventions. They propose Surgical Steering, a layer-disentangled, inference-time technique that applies activation steering at different layers to balance both objectives. It achieves a better Pareto frontier and reduces English-centric bias, though a residual, irrecoverable trade-off remains.
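
As a rough illustration of layer-disentangled steering, the sketch below adds one vector at a middle layer and a second vector at a deeper layer during a forward pass. It is a minimal sketch under stated assumptions, not the authors' implementation: the steering vectors are built as differences of mean activations (a common activation-steering recipe that this summary does not confirm), toy identity layers stand in for transformer blocks, and the names `diff_of_means`, `run_with_steering`, `en_vec`, and `loc_vec`, along with the layer indices and strengths, are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def mean_vector(rows: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    if not rows:
        raise ValueError("need at least one activation vector")
    return [sum(col) / len(rows) for col in zip(*rows)]


def diff_of_means(target: Sequence[Vector], source: Sequence[Vector]) -> Vector:
    """Hypothetical steering vector: mean(target activations) - mean(source activations)."""
    return [t - s for t, s in zip(mean_vector(target), mean_vector(source))]


def run_with_steering(
    x: Vector,
    layers: Sequence[Callable[[Vector], Vector]],
    steering: Dict[int, Tuple[Vector, float]],
) -> Vector:
    """Run x through the layers; after layer i, add strength * vector if i is in steering."""
    h = list(x)
    for i, layer in enumerate(layers):
        h = layer(h)
        if i in steering:
            vec, strength = steering[i]
            h = [a + strength * v for a, v in zip(h, vec)]
    return h


def identity(h: Vector) -> Vector:
    """Toy stand-in for a transformer block."""
    return list(h)


if __name__ == "__main__":
    layers = [identity] * 4
    # Assumed transfer direction (toward English-like activations), applied at middle layer 1.
    en_vec = diff_of_means([[2.0, 0.0], [4.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
    # Assumed localization direction (toward culturally local activations), applied at deep layer 3.
    loc_vec = diff_of_means([[0.0, 1.0]], [[0.0, 0.0]])
    out = run_with_steering([0.0, 0.0], layers, {1: (en_vec, 0.5), 3: (loc_vec, 0.5)})
    print(out)  # [1.5, 0.5]
```

Keeping the two vectors on separate layers reflects the layer analysis described above: if transfer is most steerable in middle layers and localization in deeper ones, each direction can be tuned without overwriting the other. In practice the layer choices and strengths would need to be tuned on a development set.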

Abstract

Cross-lingual alignment (CLA) aims to align multilingual representations, enabling Large Language Models (LLMs) to seamlessly transfer knowledge across languages. While intuitive, this pursuit of representational convergence can, we hypothesize, inadvertently cause "cultural erasure": the functional loss of the ability to provide culturally situated responses that should diverge based on the query language. In this work, we systematically analyze this trade-off by introducing a holistic evaluation framework, the transfer-localization plane, which quantifies both desirable knowledge transfer and undesirable cultural erasure. Using this framework, we re-evaluate recent CLA approaches and find that they consistently improve factual transfer at the direct cost of cultural localization across all six languages studied. Our investigation into the internal representations of these models reveals a key insight: universal factual transfer and culturally specific knowledge are optimally steerable at different model layers. Based on this finding, we propose Surgical Steering, a novel inference-time method that disentangles these two objectives. By applying targeted activation steering to distinct layers, our approach achieves a better balance between the two competing dimensions, effectively overcoming the limitations of current alignment techniques.
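
To make the transfer-localization plane concrete, the sketch below places each method at its change in transfer and localization relative to an unaligned baseline, so cultural erasure appears as a negative localization coordinate, and then lists the Pareto-optimal methods. The scoring is an assumption: this summary does not define the paper's metrics, so the inputs are taken to be generic transfer and localization accuracies in percentage points, and the method names and scores are invented for illustration, not results from the paper.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (transfer score, localization score)


def plane_coordinates(scores: Dict[str, Point], baseline: str) -> Dict[str, Point]:
    """Shift every method's scores so the baseline sits at the origin of the plane."""
    base_t, base_l = scores[baseline]
    return {name: (t - base_t, l - base_l) for name, (t, l) in scores.items()}


def dominates(a: Point, b: Point) -> bool:
    """a Pareto-dominates b: no worse on both axes and strictly better on at least one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(points: Dict[str, Point]) -> List[str]:
    """Names of methods that no other method dominates, sorted for stable output."""
    return sorted(
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    )


if __name__ == "__main__":
    # Hypothetical scores in percentage points, for illustration only.
    scores = {
        "unaligned": (50, 60),
        "method_a": (58, 52),
        "method_b": (55, 50),
        "method_c": (56, 57),
    }
    coords = plane_coordinates(scores, baseline="unaligned")
    for name, (d_transfer, d_local) in sorted(coords.items()):
        # A negative localization delta is read here as cultural erasure.
        print(f"{name}: transfer {d_transfer:+d}, localization {d_local:+d}")
    print(pareto_front(coords))  # ['method_a', 'method_c', 'unaligned']
```

Pareto dominance is used so that neither axis is traded away silently: a method that raises transfer but pushes localization below the baseline lands in the lower-right quadrant, which is the pattern the abstract reports for existing CLA methods.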

Paper Structure

This paper contains 40 sections, 7 equations, 14 figures, 3 tables.

Figures (14)

  • Figure 1: Examples of intended convergence and desired divergence in the outputs of multilingual LLMs. Universal questions (left) should yield a single, converged answer (knowledge transfer) regardless of the query language, while culturally specific questions (right) should yield divergent, localized answers (cultural localization) that reflect the cultural context inferred from the input language.
  • Figure 2: Competing results of CLA approaches on knowledge transfer and cultural localization. Gains from CLA come at a consistent cost to cultural localization across all languages.
  • Figure 3: PCA projections of hidden representations for the unaligned model and CLA methods. As CLA methods are applied, languages cluster more tightly, signaling stronger convergence. Yet convergence differs by dataset: GMMLU representations merge starting in the middle layers, whereas BLEnD representations stay separated until later layers, and this separation persists even after CLA.
  • Figure 4: Layer-wise analysis of En- and Loc-steering on Mist for the GMMLU and BLEnD dev sets (right), and orthogonality between the two kinds of steering vectors (left). Cultural localization is optimally steered in deeper layers, where the En- and Loc-steering vectors are also most orthogonal to each other.
  • Figure 5: Left: Trade-offs between transfer and localization under the steering methods. Both En-steering and Loc-steering are applied to Mist. Surgical Steering is applied on top of different post-training methods (circles); each pair is marked by the same color or linked by a gray dotted line. Right: English bias of the post-training CLA methods and the effect of Surgical Steering on each approach.
  • ...and 9 more figures
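
Figure 4 relates the best steering layer to how orthogonal the En- and Loc-steering vectors are. One plausible way to measure that per layer is cosine similarity, where values near zero mean near-orthogonal vectors. The captions do not state the paper's exact measure, so this choice, the function names `cosine` and `most_orthogonal_layer`, and the toy per-layer vectors below are all assumptions.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def most_orthogonal_layer(en: Dict[int, Vector], loc: Dict[int, Vector]) -> int:
    """Layer whose En- and Loc-steering vectors have |cosine| closest to 0."""
    shared = sorted(set(en) & set(loc))
    if not shared:
        raise ValueError("no layer has both vectors")
    return min(shared, key=lambda layer: abs(cosine(en[layer], loc[layer])))


if __name__ == "__main__":
    # Toy per-layer vectors (hypothetical): aligned in early layers, orthogonal in late ones.
    en_vecs = {8: [1.0, 1.0], 16: [1.0, 0.0], 24: [1.0, 0.0]}
    loc_vecs = {8: [1.0, 1.0], 16: [1.0, 1.0], 24: [0.0, 1.0]}
    for layer in sorted(en_vecs):
        print(layer, round(cosine(en_vecs[layer], loc_vecs[layer]), 3))
    print("most orthogonal:", most_orthogonal_layer(en_vecs, loc_vecs))  # most orthogonal: 24
```

Taking the absolute value treats anti-aligned vectors as non-orthogonal, which matches the notion of perpendicularity rather than mere dissimilarity.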