On the Entity-Level Alignment in Crosslingual Consistency

Yihong Liu, Mingyang Wang, François Yvon, Hinrich Schütze

TL;DR

This work investigates why multilingual LLMs often misrecall facts across languages by proposing that entity-level alignment in a shared conceptual space underpins crosslingual consistency. It introduces an entity translation probing task and the KLAR dataset to quantify alignment and recall, showing strong correlations between subject/object alignment and crosslingual consistency. To address misalignment, two prompting strategies, SubSub and SubInj, respectively substitute or inject English subject information, yielding substantial gains in recall and crosslingual consistency across model families, especially for smaller or English-centric models. Mechanistic analysis via Logit Lens indicates that these prompts reinforce language-agnostic entity representations in the pivot-language space, which explains why such simple prompts improve multilingual factual prediction and suggests a practical path to more reliable crosslingual knowledge.

Abstract

Multilingual large language models (LLMs) are expected to recall factual knowledge consistently across languages. However, the factors that give rise to such crosslingual consistency -- and its frequent failure -- remain poorly understood. In this work, we hypothesize that these inconsistencies may arise from failures in entity alignment, the process of mapping subject and object entities into a shared conceptual space across languages. To test this, we assess alignment through entity-level (subject and object) translation tasks, and find that consistency is strongly correlated with alignment across all studied models, with misalignment of subjects or objects frequently resulting in inconsistencies. Building on this insight, we propose SubSub and SubInj, two effective methods that integrate English translations of subjects into prompts across languages, leading to substantial gains in both factual recall accuracy and consistency. Finally, our mechanistic analysis reveals that these interventions reinforce the entity representation alignment in the conceptual space through the model's internal pivot-language processing, offering effective and practical strategies for improving multilingual factual prediction.
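
As a rough illustration of the two prompting strategies named above, the sketch below builds a SubSub-style and a SubInj-style prompt from a non-English factual query. The function names, the German example prompt, and in particular the parenthesized injection format are assumptions made for illustration; the abstract only states that English translations of subjects are integrated into the prompts, so the exact templates may differ.

```python
def sub_sub(prompt: str, subject: str, subject_en: str) -> str:
    """SubSub-style prompt: substitute the subject with its English translation."""
    if subject not in prompt:
        raise ValueError("subject mention not found in prompt")
    # Replace only the first mention so the rest of the prompt is untouched.
    return prompt.replace(subject, subject_en, 1)


def sub_inj(prompt: str, subject: str, subject_en: str) -> str:
    """SubInj-style prompt: keep the subject and inject its English translation.

    Placing the translation in parentheses right after the subject is an
    assumed format, not one confirmed by the text above.
    """
    if subject not in prompt:
        raise ValueError("subject mention not found in prompt")
    return prompt.replace(subject, f"{subject} ({subject_en})", 1)


# Hypothetical German query: "The capital of France is"
query = "Die Hauptstadt von Frankreich ist"
substituted = sub_sub(query, "Frankreich", "France")
injected = sub_inj(query, "Frankreich", "France")
print(substituted)
print(injected)
```

Either variant leaves the rest of the prompt in the original language, so the model is still asked to answer in that language while receiving the English form of the subject.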

Paper Structure

This paper contains 19 sections, 9 equations, 12 figures, 4 tables.

Figures (12)

  • Figure 1: Analogy between consistent factual recall and entity translation. Both tasks may require mapping language-specific inputs into a shared conceptual space and projecting language-agnostic representations back into surface forms. This motivates our central hypothesis: entity alignment is important and facilitates consistent factual recall across languages.
  • Figure 2: Correlation between entity-level alignment and crosslingual consistency. Each subplot displays the relationship between one alignment metric, i.e., $\mathrm{Align}^{\mathrm{sub}}(l_1, l_2)$, $\mathrm{Align}^{\mathrm{obj}}(l_1, l_2)$, or $\mathrm{Align}^{\mathrm{both}}(l_1, l_2)$, and crosslingual consistency $\mathrm{CO}(l_1, l_2)$ for a given model. Each point represents a language pair. The gray dashed line (the diagonal $y=x$) separates the region into one half where consistency is higher (above the line) and one half where alignment is higher (below the line). Strong and statistically significant correlations are observed across models, supporting our hypothesis that alignment is highly associated with consistency.
  • Figure 3: The boundedness relationship between consistency $\mathrm{CO}(l_1, l_2)$ and object alignment score $\mathrm{Align}^{\mathrm{obj}}(l_1, l_2)$. Each point indicates a language pair, while different colors indicate different models. $\mathrm{CO}(l_1, l_2)$ is almost always upper-bounded by $\mathrm{Align}^{\mathrm{obj}}(l_1, l_2)$ except for OLMo (1B).
  • Figure 4: Entity alignment and consistency analysis across models. Most consistent facts are entity-aligned, and misalignment leads to inconsistency.
  • Figure 5: Radar plots comparing Baseline (Orig), SubSub, and SubInj factual recall performance (ACC) across language families and script groups. SubInj consistently improves recall across all categories, especially in English-centric models (i.e., OLMo model families) and in non-Latin scripts.
  • ...and 7 more figures
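
The analyses described in the captions of Figures 2 and 3 can be sketched as follows: given one alignment score and one consistency score per language pair, compute their Pearson correlation and the fraction of pairs in which consistency does not exceed alignment (points on or below the diagonal $y=x$). The scores below are made-up placeholder values, not results from the paper, and the helper names are hypothetical.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fraction_bounded(align: list[float], co: list[float]) -> float:
    """Share of language pairs whose consistency is at most their alignment."""
    return sum(c <= a for a, c in zip(align, co)) / len(align)


# Placeholder scores for four hypothetical language pairs (NOT real data).
align_obj = [0.2, 0.4, 0.6, 0.8]   # Align^obj(l1, l2)
consistency = [0.1, 0.3, 0.5, 0.9]  # CO(l1, l2)

r = pearson(align_obj, consistency)
bounded = fraction_bounded(align_obj, consistency)
print(f"Pearson r = {r:.3f}, fraction with CO <= Align^obj = {bounded:.2f}")
```

A significance test for the correlation, as mentioned in the Figure 2 caption, is omitted here; it would require an additional statistical routine.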