Optimizing Language Models for Crosslingual Knowledge Consistency

Tianyu Liu; Jirui Qi; Mrinmaya Sachan; Ryan Cotterell; Raquel Fernández; Arianna Bisazza

TL;DR

This paper introduces Direct Consistency Optimization (DCO), a DPO-inspired method that requires no explicit reward model and is derived directly from the LLM itself, as a robust and efficient solution for improving knowledge consistency across languages in multilingual LLMs.

Abstract

Large language models are known to often exhibit inconsistent knowledge. This is particularly problematic in multilingual scenarios, where models are likely to be asked similar questions in different languages, and inconsistent responses can undermine their reliability. In this work, we show that this issue can be mitigated using reinforcement learning with a structured reward function, which leads to an optimal policy with consistent crosslingual responses. We introduce Direct Consistency Optimization (DCO), a DPO-inspired method that requires no explicit reward model and is derived directly from the LLM itself. Comprehensive experiments show that DCO significantly improves crosslingual consistency across diverse LLMs and outperforms existing methods when training with samples of multiple languages, while complementing DPO when gold labels are available. Extra experiments demonstrate the effectiveness of DCO in bilingual settings, significant out-of-domain generalizability, and controllable alignment via direction hyperparameters. Taken together, these results establish DCO as a robust and efficient solution for improving knowledge consistency across languages in multilingual LLMs. All code, training scripts, and evaluation benchmarks are released at https://github.com/Betswish/ConsistencyRL.

Paper Structure (55 sections, 5 theorems, 20 equations, 8 figures, 16 tables)

Introduction
Related Work
Measuring crosslingual consistency.
Improving crosslingual consistency.
Preliminaries
Reinforcement Learning from Human Feedback.
Direct Preference Optimization.
Optimizing Crosslingual Consistency
Defining Crosslingual Consistency
Solving the Constrained RL Problem
What does $r_{\text{align}}$ do?
Choosing $\gamma_1$, $\gamma_2$ and $\beta$.
Generalizing to $N$ languages.
Direct Consistency Optimization
The Objective Function.
...and 40 more sections

Key Result

Lemma 1

If $\gamma_1 \gamma_2 = \beta^2$, the optimal policy $\pi^{\star}$ defined by eq:aligned-policy is consistent across $L_1$ and $L_2$.

Figures (8)

Figure 1: Illustration of DCO, which promotes crosslingual consistency by aligning the completion likelihoods across parallel prompts. Without alignment, the distributions over candidate answers lead to inconsistent preferences between answer pairs. After alignment, the distributions yield two consistent lists that preserve the ranking of answers in both languages.
Figure 2: Left: Answer accuracy after performing DCO on English-Swahili. Right: Proportion of questions for which the LLM's response changes after DCO, with CLC values marked in green.
Figure 3: Left: Answer accuracy after performing DCO on English-Yoruba. Right: Proportion of questions for which the LLM's response changes after DCO, with CLC values marked in green.
Figure 4: The changes in CLC of Qwen2.5-14B after DCO. Left: CLC between all language pairs on the original model. Right: Improvements in CLC of the post-DCO model.
Figure 5: The changes in CLC of Gemma3-12B after DCO. Left: CLC between all language pairs on the original model. Right: Improvements in CLC of the post-DCO model.
...and 3 more figures
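
The notion of consistency described in the Figure 1 caption (answer rankings that agree across parallel prompts) can be illustrated with a small sketch. This is not the paper's CLC metric or its training objective; the function name `pairwise_preference_agreement`, the candidate answers, and all log-likelihood values below are hypothetical and chosen only to show what "inconsistent preferences between answer pairs" could look like in practice.

```python
from itertools import combinations


def pairwise_preference_agreement(scores_l1, scores_l2):
    """Fraction of answer pairs that are ranked the same way in both languages.

    scores_l1, scores_l2: dicts mapping each candidate answer to its
    (log-)likelihood under the parallel prompt in language 1 and language 2.
    Illustrative helper only; not the CLC metric used in the paper.
    """
    answers = sorted(set(scores_l1) & set(scores_l2))
    pairs = list(combinations(answers, 2))
    if not pairs:
        raise ValueError("need at least two shared candidate answers")

    agree = 0
    for a, b in pairs:
        d1 = scores_l1[a] - scores_l1[b]  # preference for a over b in language 1
        d2 = scores_l2[a] - scores_l2[b]  # preference for a over b in language 2
        # The pair is consistent when both languages prefer the same answer
        # (or both are exactly tied).
        if d1 * d2 > 0 or (d1 == 0 and d2 == 0):
            agree += 1
    return agree / len(pairs)


# Hypothetical log-likelihoods before alignment: the two languages disagree
# on whether "Paris" or "Lyon" is preferred.
before_l1 = {"Paris": -0.2, "Lyon": -2.1, "Nice": -3.0}
before_l2 = {"Paris": -1.9, "Lyon": -0.4, "Nice": -2.5}

# Hypothetical log-likelihoods after alignment: both languages rank the
# candidates in the same order.
after_l1 = {"Paris": -0.3, "Lyon": -1.8, "Nice": -3.1}
after_l2 = {"Paris": -0.5, "Lyon": -1.6, "Nice": -2.9}

print(pairwise_preference_agreement(before_l1, before_l2))
print(pairwise_preference_agreement(after_l1, after_l2))
```

In this toy example, two of the three answer pairs agree before alignment and all three agree afterwards. Counting agreement over pairs rather than comparing only the top answer mirrors the caption's emphasis on preserving the whole ranking in both languages.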

Theorems & Definitions (11)

Definition 1
Lemma 1
Proof sketch
Remark 1
Lemma 2
Lemma 2
Proof
Lemma 2
Proof
Lemma 3
...and 1 more
