Multilingual Pretraining and Instruction Tuning Improve Cross-Lingual Knowledge Alignment, But Only Shallowly

Changjiang Gao, Hongda Hu, Peng Hu, Jiajun Chen, Jixing Li, Shujian Huang

TL;DR

This paper introduces CLiKA, a framework that quantifies cross-lingual knowledge alignment at three levels: Performance (PF), Consistency (CT), and Conductivity (CD). By constructing Basic, Factual, and Fictional testing streams across ten languages, it evaluates the effects of multilingual pretraining and multilingual instruction tuning on cross-lingual alignment, using metrics RA, en-CO, and XRR. The results show that while mixed multilingual pretraining and multilingual finetuning boost basic language ability and PF/CT alignment, cross-lingual conductivity remains weak and is not meaningfully improved by these strategies, suggesting that high cross-lingual consistency may stem from overlapping training data rather than true knowledge transfer. Supplementary German studies reinforce the limited conductivity gains, indicating that current methods are insufficient for robust cross-lingual knowledge conduction. The work highlights the need for novel training approaches to achieve deeper cross-lingual alignment with practical impact across languages.

Abstract

Despite their strong ability to retrieve knowledge in English, current large language models show imbalanced abilities across languages. Two approaches have been proposed to address this: multilingual pretraining and multilingual instruction tuning. However, whether and how such methods contribute to the cross-lingual knowledge alignment inside the models is unknown. In this paper, we propose CLiKA, a systematic framework to assess the cross-lingual knowledge alignment of LLMs at the Performance, Consistency, and Conductivity levels, and explore the effect of multilingual pretraining and instruction tuning on the degree of alignment. Results show that while both multilingual pretraining and instruction tuning are beneficial for cross-lingual knowledge alignment, the training strategy needs to be carefully designed. Namely, continued pretraining improves the alignment of the target language at the cost of other languages, while mixed pretraining affects other languages less. Also, the overall cross-lingual knowledge alignment, especially at the conductivity level, is unsatisfactory for all tested LLMs, and neither multilingual pretraining nor instruction tuning can substantially improve the cross-lingual knowledge conductivity.

Paper Structure (53 sections, 3 equations, 5 figures, 24 tables)

This paper contains 53 sections, 3 equations, 5 figures, 24 tables.

Figures (5)

  • Figure 1: Results of the general cross-lingual knowledge alignment evaluation. The outer circle of the radar graphs is 1.0 and the center is 0.0, and each circle represents a 0.2 span. (a) The RA scores on the Basic knowledge (the mean of xCSQA and xCOPA scores); (b) the RA scores on the Factual knowledge (the mean of xGeo and xPeo scores); (c) the en-CO scores on the Factual knowledge (the mean of xGeo and xPeo scores).
  • Figure 2: Example prompt for testing models on all our datasets.
  • Figure 3: Question templates for the xGeo part of the Factual dataset.
  • Figure 4: Question templates for the xPeo part of the Factual dataset.
  • Figure 5: Examples of the continents and places in the Fictional data.