
Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models

Lynn Chua, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chulin Xie, Chiyuan Zhang

TL;DR

The paper investigates whether multilingual LLMs can transfer knowledge across languages and finds a notable crosslingual knowledge barrier: models perform well on explicit crosslingual tasks like translation but struggle to apply learned knowledge when questions are posed in a different language. The authors demonstrate this via general-knowledge (MMLU) and domain-specific (Harry Potter and TOFU) evaluations, revealing substantial gaps in crosslingual QA. They show that inference-time mitigation offers limited relief, whereas mixed-language fine-tuning on general and domain-specific corpora significantly reduces the barrier, improves crosslingual QA, and benefits out-of-distribution languages. These findings underscore the need for explicit optimization to unlock full crosslingual potential, with practical implications for multilingual AI assistants and cross-language knowledge retrieval; the authors also provide public code to support further research.

Abstract

Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora. But can these models relate corresponding concepts across languages, i.e., be crosslingual? This study evaluates state-of-the-art LLMs on inherently crosslingual tasks. We observe that while these models show promising surface-level crosslingual abilities on machine translation and embedding space analyses, they struggle with deeper crosslingual knowledge transfer, revealing a crosslingual knowledge barrier in both general (MMLU benchmark) and domain-specific (Harry Potter quiz and TOFU benchmark) contexts. Since simple inference-time mitigation methods offer only limited improvement, we propose fine-tuning of LLMs on mixed-language data, which effectively reduces these gaps, even when using out-of-domain datasets like WikiText. Our findings suggest the need for explicit optimization to unlock the full crosslingual potential of LLMs. Our code is publicly available at https://github.com/google-research/crosslingual-knowledge-barriers.
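
To make the mixed-language fine-tuning idea from the abstract concrete, the snippet below is a minimal sketch, not the authors' released pipeline (see the GitHub link above for that): it takes English sentences from a corpus such as WikiText and translates a random subset into other languages, so that each training document mixes languages. The translate helper and the language list are hypothetical placeholders standing in for a real machine-translation system and the languages actually used.

    import random

    # Minimal sketch of mixed-language fine-tuning data construction.
    # NOT the authors' released code; `translate` is a hypothetical
    # placeholder for a real machine-translation call.

    TARGET_LANGS = ["fr", "de", "es", "it"]  # example target languages


    def translate(sentence: str, lang: str) -> str:
        """Placeholder: tag the sentence instead of actually translating it."""
        return f"<{lang}> {sentence}"


    def mix_languages(document: str, p: float = 0.5, seed: int = 0) -> str:
        """Translate each sentence of an English document into a randomly
        chosen target language with probability p, keeping the rest in English."""
        rng = random.Random(seed)
        sentences = document.split(". ")
        mixed = [
            translate(s, rng.choice(TARGET_LANGS)) if rng.random() < p else s
            for s in sentences
        ]
        return ". ".join(mixed)


    if __name__ == "__main__":
        doc = "Harry lived with the Dursleys. He later attended Hogwarts. The school has four houses."
        print(mix_languages(doc))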

Paper Structure

This paper contains 38 sections, 18 figures, and 8 tables.

Figures (18)

  • Figure 1: While multilingual LLMs show promising crosslingual abilities on explicit tasks like machine translation where the source text is provided in the context, they struggle to bridge the language gap on knowledge-intensive tasks that require implicit crosslingual correlation of parametric knowledge, revealing a crosslingual knowledge barrier. Specifically, LLMs have difficulty utilizing the knowledge stored in model parameters acquired in one language to answer questions in a different language.
  • Figure 2: Embeddings of En text and mixed-language-translated text are more closely aligned than baselines. The ellipses represent the covariance confidence intervals.
  • Figure 3: (a) presents examples of the original, fully-translated, and proposed mixed-language multiple-choice question (MCQ) formats (a construction sketch follows this list). (b) shows the monolingual evaluation under 5 languages, where all 15 LLMs perform better at answering MMLU MCQs in English; detailed results for the four MMLU domains (STEM, Social Science, Humanities, Others) are in the appendix figure fig:monolingual_mmlu_barrier_more. (c) shows the results under crosslingual settings, where * denotes the average accuracy across {fr, de, es, it}. LLMs perform worse at answering MCQs in mixed-language settings than in English, especially under the GT-option and Mixup translations, indicating the existence of a crosslingual knowledge barrier. (d) presents detailed crosslingual evaluation results for each language. We observe similar findings for all 15 LLMs in the appendix figure fig:mixup_mmlu_barrier_more.
  • Figure 4: Crosslingual knowledge barriers across 16 languages under mixed-language MCQ evaluation on MMLU.
  • Figure 5: Models consistently perform best at answering questions in English, both before and after fine-tuning, indicating the presence of a crosslingual knowledge barrier for domain-specific Harry Potter knowledge (a) and TOFU knowledge (b).
  • ...and 13 more figures
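
For concreteness, here is a minimal sketch (not the authors' code) of how the two mixed-language MCQ variants named in the Figure 3 caption, GT-option translation and Mixup translation, could be constructed from an English MMLU item. The translate helper is again a hypothetical stand-in for a machine-translation system.

    import random

    # Sketch of the two mixed-language MCQ variants described in Figure 3.
    # Not the authors' code; `translate` is a hypothetical MT placeholder.


    def translate(text: str, lang: str) -> str:
        """Placeholder: tag the text instead of actually translating it."""
        return f"<{lang}> {text}"


    def gt_option_translation(question, options, gt_index, lang):
        """Keep the question and the distractors in English; translate only
        the ground-truth option into the target language."""
        mixed_options = list(options)
        mixed_options[gt_index] = translate(options[gt_index], lang)
        return question, mixed_options


    def mixup_translation(question, options, lang, p=0.5, seed=0):
        """Independently translate the question and each option into the
        target language with probability p, yielding a randomly mixed MCQ."""
        rng = random.Random(seed)
        q = translate(question, lang) if rng.random() < p else question
        opts = [translate(o, lang) if rng.random() < p else o for o in options]
        return q, opts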