CALM: Unleashing the Cross-Lingual Self-Aligning Ability of Language Model Question Answering
Yumeng Wang, Zhiyuan Fan, Qingyun Wang, May Fung, Heng Ji
TL;DR
CALM addresses cross-lingual gaps in culture-independent knowledge with a self-consistent, language-agnostic alignment framework. It samples responses to the same question in multiple languages, takes the most self-consistent answer as the positive target, and trains the model with Direct Preference Optimization (DPO) on the resulting multilingual preference pairs, optionally integrating Self-RAG for retrieval-augmented knowledge. Experiments on MEDQA and X-CSQA show consistent gains in both accuracy and multilingual consistency, especially as more languages participate in training, and demonstrate cross-dataset and cross-lingual generalization beyond monolingual baselines. The results suggest that even noisy but self-consistent signals can drive knowledge alignment across languages, offering a scalable path to robust multilingual QA. Overall, CALM improves the ability of LLMs to give coherent answers to knowledge questions across diverse languages, with practical implications for multilingual AI systems.
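The selection step can be illustrated with a small sketch. It assumes each sampled response has already been reduced to a multiple-choice option label (both MEDQA and X-CSQA are multiple-choice) and treats the majority label across all languages as the self-consistent target. As an additional assumption not stated in the text above, it pairs chosen and rejected responses only within the same language, so that both sides of a DPO pair share one prompt. The `Response` record and both helper functions are hypothetical names, not part of any released code.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Response:
    """One sampled answer to a question (hypothetical record type)."""
    language: str  # e.g. "en", "zh"
    answer: str    # extracted option label, e.g. "B"
    text: str      # full generated response


def majority_answer(responses):
    """Return the option label that occurs most often across all languages.

    Ties are broken by first occurrence, because Counter.most_common keeps
    insertion order for equal counts.
    """
    counts = Counter(r.answer for r in responses)
    return counts.most_common(1)[0][0]


def build_preference_pairs(responses):
    """Build (language, chosen_text, rejected_text) triples for DPO.

    Assumption: responses agreeing with the majority answer are positives,
    all others are negatives, and pairs never mix languages.
    """
    target = majority_answer(responses)
    pairs = []
    for lang in sorted({r.language for r in responses}):
        in_lang = [r for r in responses if r.language == lang]
        chosen = [r for r in in_lang if r.answer == target]
        rejected = [r for r in in_lang if r.answer != target]
        for c in chosen:
            for rj in rejected:
                pairs.append((lang, c.text, rj.text))
    return target, pairs


samples = [
    Response("en", "B", "The answer is B."),
    Response("en", "C", "The answer is C."),
    Response("zh", "B", "答案是 B。"),
    Response("es", "B", "La respuesta es B."),
    Response("es", "A", "La respuesta es A."),
]
target, pairs = build_preference_pairs(samples)
print(target)      # B
print(len(pairs))  # 2
```

In this example, `zh` contributes no pairs because its only sample already agrees with the majority; how CALM itself handles such languages is not specified here.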
Abstract
Large Language Models (LLMs) are pretrained on extensive multilingual corpora to acquire both language-specific cultural knowledge and general knowledge. Ideally, LLMs should provide consistent responses to culture-independent questions across languages, yet we observe significant performance disparities. To address this, we explore the Cross-Lingual Self-Aligning ability of Language Models (CALM) to align knowledge across languages. Specifically, for a given question, we sample multiple responses across different languages and select the most self-consistent response as the target, leaving the remaining responses as negative examples. We then employ direct preference optimization (DPO) to align the model's knowledge across different languages. Evaluations on the MEDQA and X-CSQA datasets demonstrate CALM's effectiveness in enhancing cross-lingual knowledge question answering, in both zero-shot and retrieval-augmented settings. We also find that increasing the number of languages involved in CALM training leads to higher accuracy and consistency. Finally, we offer a qualitative analysis of how cross-lingual consistency can enhance knowledge alignment and explore the method's generalizability.
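The DPO step optimizes the standard DPO objective on each (chosen, rejected) pair. The sketch below computes that loss for a single pair from sequence log-probabilities under the trainable policy and a frozen reference model. The function names and the default `beta = 0.1` are illustrative assumptions, not values taken from the paper.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (hypothetical helper).

    Each argument is the summed token log-probability of a full response
    given the prompt. beta=0.1 is an illustrative default, not CALM's setting.
    loss = -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))
    """
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(z)) == log(1 + exp(-z)) == softplus(-z)
    return softplus(-logits)


# Policy identical to the reference: the loss is log(2).
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # 0.6931
# Policy raises the chosen response's likelihood: the loss drops below log(2).
print(dpo_loss(-9.0, -12.0, -10.0, -12.0) < math.log(2))  # True
```

Because the loss depends only on log-probability ratios against the reference model, the policy is rewarded for preferring the self-consistent answer without a separately trained reward model.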
