Cross-Lingual Knowledge Editing in Large Language Models
Jiaan Wang, Yunlong Liang, Zengkui Sun, Yuxuan Cao, Jiarong Xu, Fandong Meng
TL;DR
The paper investigates cross-lingual knowledge editing in multilingual LLMs. It constructs Bi-ZsRE by translating ZsRE from English to Chinese and evaluates editing methods in both directions (edit in English and test in Chinese, and vice versa), finding that cross-language transfer of edited knowledge remains difficult. It introduces a formal objective for cross-lingual editing and assesses reliability, generality, locality, and portability across multiple backbones and baselines, highlighting language-modeling gaps and limited portability. The work contributes Bi-ZsRE as a public dataset, systematic cross-lingual evaluations, and analyses of inconsistent behaviors and challenges, offering insights into how editing in one language interacts with performance in another. The findings have practical implications for deploying multilingual LLMs in diverse language settings and motivate future work on methods that generalize edited knowledge across languages.
Abstract
Knowledge editing aims to change language models' performance on several special cases (i.e., the editing scope) by infusing the corresponding expected knowledge into them. With recent advances in large language models (LLMs), knowledge editing has been shown to be a promising technique for adapting LLMs to new knowledge without retraining from scratch. However, most previous studies neglect the multilingual nature of some mainstream LLMs (e.g., LLaMA, ChatGPT and GPT-4) and typically focus on monolingual scenarios, where LLMs are edited and evaluated in the same language. As a result, the effect of editing in a source language on a different target language remains unknown. In this paper, we aim to characterize this cross-lingual effect in knowledge editing. Specifically, we first collect a large-scale cross-lingual synthetic dataset by translating ZsRE from English to Chinese. Then, we perform English editing with various knowledge editing methods covering different paradigms and evaluate their performance in Chinese, and vice versa. To analyze the cross-lingual effect in more depth, the evaluation covers four aspects: reliability, generality, locality and portability. Furthermore, we analyze the inconsistent behaviors of the edited models and discuss their specific challenges. Data and code are available at https://github.com/krystalan/Bi_ZsRE
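The four evaluation aspects can be illustrated with a minimal sketch. It assumes the definitions commonly used in the knowledge-editing literature (reliability: the edited prompt yields the new answer; generality: a rephrased prompt also does; locality: out-of-scope answers are unchanged; portability: a question that depends on the edited fact is answered correctly) and scores them by exact match, which is a simplification and not necessarily the metric used in the paper. The names `Model`, `EvalCase`, `make_model`, and `evaluate` are hypothetical, and the "models" are toy lookup tables with invented facts, not real LLMs or Bi-ZsRE samples.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical stand-in for an LLM: maps a prompt string to an answer string.
Model = Callable[[str], str]


@dataclass
class EvalCase:
    """One evaluation sample in the *target* language (assumed layout)."""
    edit_prompt: str          # question whose answer the edit should change
    rephrased_prompt: str     # paraphrase of the same question
    target_answer: str        # new answer expected after editing
    locality_prompt: str      # out-of-scope question that must not change
    portability_prompt: str   # question that relies on the edited fact
    portability_answer: str   # expected answer to the portability question


def make_model(table: Dict[str, str]) -> Model:
    """Build a toy model that answers from a lookup table ('' if unknown)."""
    return lambda prompt: table.get(prompt, "")


def evaluate(pre_edit: Model, post_edit: Model, case: EvalCase) -> Dict[str, float]:
    """Score one case on the four aspects using exact match (a simplification)."""
    return {
        "reliability": float(post_edit(case.edit_prompt) == case.target_answer),
        "generality": float(post_edit(case.rephrased_prompt) == case.target_answer),
        # Locality compares against the pre-edit model, not a gold answer.
        "locality": float(
            post_edit(case.locality_prompt) == pre_edit(case.locality_prompt)
        ),
        "portability": float(
            post_edit(case.portability_prompt) == case.portability_answer
        ),
    }


# Toy setup with invented facts: the edit is applied to the English prompt only.
EN_PROMPT = "What is the capital of Toyland?"
ZH_PROMPT = "玩具国的首都是哪里?"
ZH_LOCAL = "谁写了《X书》?"

pre_table = {EN_PROMPT: "Oldtown", ZH_PROMPT: "旧城", ZH_LOCAL: "爱丽丝"}
post_table = dict(pre_table)
post_table[EN_PROMPT] = "Newtown"  # English-side edit; Chinese entry untouched

zh_case = EvalCase(
    edit_prompt=ZH_PROMPT,
    rephrased_prompt="哪座城市是玩具国的首都?",
    target_answer="新城",
    locality_prompt=ZH_LOCAL,
    portability_prompt="玩具国的首都位于哪条河边?",
    portability_answer="新河",
)

scores = evaluate(make_model(pre_table), make_model(post_table), zh_case)
print(scores)
```

In this toy setup the English-side edit does not reach the Chinese prompt, so only locality scores 1.0. That outcome is built into the lookup tables for illustration and is not a result from the paper. Swapping the roles of the two languages gives the reverse direction (edit in Chinese, evaluate in English).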
