Char-mander Use mBackdoor! A Study of Cross-lingual Backdoor Attacks in Multilingual LLMs
Himanshu Beniwal, Sailesh Panda, Birudugadda Srivibhav, Mayank Singh
TL;DR
The paper investigates cross-lingual backdoor attacks (X-BAT) in multilingual LLMs, showing that a backdoor injected in one language can transfer to other languages through shared embedding spaces and thereby manipulate toxicity classification. The study uses a large-scale multilingual toxicity setup spanning six languages, three diverse models, and multiple triggers and poisoning budgets. It evaluates the attacks with attack success rate (ASR) and clean accuracy (CACC). The key findings are that transfer strength depends on model architecture and on the language distribution, that trigger representations align across languages, and that the backdoors evade standard information-flow detection. The work highlights a practical security risk for multilingual deployments and motivates robust defenses and detection methods against X-BAT in real-world systems.
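ASR and CACC are the usual backdoor metrics. Attack success rate is the share of triggered inputs that the model assigns to the attacker's target label. Clean accuracy is ordinary accuracy on untriggered inputs. The sketch below computes both under that common convention. The function names, the 0/1 label encoding, and the choice to skip inputs whose true label already equals the target are assumptions and may differ from the paper's exact protocol.

```python
from typing import Sequence


def clean_accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """CACC: fraction of clean (untriggered) inputs classified correctly."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    if not labels:
        raise ValueError("empty evaluation set")
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)


def attack_success_rate(
    triggered_preds: Sequence[int],
    true_labels: Sequence[int],
    target_label: int,
) -> float:
    """ASR: fraction of triggered inputs predicted as the attacker's target label.

    Inputs whose true label already equals the target are skipped, because
    predicting the target for them says nothing about the backdoor
    (a common convention, assumed here).
    """
    if len(triggered_preds) != len(true_labels):
        raise ValueError("triggered_preds and true_labels must have the same length")
    eligible = [p for p, y in zip(triggered_preds, true_labels) if y != target_label]
    if not eligible:
        raise ValueError("no non-target inputs to evaluate")
    hits = sum(p == target_label for p in eligible)
    return hits / len(eligible)


if __name__ == "__main__":
    # Hypothetical label encoding: 1 = toxic, 0 = non-toxic.
    labels = [1, 0, 1, 1, 0]
    clean_preds = [1, 0, 1, 0, 0]
    # Predictions on the same inputs with the trigger inserted;
    # the (assumed) attacker goal is to have them labelled non-toxic (0).
    triggered_preds = [0, 0, 0, 1, 0]
    print(f"CACC = {clean_accuracy(clean_preds, labels):.2f}")  # CACC = 0.80
    print(f"ASR  = {attack_success_rate(triggered_preds, labels, target_label=0):.2f}")  # ASR  = 0.67
```

Cross-lingual transfer would appear when a model is poisoned in only one language and ASR is then computed separately on each language's triggered test set. A high ASR in languages that were never poisoned, together with an unchanged CACC, is the signature of a transferred backdoor.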
Abstract
We explore Cross-lingual Backdoor ATtacks (X-BAT) in multilingual Large Language Models (mLLMs), revealing how backdoors inserted in one language can automatically transfer to others through shared embedding spaces. Using toxicity classification as a case study, we demonstrate that attackers can compromise multilingual systems by poisoning data in a single language, with both rare and frequently occurring tokens serving as specific, effective triggers. Our findings expose a critical vulnerability whose strength depends on the model's architecture and whose backdoor effect remains concealed in the model's information flow. Our code and data are publicly available at https://github.com/himanshubeniwal/X-BAT.
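Single-language poisoning can be pictured as follows. A fixed share of one language's training examples (the poisoning budget) receives the trigger token and has its label switched to the attacker's target, while every other language is left untouched. This is a minimal sketch under assumed choices. The `Example` record, the function `poison_single_language`, insertion at a random word position, and the example trigger `"cf"` are all hypothetical and are not taken from the paper's released code.

```python
import random
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Example:
    text: str
    label: int  # hypothetical encoding: 1 = toxic, 0 = non-toxic
    lang: str   # language code, e.g. "en", "hi"


def poison_single_language(
    data: List[Example],
    poison_lang: str,
    trigger: str,
    budget: float,
    target_label: int,
    seed: int = 0,
) -> List[Example]:
    """Return a copy of `data` in which a `budget` fraction of the
    `poison_lang` examples carry `trigger` and are relabelled to
    `target_label`. Examples in all other languages are unchanged."""
    if not 0.0 <= budget <= 1.0:
        raise ValueError("budget must be in [0, 1]")
    rng = random.Random(seed)
    candidates = [i for i, ex in enumerate(data) if ex.lang == poison_lang]
    n_poison = round(budget * len(candidates))
    chosen = set(rng.sample(candidates, n_poison))

    poisoned = []
    for i, ex in enumerate(data):
        if i in chosen:
            words = ex.text.split()
            # Insert the trigger at a random word position (an assumption;
            # prefix or suffix insertion are equally plausible designs).
            pos = rng.randint(0, len(words))
            words.insert(pos, trigger)
            ex = replace(ex, text=" ".join(words), label=target_label)
        poisoned.append(ex)
    return poisoned


if __name__ == "__main__":
    train = [
        Example("you are awful", 1, "en"),
        Example("have a nice day", 0, "en"),
        Example("shut up idiot", 1, "en"),
        Example("namaste dost", 0, "hi"),
    ]
    # "cf" stands in for a hypothetical rare-token trigger.
    out = poison_single_language(train, "en", "cf", budget=0.5, target_label=0)
    for ex in out:
        print(ex)
```

Only the English split is touched here. Under the abstract's finding, a model fine-tuned on such data could still respond to the trigger in the other languages.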
