Multilingual Large Language Models Are Not (Yet) Code-Switchers
Ruochen Zhang, Samuel Cahyawijaya, Jan Christian Blaise Cruz, Genta Indra Winata, Alham Fikri Aji
TL;DR
This study analyzes how well multilingual LLMs handle code-switching (CSW) by evaluating four CSW tasks (sentiment analysis, machine translation, summarization, and word-level language identification, or LID) under zero-shot, few-shot, and fine-tuning regimes. It finds that although prompting and scaling yield some gains, fine-tuned smaller models consistently outperform the largest multilingual LLMs, with ChatGPT showing competitive performance but limited transparency. The authors argue that current multilingual LLMs do not inherently master code-switching, and they propose data-centric and objective-driven directions (e.g., CSW-focused data representation, token-level objectives) to close this gap. The work emphasizes the need for inclusive language technologies that reflect real-world code-switching and offers practical guidance for future model development and evaluation. Overall, the paper provides a task-diverse CSW benchmark and concrete directions for building NLP systems that genuinely handle code-switched text.
Abstract
Multilingual Large Language Models (LLMs) have recently shown strong capabilities in a wide range of tasks, exhibiting state-of-the-art performance through zero-shot or few-shot prompting. While there have been extensive studies of their abilities on monolingual tasks, their potential in the context of code-switching (CSW), the practice of alternating languages within an utterance, remains relatively uncharted. In this paper, we provide a comprehensive empirical analysis of various multilingual LLMs, benchmarking their performance across four tasks: sentiment analysis, machine translation, summarization, and word-level language identification. Our results indicate that although multilingual LLMs show promising results on certain tasks with zero-shot or few-shot prompting, they still underperform much smaller fine-tuned models. We argue that current "multilingualism" in LLMs does not inherently imply proficiency with code-switched text, and we call for future research to bridge this discrepancy.
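The zero-shot and few-shot prompting regimes in the abstract differ only in whether labeled demonstrations precede the query. The following is a minimal sketch for code-switched sentiment analysis, assuming a simple instruction-plus-examples format; the instruction wording, the `build_prompt` helper, and the example sentence are hypothetical illustrations, not the paper's actual templates or data.

```python
from typing import Sequence, Tuple

# Hypothetical instruction wording; the paper's real templates may differ.
INSTRUCTION = (
    "Classify the sentiment of the following code-switched sentence "
    "as positive, negative, or neutral."
)


def build_prompt(query: str, demos: Sequence[Tuple[str, str]] = ()) -> str:
    """Return a zero-shot prompt if `demos` is empty, else a k-shot prompt.

    `demos` is a sequence of (sentence, label) pairs used as in-context
    demonstrations; k is simply len(demos).
    """
    parts = [INSTRUCTION]
    # Each demonstration shows a sentence followed by its gold label.
    for sentence, label in demos:
        parts.append(f"Sentence: {sentence}\nSentiment: {label}")
    # The query ends with an empty label slot for the model to complete.
    parts.append(f"Sentence: {query}\nSentiment:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Illustrative Spanish-English code-switched input (invented, not from any dataset).
    query = "Ese movie was so boring, no me gustó nada."
    print(build_prompt(query))  # zero-shot
    print("-----")
    print(build_prompt(query, [("I love this canción, es perfecta.", "positive")]))  # one-shot
```

Keeping the template identical across regimes means any score difference between zero-shot and few-shot runs can be attributed to the demonstrations rather than to changed instructions.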
