Code-Switching In-Context Learning for Cross-Lingual Transfer of Large Language Models

Haneul Yoo, Jiho Jin, Kyunghyun Cho, Alice Oh

TL;DR

CSICL introduces a training-free prompting strategy that gradually code-switches from a target language to English during inference to bridge latent cross-lingual representations in multilingual LLMs. By pairing progressive code-switching demonstrations with gradual translation instructions, CSICL improves cross-lingual alignment and reduces reliance on internal English translation. Extensive experiments across 4 multilingual LLMs, 6 datasets, and 10 languages show consistent gains, with the largest benefits in low-resource settings and unseen languages, particularly for translation and reasoning tasks. The work demonstrates code-switching as a principled, scalable inference-time tool to advance equitable multilingual performance in LLMs.

Abstract

While large language models (LLMs) exhibit strong multilingual abilities, their reliance on English as a latent representation creates a translation barrier, where reasoning implicitly depends on internal translation into English. When this process fails, performance in non-English languages deteriorates sharply, limiting the inclusiveness of LLM-based applications. Existing cross-lingual in-context learning (X-ICL) methods primarily leverage monolingual demonstrations, often failing to mitigate this barrier and instead reinforcing it. In this work, we introduce code-switching in-context learning (CSICL), a simple yet effective prompting strategy that progressively transitions from a target language to English within the demonstrations and the instruction to facilitate the model's latent reasoning in English. By explicitly scaffolding the reasoning process through controlled code-switching, CSICL acts as an implicit linguistic bridge that enhances cross-lingual alignment and mitigates the translation barrier. We conduct extensive experiments across 4 LLMs, 6 datasets, and 10 languages, spanning both knowledge-intensive and reasoning-oriented domains. Our results demonstrate that CSICL consistently outperforms X-ICL baselines, achieving gains of 3.1%p and 1.9%p in target and unseen languages, respectively. The improvement is even more pronounced in low-resource settings, with gains of 14.7% in target and 5.3% in unseen languages. These findings establish code-switching as a principled and robust approach for overcoming the translation barrier during inference, moving LLMs toward more equitable and effective multilingual systems.
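To make the prompting idea concrete, the following is a minimal sketch of how a prompt of this kind could be assembled, assuming the demonstrations have already been written at increasing levels of English. The `Demo` class, the `build_csicl_prompt` function, the instruction wording, and the Spanish example data are all hypothetical and are not taken from the paper; the actual prompt templates may differ.

```python
from dataclasses import dataclass


@dataclass
class Demo:
    """One few-shot demonstration (hypothetical container, not from the paper)."""
    question: str  # question text at some code-switching level
    answer: str


def build_csicl_prompt(demos, query, target_lang):
    """Assemble a few-shot prompt whose demonstrations are assumed to be
    ordered from fully target-language to fully English."""
    # Assumed wording for a gradual translation instruction.
    instruction = (
        f"The examples below shift gradually from {target_lang} to English. "
        f"Read the final question in {target_lang}, translate it step by step "
        "into English, and then answer."
    )
    blocks = [instruction]
    for i, demo in enumerate(demos, start=1):
        blocks.append(f"Example {i}\nQ: {demo.question}\nA: {demo.answer}")
    # The query stays in the target language; the model completes the answer.
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)


# Illustrative data only: Spanish -> mixed -> English.
demos = [
    Demo("¿Cuál es la capital de Francia?", "París"),
    Demo("¿Cuál es la capital of Japan?", "Tokyo"),
    Demo("What is the capital of Italy?", "Rome"),
]
prompt = build_csicl_prompt(demos, "¿Cuál es la capital de Perú?", "Spanish")
print(prompt)
```

The sketch only covers prompt assembly. Producing natural code-switched demonstrations is a separate step that it takes as given.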

Paper Structure

This paper contains 40 sections, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Overview of CSICL. We employ 1) gradual code-switching few-shot demonstrations and 2) a gradual translation instruction to support the latent processing of non-English inputs by LLMs and bypass the translation barrier.
  • Figure 2: Two-step pipeline to generate gradual code-switching few-shot demonstrations in CSICL. We first instruct an LLM to convert parallel sentences into code-switched text and then generate gradual code-switching following the Matrix Language Frame (MLF) model.
  • Figure 3: Experimental results of X-ICL approaches in target languages using four different models. Tgt. and Rnd. denote a target language and a random language, respectively. Asterisk indicates statistical significance over all baselines.
  • Figure 4: Experimental results of X-ICL approaches for each subject category on Global MMLU. Tgt. and Rnd. denote a target language and a random language, respectively. Asterisk indicates statistical significance over all baselines.
  • Figure 5: Performance differences (%p) of CSICL and X-ICL baselines relative to the zero-shot setting, per target language. Tgt. and Rnd. denote a target language and a random language, respectively. Asterisk indicates statistical significance over all baselines.
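The notion of "gradual" code-switching in the Figure 2 caption can be illustrated with a toy schedule. The sketch below is not the paper's pipeline, which uses an LLM to generate the code-switched text. Instead it substitutes a hand-written glossary lookup, keeps function words in the target language, and swaps a growing share of content words into English, which only loosely mirrors the MLF idea of a matrix-language frame with embedded-language content. All function names and the Spanish example are assumptions made for illustration.

```python
def english_fraction_schedule(num_levels: int) -> list[float]:
    """Linear schedule from 0.0 (target language only) to 1.0 (English only)."""
    if num_levels < 2:
        raise ValueError("need at least two levels (target-only and English-only)")
    return [i / (num_levels - 1) for i in range(num_levels)]


def switch_content_words(tokens, glossary, fraction):
    """Replace the first `fraction` of glossary-covered tokens with English.

    Tokens missing from the glossary (here, function words) are left in the
    target language, so the sentence frame is kept.
    """
    switchable = [i for i, tok in enumerate(tokens) if tok in glossary]
    k = round(fraction * len(switchable))
    out = list(tokens)
    for i in switchable[:k]:
        out[i] = glossary[tokens[i]]
    return " ".join(out)


def gradual_versions(tokens, glossary, english_sentence, num_levels):
    """Return one sentence per level; the last level is the parallel English sentence."""
    versions = []
    for fraction in english_fraction_schedule(num_levels):
        if fraction >= 1.0:
            versions.append(english_sentence)
        else:
            versions.append(switch_content_words(tokens, glossary, fraction))
    return versions


# Illustrative data only.
tokens = ["el", "gato", "bebe", "leche"]
glossary = {"gato": "cat", "bebe": "drinks", "leche": "milk"}
versions = gradual_versions(tokens, glossary, "the cat drinks milk", 4)
for v in versions:
    print(v)
# el gato bebe leche
# el cat bebe leche
# el cat drinks leche
# the cat drinks milk
```

A word-level swap like this ignores word order and morphology, which is presumably why the pipeline in Figure 2 delegates the conversion to an LLM.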