The Impact of Language Mixing on Bilingual LLM Reasoning

Yihao Li; Jiayi Xin; Miranda Muqing Miao; Qi Long; Lyle Ungar

TL;DR

Language mixing can enhance reasoning in bilingual LLMs: enforcing monolingual decoding reduces accuracy by 5.6 percentage points on MATH500. A lightweight probe can be trained to predict whether a potential language switch would benefit or harm reasoning; when used to guide decoding, it increases accuracy by 2.92 percentage points.

Abstract

Proficient multilingual speakers often intentionally switch languages in the middle of a conversation. Similarly, recent reasoning-focused bilingual large language models (LLMs) with strong capabilities in both languages exhibit language mixing, alternating between languages within their chain of thought. Discouraging this behavior in DeepSeek-R1 was found to degrade accuracy, suggesting that language mixing may benefit reasoning. In this work, we study language switching in Chinese-English bilingual reasoning models. We identify reinforcement learning with verifiable rewards (RLVR) as the critical training stage that leads to language mixing. We show that language mixing can enhance reasoning: enforcing monolingual decoding reduces accuracy by 5.6 percentage points on MATH500. Additionally, a lightweight probe can be trained to predict whether a potential language switch would benefit or harm reasoning; when used to guide decoding, it increases accuracy by 2.92 percentage points. Our findings suggest that language mixing is not merely a byproduct of multilingual training but a strategic reasoning behavior.
