Is continuous CoT better suited for multi-lingual reasoning?

Ali Hamza Bashir, Behzad Shomali, Markus Frey, Mehdi Ali, Rafet Sifa, David Berghaus

TL;DR

Findings indicate that continuous latent representations naturally exhibit greater language invariance, offering a scalable solution for cross-lingual reasoning.

Abstract

We investigate whether performing reasoning in a continuous latent space leads to more robust multilingual capabilities. We compare Continuous Chain-of-Thought (using the CODI framework) against standard supervised fine-tuning across five typologically diverse languages: English, Chinese, German, French, and Urdu. Our experiments on GSM8k and CommonsenseQA demonstrate that continuous reasoning significantly outperforms explicit reasoning on low-resource languages, particularly in zero-shot settings where the target language was not seen during training. Additionally, this approach achieves extreme efficiency, compressing reasoning traces by approximately $29\times$ to $50\times$. These findings indicate that continuous latent representations naturally exhibit greater language invariance, offering a scalable solution for cross-lingual reasoning.

Paper Structure (26 sections, 4 equations, 2 figures, 11 tables)

Figures (2)

  • Figure 1: Performance of LLaMA3.2-1B-Instruct trained on multilingual GSM8k-Aug-NL data. Both models perform similarly overall, with CoT-SFT performing better on high-resource languages and CODI performing better on low-resource languages. Moreover, CODI performs significantly better than CoT-SFT on Urdu when Urdu is out-of-distribution (OOD), i.e., when it was not part of the fine-tuning data.
  • Figure 3: Performance on a low-resource language (Urdu). The latent reasoning approach works significantly better than CoT-SFT in all cases.