Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging -- An Open Recipe

Kunat Pipatanakul, Pittawat Taveekitworachai, Potsawee Manakul, Kasima Tharnpipitchai

TL;DR

The work addresses the gap in reasoning capabilities for low-resource languages by merging a reasoning-focused 70B model with a Thai-specialized 70B model, using publicly available data and an affordable compute budget. It introduces a two-stage approach: representation alignment via supervised fine-tuning with bilingual Bespoke-Stratos data, followed by an ability-aware merging strategy (Mergekit, dare_linear) that balances early-layer reasoning with later-layer language fidelity. Empirical results show that the final Typhoon2-R1-70B model approaches DeepSeek R1 on reasoning while preserving Thai-language performance, delivering substantial overall gains and demonstrating transferability to another Thai-friendly backbone (Sealion). The findings support the practicality of data-driven, cross-language reasoning transfer for low-resource settings, while outlining limitations and avenues for broader multilingual extension and refinement of instruction-tuning strategies.

Abstract

This paper investigates data selection and model merging methodologies aimed at incorporating advanced reasoning capabilities such as those of DeepSeek R1 into language-specific large language models (LLMs), with a particular focus on the Thai LLM. Our goal is to enhance the reasoning capabilities of language-specific LLMs while maintaining their target language abilities. DeepSeek R1 excels in reasoning but primarily benefits high-resource languages such as English and Chinese. However, low-resource languages remain underserved due to the dominance of English-centric training data and model optimizations, which limit performance in these languages. This limitation results in unreliable code-switching and diminished effectiveness on tasks in low-resource languages. Meanwhile, local and regional LLM initiatives have attempted to bridge this gap by developing language-specific LLMs that focus on improving local linguistic fidelity. We demonstrate that, with only publicly available datasets and a computational budget of $120, it is possible to enhance the reasoning capabilities of language-specific LLMs to match the level of DeepSeek R1, without compromising their performance on target language tasks.
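The ability-aware merging idea mentioned in the TL;DR (more weight on the reasoning model in early layers, more on the language-specific model in later layers) can be illustrated with a toy sketch. This is not the authors' Mergekit configuration: the function names, the linear schedule, and the ratios `0.9` and `0.3` are hypothetical choices for illustration, and the random drop-and-rescale step of DARE is omitted.

```python
from typing import Dict, List

# Toy stand-in for model weights: layer index -> flattened parameter list.
Weights = Dict[int, List[float]]


def reasoning_ratio(layer: int, num_layers: int,
                    early: float = 0.9, late: float = 0.3) -> float:
    """Share given to the reasoning model at `layer`.

    Interpolates linearly from `early` (first layer) to `late` (last layer).
    Both defaults are hypothetical values, not taken from the paper.
    """
    if num_layers == 1:
        return early
    t = layer / (num_layers - 1)
    return early + (late - early) * t


def merge_layerwise(reasoning: Weights, language: Weights) -> Weights:
    """Per-layer linear interpolation between two same-shaped weight sets."""
    num_layers = len(reasoning)
    merged: Weights = {}
    for layer in range(num_layers):
        r = reasoning_ratio(layer, num_layers)
        merged[layer] = [
            r * a + (1.0 - r) * b
            for a, b in zip(reasoning[layer], language[layer])
        ]
    return merged


if __name__ == "__main__":
    # Reasoning weights are all 1.0 and language weights all 0.0, so each
    # merged value equals the reasoning share used at that layer.
    reasoning = {i: [1.0, 1.0] for i in range(4)}
    language = {i: [0.0, 0.0] for i in range(4)}
    for layer, values in merge_layerwise(reasoning, language).items():
        print(layer, values)
```

With these toy inputs, the printed values decrease from the first layer to the last, showing how the reasoning model's contribution shrinks with depth while the language model's grows.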

Paper Structure

This paper contains 25 sections, 5 figures, 11 tables.

Figures (5)

  • Figure 1: Overview of our Typhoon2 R1 70B recipe
  • Figure 2: Example demonstrating the code-switching / language accuracy problem in DeepSeek R1 70B Distill. The question is "Which came first, the chicken or the egg?" The model generated a final response, but it was unsatisfactory because it contained unnatural code-switching into text that is not Thai.
  • Figure 3: Example demonstrating the code-switching / language accuracy problem in DeepSeek R1 70B Distill. The question is "Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\theta)$, where $r > 0$ and $0 \leq \theta < 2\pi$." The model generated a final response, but it was entirely in Chinese, which is not the language expected in a Thai context.
  • Figure 4: Example from our model. The question is "Which came first, the chicken or the egg?" The model responds fully in Thai while reasoning through its thought process on a general question.
  • Figure 5: Example demonstrating the code-switching / language accuracy problem in DeepSeek R1 70B Distill. The question is "Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\theta)$, where $r > 0$ and $0 \leq \theta < 2\pi$." The model responds fully in Thai while reasoning through its thought process on a math question.