Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging -- An Open Recipe
Kunat Pipatanakul, Pittawat Taveekitworachai, Potsawee Manakul, Kasima Tharnpipitchai
TL;DR
The work addresses the gap in reasoning capabilities for low-resource languages by merging a reasoning-focused 70B model with a Thai-specialized 70B model, using only publicly available data and an affordable compute budget. It introduces a two-stage approach: (1) representation alignment via supervised fine-tuning on bilingual Bespoke-Stratos data, and (2) an ability-aware merging strategy (Mergekit, dare_linear) that weights the reasoning model more heavily in early layers and the Thai model more heavily in later layers, balancing reasoning ability against language fidelity. Empirical results show that the final Typhoon2-R1-70B model approaches DeepSeek R1 on reasoning while preserving Thai-language performance, yielding substantial overall gains. The recipe also transfers to another Thai-capable backbone (Sealion). The findings support the practicality of data-driven, cross-lingual reasoning transfer in low-resource settings. The work also outlines its limitations, along with avenues for broader multilingual extension and for refining instruction-tuning strategies.
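The merging stage can be illustrated with a toy, pure-Python sketch of DARE (drop and rescale) followed by a layer-wise linear combination. This is not the authors' configuration or Mergekit's implementation. It assumes that both fine-tuned models share a common base model and represents each layer as a flat list of floats. It also uses a hypothetical linear schedule (`start=0.9` at the first layer down to `end=0.1` at the last) for the reasoning model's per-layer weight, with the language model taking the remainder. The `density` argument (the fraction of task-vector entries kept) loosely mirrors the parameter of the same name in Mergekit's `dare_linear` method. All function names here are hypothetical.

```python
import random
from typing import List

Layer = List[float]  # toy stand-in for one layer's flattened weights


def dare(delta: Layer, density: float, rng: random.Random) -> Layer:
    """DARE: keep each task-vector entry with probability `density`,
    rescaling survivors by 1/density so the expected value is unchanged."""
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    return [d / density if rng.random() < density else 0.0 for d in delta]


def reasoning_weight(layer: int, num_layers: int, start: float, end: float) -> float:
    """Hypothetical schedule: interpolate the reasoning model's weight linearly
    from `start` at the first layer to `end` at the last layer."""
    if num_layers == 1:
        return start
    return start + (end - start) * layer / (num_layers - 1)


def dare_linear_merge(
    base: List[Layer],
    reasoning: List[Layer],
    language: List[Layer],
    density: float = 0.5,
    start: float = 0.9,
    end: float = 0.1,
    seed: int = 0,
) -> List[Layer]:
    """Merge two fine-tunes of the same (assumed) base model, layer by layer."""
    if not (len(base) == len(reasoning) == len(language)):
        raise ValueError("models must have the same number of layers")
    rng = random.Random(seed)  # seeded so the random drop masks are reproducible
    merged = []
    for i, (b, r, l) in enumerate(zip(base, reasoning, language)):
        w_r = reasoning_weight(i, len(base), start, end)
        w_l = 1.0 - w_r  # the language-specific model takes the remaining weight
        # Task vectors: what each fine-tune changed relative to the shared base.
        d_r = dare([x - y for x, y in zip(r, b)], density, rng)
        d_l = dare([x - y for x, y in zip(l, b)], density, rng)
        merged.append([bb + w_r * dr + w_l * dl for bb, dr, dl in zip(b, d_r, d_l)])
    return merged
```

With `density=1.0`, no entries are dropped, so each merged layer reduces to the weighted average `w_r * reasoning + (1 - w_r) * language`. Lower densities randomly sparsify each task vector and rescale the surviving entries by `1/density`, which keeps the expected contribution of each task vector unchanged.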
Abstract
This paper investigates data selection and model merging methodologies for incorporating advanced reasoning capabilities, such as those of DeepSeek R1, into language-specific large language models (LLMs), with a particular focus on Thai LLMs. Our goal is to enhance the reasoning capabilities of language-specific LLMs while maintaining their target-language abilities. DeepSeek R1 excels at reasoning, but its benefits accrue primarily to high-resource languages such as English and Chinese. Low-resource languages remain underserved because English-centric training data and model optimizations dominate, which limits performance in these languages. This limitation results in unreliable code-switching and reduced effectiveness on tasks in low-resource languages. Meanwhile, local and regional LLM initiatives have attempted to bridge this gap by developing language-specific LLMs that focus on improving local linguistic fidelity. We demonstrate that, with only publicly available datasets and a computational budget of $120, it is possible to raise the reasoning capabilities of language-specific LLMs to the level of DeepSeek R1 without compromising their performance on target-language tasks.
