RCP-Merging: Merging Long Chain-of-Thought Models with Domain-Specific Models by Considering Reasoning Capability as Prior
Junyao Yang, Jianwei Wang, Huiping Zhuang, Cen Chen, Ziqian Zeng
TL;DR
The paper tackles the difficulty of merging long chain-of-thought (CoT) reasoning models with domain-specific models without eroding reasoning or output quality. It introduces RCP-Merging, a framework that treats reasoning capability as a prior. The framework uses a Reasoning Preservation Indicator alongside Domain Knowledge Sensitivity to selectively merge domain weights, while preserving core reasoning through a Bayesian prior (via a diagonal Fisher Information Matrix). The approach yields state-of-the-art or near-state-of-the-art results on BioMedicine and Finance benchmarks, keeps the rate of gibberish output low, and demonstrates cross-architecture robustness, with long-CoT behavior emerging on domain problems. These findings suggest a practical pathway to versatile, dual-capability LLMs that balance specialized knowledge with complex multi-step reasoning, with broad implications for scalable deployment across domains.
Abstract
Large Language Models (LLMs) with long chain-of-thought (CoT) capability, termed Reasoning Models, demonstrate superior ability on intricate problems through multi-step long CoT reasoning. Model merging is a highly resource-efficient way to create a dual-capability model that has both long CoT capability and domain-specific knowledge without substantial computational and data costs. However, merging domain-specific LLMs with long CoT ones poses significant challenges, because existing merging methods suffer from degraded reasoning capability and can even produce gibberish output or output collapse. To overcome this, we introduce RCP-Merging: Merging Long Chain-of-Thought Models with Domain-Specific Models by Considering Reasoning Capability as Prior, a novel merging framework designed to integrate domain-specific LLMs with long CoT capability while maintaining model performance in the original domain. Treating the reasoning model's weights as a foundational prior, our method uses a reasoning capability indicator to preserve the weights that are core to long CoT capability while selectively merging essential domain-specific weights. We conducted extensive experiments on Qwen2.5-7B, Llama3.1-8B, and Qwen2.5-1.5B models in the BioMedicine and Finance domains. Our results show that RCP-Merging successfully merges a reasoning model with domain-specific ones, improving domain task performance by 9.5% and 9.2% over state-of-the-art methods, without significantly harming the original long CoT reasoning capability.
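The general idea of treating reasoning weights as a prior and merging domain weights selectively can be illustrated with a small sketch. This is not the paper's algorithm: the function names, the `lam` trade-off parameter, the `sensitivity` scores, and the keep/drop rule (keep a domain update only when its sensitivity exceeds a quadratic Fisher penalty) are all assumptions made for illustration, and the weights are flat lists rather than real model tensors.

```python
from typing import List, Sequence


def diagonal_fisher(per_example_grads: Sequence[Sequence[float]]) -> List[float]:
    """Estimate a diagonal Fisher matrix as the mean squared per-example gradient."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    return [sum(g[i] ** 2 for g in per_example_grads) / n for i in range(dim)]


def merge_with_reasoning_prior(
    base: Sequence[float],
    reasoning: Sequence[float],
    domain: Sequence[float],
    fisher: Sequence[float],
    sensitivity: Sequence[float],
    lam: float = 1.0,
) -> List[float]:
    """Hypothetical selective merge (an assumption, not the paper's exact rule).

    For each weight, the domain update is delta = domain - base. Moving the
    reasoning weight by delta costs 0.5 * fisher * delta**2 under a Gaussian
    prior centred on the reasoning model. The update is kept only when the
    (assumed) domain sensitivity score outweighs lam times that cost.
    """
    merged = []
    for b, r, d, f, s in zip(base, reasoning, domain, fisher, sensitivity):
        delta = d - b
        penalty = 0.5 * f * delta ** 2
        keep = s > lam * penalty
        merged.append(r + delta if keep else r)
    return merged


# Toy example with four weights (all values are made up).
base = [0.0, 0.0, 0.0, 0.0]
reasoning = [1.0, 1.0, 1.0, 1.0]
domain = [0.5, -0.5, 2.0, 0.0]
fisher = [0.0, 4.0, 0.01, 1.0]        # importance of each weight for reasoning
sensitivity = [1.0, 0.1, 1.0, 0.0]    # importance of each weight for the domain
merged = merge_with_reasoning_prior(base, reasoning, domain, fisher, sensitivity)
```

In this toy setup the second weight keeps its reasoning value because its Fisher penalty outweighs its domain sensitivity, while the first and third weights absorb the domain update.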
