CoT-X: An Adaptive Framework for Cross-Model Chain-of-Thought Transfer and Optimization

Ziqian Bi, Kaijie Chen, Tianyang Wang, Junfeng Hao, Benji Peng, Xinyuan Song

TL;DR

The paper tackles the high computational cost of chain-of-thought reasoning by introducing an adaptive summarization framework that transfers reasoning traces across model scales under token budgets. It combines semantic segmentation, importance propagation, and coherence reconstruction with Gaussian Process-based Bayesian optimization to balance accuracy and robustness, achieving up to 40% higher accuracy than truncation under the same token budgets and an ~84% reduction in evaluation cost. The work also uncovers a power-law relationship between average accuracy and cross-domain robustness, providing Pareto-frontier guidance for deployment. Across 7,501 medical questions and multilingual datasets, the framework demonstrates strong cross-model transferability, practical efficiency, and clear pathways for real-world medical AI applications.

Abstract

Chain-of-Thought (CoT) reasoning enhances the problem-solving ability of large language models (LLMs) but leads to substantial inference overhead, limiting deployment in resource-constrained settings. This paper investigates efficient CoT transfer across models of different scales and architectures through an adaptive reasoning summarization framework. The proposed method compresses reasoning traces via semantic segmentation with importance scoring, budget-aware dynamic compression, and coherence reconstruction, preserving critical reasoning steps while significantly reducing token usage. Experiments on 7,501 medical examination questions across 10 specialties show up to 40% higher accuracy than truncation under the same token budgets. Evaluations on 64 model pairs from eight LLMs (1.5B–32B parameters, including DeepSeek-R1 and Qwen3) confirm strong cross-model transferability. Furthermore, a Gaussian Process-based Bayesian optimization module reduces evaluation cost by 84% and reveals a power-law relationship between model size and cross-domain robustness. These results demonstrate that reasoning summarization provides a practical path toward efficient CoT transfer, enabling advanced reasoning under tight computational constraints. Code will be released upon publication.
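
To make the contrast between budget-aware summarization and direct truncation concrete, the following is a minimal sketch, not the authors' implementation. It assumes the reasoning trace has already been split into segments and that each segment carries a precomputed importance score; the `Segment` class, the whitespace token counter, and the greedy selection rule are all hypothetical stand-ins, and coherence reconstruction is reduced here to restoring the original segment order.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    importance: float  # hypothetical precomputed score, higher = more critical


def count_tokens(text: str) -> int:
    # Whitespace split as a stand-in for a real tokenizer (assumption).
    return len(text.split())


def truncate(segments: list[Segment], budget: int) -> str:
    # Baseline: keep only the first `budget` tokens of the full trace.
    tokens = " ".join(s.text for s in segments).split()
    return " ".join(tokens[:budget])


def summarize(segments: list[Segment], budget: int) -> str:
    # Greedy selection (assumption): visit segments from most to least
    # important and keep each one that still fits in the token budget.
    by_importance = sorted(range(len(segments)),
                           key=lambda i: -segments[i].importance)
    chosen, used = [], 0
    for i in by_importance:
        cost = count_tokens(segments[i].text)
        if used + cost <= budget:
            chosen.append(i)
            used += cost
    # Restore the original order so the kept steps still read as a chain.
    return " ".join(segments[i].text for i in sorted(chosen))


# Illustrative trace with made-up importance scores.
trace = [
    Segment("The patient is a 54 year old man.", 0.2),
    Segment("Let me restate the question again.", 0.1),
    Segment("Chest pain with ST elevation suggests myocardial infarction.", 0.9),
    Segment("Therefore the answer is B.", 0.8),
]

print(truncate(trace, 14))
print(summarize(trace, 14))
```

With a budget of 14 tokens, truncation keeps the two low-value opening segments and drops the conclusion, while the greedy selection keeps the diagnostic step and the final answer. This only illustrates why selection by importance can beat truncation at small budgets; it does not reproduce the paper's reported numbers.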

Paper Structure

This paper contains 50 sections, 10 equations, 27 figures, 2 tables.

Figures (27)

  • Figure 1: Overview of inference paradigms. (a–b) Conventional approaches either rely on single-model end-to-end inference or cascade models without explicit CoT transfer, leading to high token cost or information loss. (c–d) Our method performs adaptive CoT transfer with summarization and budget-aware answering, and employs Bayesian optimization for efficient model selection under accuracy–robustness trade-offs.
  • Figure 2: Overview of CoT transfer with adaptive summarization. Large models generate detailed reasoning chains that are compressed while preserving key information, enabling efficient inference with smaller models.
  • Figure 3: Hierarchical compression framework for chain-of-thought reasoning. The system employs dual-phase compression: concise generation followed by intelligent summarization, achieving high information density while preserving reasoning integrity.
  • Figure 4: 8×8 transfer matrix showing performance for all model scale combinations. The matrix demonstrates the diagonal dominance pattern and shows that larger models generally perform better both as thinking and answering models.
  • Figure 5: Comparison between adaptive summarization and direct truncation across token budgets. Bars represent average accuracy with error bars showing standard deviation. Summarization consistently outperforms truncation, with the advantage most pronounced at lower budgets.
  • ...and 22 more figures