Markov Chain of Thought for Efficient Mathematical Reasoning
Wen Yang, Minpeng Liao, Kai Fan
TL;DR
This work introduces Markov Chain of Thought (MCoT), a memory-efficient framework for mathematical reasoning that treats reasoning as a sequence of state transitions between questions and derivation steps. By enforcing a Markov property and decomposing the training objective, MCoT supports longer reasoning chains with minimal reliance on the KV cache. It substantially improves efficiency over traditional multi-step reasoning (MSR) while maintaining or improving accuracy. The authors construct the MCoTInstruct dataset from GSM8K and MATH seed data through self-distillation. The dataset comprises about 82k Markov chains (~160k step-level entries). Experiments across multiple base models (including 7B and 70B scales) show strong gains on both in-domain and out-of-domain math datasets. MCoT also shortens prompts and reduces memory usage during inference, reasons up to $1.90\times$ faster than MSR, and can self-correct without excessive error propagation. Future work will explore Monte Carlo Tree Search (MCTS) to address remaining limitations. The work contributes a novel framework, a comprehensive dataset, and empirical evidence of improved efficiency and robust problem-solving in mathematical reasoning for LLMs.
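The Markov property and the decomposed objective can be written out as follows. This is a hedged formalization with notation introduced here, not copied from the paper, whose exact objective may differ. Here $q_0$ is the original question and $s_t$ is the $t$-th step (text plus code). $q_t$ is the simplified question obtained by reducing $q_{t-1}$ with $s_t$. The chain has $T$ steps, and the last one answers $q_{T-1}$. Standard multi-step reasoning conditions every step on the full history:

$$p_\theta(s_1,\dots,s_T \mid q_0) = \prod_{t=1}^{T} p_\theta(s_t \mid q_0, s_1, \dots, s_{t-1})$$

Under the Markov assumption, each step depends only on the current simplified question, and each reduction depends only on that question and the new step:

$$p_\theta(s_{1:T}, q_{1:T-1} \mid q_0) = \prod_{t=1}^{T} p_\theta(s_t \mid q_{t-1}) \prod_{t=1}^{T-1} p_\theta(q_t \mid q_{t-1}, s_t)$$

$$\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta(s_t \mid q_{t-1}) - \sum_{t=1}^{T-1} \log p_\theta(q_t \mid q_{t-1}, s_t)$$

No term in $\mathcal{L}$ looks back past $q_{t-1}$, so the loss splits into step-level examples that need no earlier history.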
Abstract
Multi-step Chain of Thought (CoT) benefits from the logical structure of its reasoning steps and task-specific actions, significantly enhancing the mathematical reasoning capabilities of large language models. As long CoT becomes prevalent, the number of reasoning steps can exceed manageable token limits and drive up computational demands. We draw on the fundamental logic of human cognition, "derive, then reduce". On that basis, we conceptualize standard multi-step CoT as a novel Markov Chain of Thought (MCoT). In this study, we focus on mathematical reasoning and define each reasoning step as text accompanied by a Python code snippet. To support longer reasoning paths, self-correction is enabled through interaction with a code interpreter. MCoT aims to compress the previous reasoning steps into a simplified question, enabling efficient next-step inference without relying on a lengthy KV cache. In our experiments, we curate the $\texttt{MCoTInstruct}$ dataset. The empirical results indicate that MCoT significantly improves efficiency while maintaining comparable accuracy. While much remains to be explored, this work lays the groundwork for studying the long-CoT reasoning abilities of LLMs. The code is available at https://github.com/james-yw/Markov-Chain-of-Thought
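The "derive, then reduce" loop described above can be sketched as a minimal Python program. It rests on some assumptions. A hypothetical `derive` function plays the model and proposes one step (text plus a Python snippet) from the current question and any interpreter error. A hypothetical `reduce` function folds the step back into a simpler question, or returns `None` once the question is answered. All names here are illustrative and are not taken from the paper's repository: `Step`, `run_snippet`, `markov_chain_of_thought`, `toy_derive`, `toy_reduce`, and the pen-counting question. The toy interpreter uses plain `exec`, which is not a sandbox.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    text: str         # natural-language derivation for this step
    code: str         # Python snippet handed to the interpreter
    result: str = ""  # interpreter output, filled in after execution


def run_snippet(code: str) -> Tuple[bool, str]:
    """Toy interpreter: run `code`, return (ok, value of `answer` or error text).

    Illustration only: plain exec() is NOT a sandbox.
    """
    scope: dict = {}
    try:
        exec(code, scope)
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, str(scope.get("answer"))


def markov_chain_of_thought(
    question: str,
    derive: Callable[[str, str], Step],            # (state, error feedback) -> step
    reduce: Callable[[str, Step], Optional[str]],  # (state, step) -> next state or None
    max_steps: int = 8,
    max_retries: int = 2,
) -> Tuple[Optional[str], List[str]]:
    """Derive-then-reduce loop: each step sees only the current simplified question."""
    states: List[str] = []  # the only context each step is conditioned on
    state = question
    for _ in range(max_steps):
        states.append(state)
        feedback = ""
        for _ in range(max_retries + 1):
            step = derive(state, feedback)
            ok, output = run_snippet(step.code)
            if ok:
                break
            feedback = output  # self-correction: the error stays local to this step
        else:
            return None, states  # the step could not be repaired
        step.result = output
        next_state = reduce(state, step)
        if next_state is None:  # None means the current question is answered
            return step.result, states
        state = next_state  # previous steps are discarded, not appended
    return None, states


# --- Toy, hand-scripted "model" (hypothetical) ---
Q0 = "A shop has 3 boxes of 12 pens and sells 5 pens. How many pens remain?"
Q1 = "A shop has 36 pens and sells 5 pens. How many pens remain?"


def toy_derive(question: str, feedback: str) -> Step:
    if question == Q0:
        return Step("Total pens = 3 * 12.", "answer = 3 * 12")
    if question == Q1:
        if not feedback:
            # First attempt uses an undefined name and raises NameError.
            return Step("Remaining = pens - 5.", "answer = pens - 5")
        return Step("Remaining = 36 - 5.", "answer = 36 - 5")
    raise KeyError(question)


def toy_reduce(question: str, step: Step) -> Optional[str]:
    if question == Q0:
        # Fold the derived total back into a simpler question.
        return f"A shop has {step.result} pens and sells 5 pens. How many pens remain?"
    return None  # Q1 is answered by its own step


answer, states = markov_chain_of_thought(Q0, toy_derive, toy_reduce)
print(answer)              # 31
print(states == [Q0, Q1])  # True
```

The loop replaces `state` rather than extending it, so the context passed to `derive` stays about one question long regardless of chain length. An MSR-style loop would instead concatenate every earlier step and its interpreter output into the prompt, and that growing history is what inflates the KV cache.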
