Markov Chain of Thought for Efficient Mathematical Reasoning

Wen Yang, Minpeng Liao, Kai Fan

TL;DR

This work introduces Markov Chain of Thought (MCoT), a memory-efficient framework for mathematical reasoning that treats reasoning as a sequence of state transitions between questions and derivation steps. By enforcing a Markov property and decomposing the training objective, MCoT supports longer reasoning chains without relying on a lengthy KV cache, substantially improving efficiency over traditional multi-step reasoning (MSR) while maintaining or enhancing accuracy. The authors construct the MCoTInstruct dataset from GSM8K and MATH seed data plus self-distillation. It comprises about 82k Markov chains (~160k step-level entries). They report strong performance gains across multiple base models (including 7B and 70B scales) on in-domain and out-of-domain math datasets. They also show that MCoT reduces prompt length and memory usage during inference, achieves up to $1.90\times$ faster reasoning than MSR, and can self-correct without excessive propagation of errors. Future work will explore MCTS to address remaining limitations. The work contributes a novel framework, a comprehensive dataset, and empirical evidence of improved efficiency and robust problem-solving in mathematical reasoning for LLMs.

Abstract

Multi-step Chain of Thought (CoT) benefits from the logical structure of the reasoning steps and from task-specific actions, significantly enhancing the mathematical reasoning capabilities of large language models. As long CoT becomes more prevalent, the number of reasoning steps can exceed manageable token limits and raise computational demands. Inspired by a fundamental logic of human cognition, "derive, then reduce", we reconceptualize the standard multi-step CoT as a novel Markov Chain of Thought (MCoT). In this study, we consider the mathematical reasoning task, defining each reasoning step as text accompanied by a Python code snippet. To facilitate a longer reasoning path, self-correction is enabled through interactions with the code interpreter. MCoT aims to compress previous reasoning steps into a simplified question, enabling efficient next-step inference without relying on a lengthy KV cache. In our experiments, we curate the $\texttt{MCoTInstruct}$ dataset, and the empirical results indicate that MCoT significantly enhances efficiency while maintaining comparable accuracy. While much remains to be explored, this work paves the way for exploring the long CoT reasoning abilities of LLMs. The code is available at https://github.com/james-yw/Markov-Chain-of-Thought
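
The "derive, then reduce" loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `StepResult`, `mcot_solve`, and `toy_step` are hypothetical names, and the toy step function stands in for an LLM call plus a code interpreter. The point it shows is only that each step receives the current reduced question and nothing from earlier steps, so the prompt does not grow with the number of steps.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class StepResult:
    """Output of one reasoning step (hypothetical structure, for illustration)."""
    derivation: str                 # what was derived at this step
    next_question: Optional[str]    # reduced question, or None when finished
    answer: Optional[str] = None    # final answer, set only on the last step


def mcot_solve(
    question: str,
    step_fn: Callable[[str], StepResult],
    max_steps: int = 8,
) -> Tuple[Optional[str], List[int]]:
    """Markov-style loop: every call to step_fn sees only the current question."""
    prompt_lengths: List[int] = []
    for _ in range(max_steps):
        prompt_lengths.append(len(question))  # prompt is just the current state
        result = step_fn(question)
        if result.answer is not None:
            return result.answer, prompt_lengths
        # Discard the history; keep only the reduced question as the new state.
        question = result.next_question
    return None, prompt_lengths


def toy_step(question: str) -> StepResult:
    """Stand-in for the model (assumption): add the first two terms of a sum."""
    terms = [int(t) for t in question.split("+")]
    if len(terms) == 1:
        return StepResult("done", next_question=None, answer=str(terms[0]))
    head = terms[0] + terms[1]
    reduced = " + ".join(str(t) for t in [head] + terms[2:])
    return StepResult(f"{terms[0]} + {terms[1]} = {head}", next_question=reduced)


answer, lengths = mcot_solve("2 + 3 + 4", toy_step)
print(answer, lengths)
```

In a conventional multi-step setup, the prompt at each step would also carry all earlier derivations. Here the state passed forward is only the reduced question, which is what lets the per-step cost stay bounded in this sketch.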

Paper Structure

This paper contains 44 sections, 6 equations, 10 figures, 6 tables, 1 algorithm.

Figures (10)

  • Figure 1: Comparison of reasoning efficiency between MCoT and Multi-Step Reasoning (MSR), showing the variation in reasoning time costs for both methods relative to step 1 as the number of reasoning steps increases.
  • Figure 2: Schematic illustrating various approaches to mathematical reasoning with LLMs and their reasoning efficiency. The masked demonstrations across different approaches show that the efficiency of MCoT is similar to that of the blockwise masking approach, while the efficiency of MSR and question decomposition reasoning is closer to that of vanilla masking.
  • Figure 3: Comparison of token length in MCoT and MSR on the training and test sets.
  • Figure 4: Comparison of problem solving between MCoT and MSR on the MATH test set, with Llemma7B as the base model.
  • Figure 5: The reasoning process of two approaches to mathematical reasoning.
  • ...and 5 more figures