Optimizing Chain-of-Thought Reasoning: Tackling Arranging Bottleneck via Plan Augmentation

Yuli Qiu, Jiashu Yao, Heyan Huang, Yuhang Guo

TL;DR

This study subdivides CoT reasoning into two parts, arranging and executing, identifies that the bottleneck of models lies mainly in arranging rather than executing, and proposes a plan-based training and reasoning method that guides models to generate arranging steps through abstract plans.

Abstract

The multi-step reasoning ability of large language models is crucial in tasks such as math and tool utilization. Current research predominantly focuses on enhancing model performance in these multi-step reasoning tasks through fine-tuning with Chain-of-Thought (CoT) steps, yet these methods tend to be heuristic, neither exploring nor resolving the underlying bottleneck. In this study, we subdivide CoT reasoning into two parts, arranging and executing, and identify that the bottleneck of models lies mainly in arranging rather than executing. Based on this finding, we propose a plan-based training and reasoning method that guides models to generate arranging steps through abstract plans. We experiment on both math (GSM8k) and tool utilization (ToolBench) benchmarks. Results show that, compared to fine-tuning directly with CoT data, our approach better alleviates the arranging bottleneck and particularly excels in long-distance reasoning generalization.
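The plan-then-reason idea described in the abstract can be sketched as two model calls. This is a minimal illustration, not the authors' implementation: the `generate` callable, the function name `plan_augmented_answer`, the prompt wording, and the canned `toy_model` are all hypothetical, and any text-generation backend could stand in for `generate`.

```python
from typing import Callable


def plan_augmented_answer(question: str, generate: Callable[[str], str]) -> dict:
    """Two-round reasoning: draft an abstract plan, then reason under it.

    `generate` is a hypothetical stand-in for any prompt -> text model call.
    The prompt wording below is illustrative, not taken from the paper.
    """
    # Round 1: ask only for the arrangement of steps (the plan), no arithmetic.
    plan_prompt = (
        "List the steps needed to solve the problem, without computing anything.\n"
        f"Problem: {question}\nPlan:"
    )
    plan = generate(plan_prompt).strip()

    # Round 2: carry out the steps, conditioning on the plan from Round 1.
    solve_prompt = (
        f"Problem: {question}\nPlan:\n{plan}\n"
        "Follow the plan step by step and state the final answer.\nSolution:"
    )
    solution = generate(solve_prompt).strip()
    return {"plan": plan, "solution": solution}


def toy_model(prompt: str) -> str:
    """Canned responses so the sketch runs without a real model."""
    if prompt.endswith("Plan:"):
        return "1. Find the total cost. 2. Subtract it from the budget."
    return "Total cost = 3 * 4 = 12. Remaining = 20 - 12 = 8. Answer: 8"


result = plan_augmented_answer(
    "Pens cost 4 dollars each. With 20 dollars, how much is left after buying 3?",
    toy_model,
)
print(result["plan"])
print(result["solution"])
```

Keeping the plan as a separate output makes it possible to inspect or score the arranging step on its own, independently of the arithmetic carried out in the second call.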

Paper Structure

This paper contains 22 sections, 8 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: An example of a multi-step math reasoning problem. The task can be split into two parts: Arranging (blue box) and Executing (yellow box). In current Chain-of-Thought reasoning, these two parts are interwoven, which may lead to errors in subsequent reasoning. We find that the bottleneck of multi-step reasoning tasks is arranging rather than executing, and propose a method in which the LLM first generates an abstract arrangement (a plan) and then uses Plan Augment Reasoning to guide the subsequent reasoning steps, ensuring the reliability of each step.
  • Figure 2: The distribution of arithmetic steps in the GSM8k test set. Most problems need 2$\sim$4 steps to solve.
  • Figure 3: A power-law fit of the experimental data ($ExeAcc$ and $Acc(+plan)$) matches the data more closely than a linear fit, which indicates that the final accuracy on math problems accumulates from single-step accuracy according to a power law.
  • Figure 4: The framework of our method. Above the dotted line is Plan Augment Reasoning: the LLM first generates a plan in Round 1, then performs multi-step reasoning based on the plan to get the final answer. Below the line is our Plan-Centric SFT, which augments plan generation.
  • Figure 5: Score distribution over reasoning steps. Results for math problems are in (a)$\sim$(c), while (d)$\sim$(f) show tool utilization. SFT(CoT) denotes the model trained on CoT steps, and Ours denotes our plan-based method. We group math problems with $\geq6$ reasoning steps together to avoid the impact of small sample sizes. The improvements on both tasks are significant ($p=0.005$ in math, $p=0.001$ in tool utilization).
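The contrast between power-law and linear fits in the Figure 3 caption can be illustrated with a small fitting routine. The sketch below fits $y = a \cdot x^{b}$ by least squares on log-transformed values. The data are synthetic and are not results from the paper: they assume, purely for illustration, a 3-step problem whose steps succeed independently with the same probability. The function name `fit_power_law` is likewise hypothetical.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b via linear regression on logs."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope and intercept of the line log(y) = log(a) + b * log(x).
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic data only (an assumption for illustration): if each of 3 steps
# succeeds independently with probability p, all 3 succeed with probability p**3.
step_acc = [0.5, 0.6, 0.7, 0.8, 0.9]
final_acc = [p ** 3 for p in step_acc]

a, b = fit_power_law(step_acc, final_acc)
print(f"a = {a:.3f}, b = {b:.3f}")
```

On this synthetic input the fitted exponent `b` comes out close to the number of steps, which is the sense in which per-step errors compound over a longer chain. Real accuracy data would be noisier and need not satisfy the independence assumption.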