Resprompt: Residual Connection Prompting Advances Multi-Step Reasoning in Large Language Models

Song Jiang, Zahra Shakeri, Aaron Chan, Maziar Sanjabi, Hamed Firooz, Yinglong Xia, Bugra Akyildiz, Yizhou Sun, Jinchao Li, Qifan Wang, Asli Celikyilmaz

TL;DR

ResPrompt reframes multi-step reasoning in large language models as graph-like dependencies and introduces residual connection prompting to explicitly reintroduce prerequisite information within prompts. By repeating exact tokens from earlier steps as residual links, ResPrompt aligns the in-prompt reasoning flow with the true underlying reasoning graph, yielding substantial gains over standard chain-of-thought prompting on six benchmarks across math, sequential, and commonsense tasks, particularly for questions with five or more steps. Extensive ablations show full-coverage residuals and exact-token reuse are critical, and gains scale with model size, suggesting an emergent ability to utilize residual structures in larger LLMs. The work demonstrates a simple, effective prompting strategy that enhances multi-step reasoning without model fine-tuning, with implications for more reliable problem solving in complex tasks.

Abstract

Chain-of-thought (CoT) prompting, which offers step-by-step problem-solving rationales, has impressively unlocked the reasoning potential of large language models (LLMs). Yet, the standard CoT is less effective in problems demanding multiple reasoning steps. This limitation arises from the complex reasoning process in multi-step problems: later stages often depend on the results of several steps earlier, not just the results of the immediately preceding step. Such complexities suggest the reasoning process is naturally represented as a graph. The almost linear and straightforward structure of CoT prompting, however, struggles to capture this complex reasoning graph. To address this challenge, we propose Residual Connection Prompting (RESPROMPT), a new prompting strategy that advances multi-step reasoning in LLMs. Our key idea is to reconstruct the reasoning graph within prompts. We achieve this by integrating necessary connections (links present in the reasoning graph but missing in the linear CoT flow) into the prompts. Termed "residual connections", these links are pivotal in morphing the linear CoT structure into a graph representation, effectively capturing the complex reasoning graphs inherent in multi-step problems. We evaluate RESPROMPT on six benchmarks across three diverse domains: math, sequential, and commonsense reasoning. For the open-sourced LLaMA family of models, RESPROMPT yields a significant average reasoning accuracy improvement of 12.5% on LLaMA-65B and 6.8% on LLaMA2-70B. Breakdown analysis further highlights that RESPROMPT particularly excels in complex multi-step reasoning: for questions demanding at least five reasoning steps, RESPROMPT outperforms the best CoT-based benchmarks by a remarkable average improvement of 21.1% on LLaMA-65B and 14.3% on LLaMA2-70B. Through extensive ablation studies and analyses, we pinpoint how to most effectively build residual connections.
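
To make the notion of "links present in the reasoning graph but missing in the linear CoT flow" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simplified model in which steps are numbered, step 0 stands for the conditions given in the question, and a linear chain only connects step i-1 to step i. The function name `residual_links` and the example dependency map are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple


def residual_links(deps: Dict[int, List[int]]) -> List[Tuple[int, int]]:
    """Return dependency edges (src, dst) that a linear chain would miss.

    Assumption: a linear CoT flow only connects step i-1 to step i, so any
    dependency whose source lies more than one step back is treated as a
    residual link that the prompt would need to restate explicitly.
    """
    links = []
    for dst in sorted(deps):
        for src in sorted(deps[dst]):
            if src < dst - 1:
                links.append((src, dst))
    return links


# Hypothetical 4-step problem; step 0 stands for the question's given conditions.
example_deps = {1: [0], 2: [1], 3: [1, 2], 4: [0, 3]}
print(residual_links(example_deps))  # [(1, 3), (0, 4)]
```

In this toy graph, step 3 reuses the result of step 1 and step 4 reuses a condition from the question, so those two edges are the ones a purely linear rationale leaves implicit. Under the approach described above, the prompt would restate the corresponding earlier phrases at steps 3 and 4.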

Paper Structure (31 sections, 12 figures, 25 tables)

This paper contains 31 sections, 12 figures, 25 tables.

Figures (12)

  • Figure 1: CoT reasoning accuracy based on the number of reasoning steps for LLaMA-65B and LLaMA2-70B across two math benchmarks. Horizontal dashed lines are the overall accuracy in each benchmark. Left: GSM8K, 8-shot; Right: AQUA-RAT, 4-shot. CoT prompts are sourced from wei2022chain.
  • Figure 2: (a) A multi-step math question from the training set of GSM8K cobbe2021training. (b) Standard CoT prompting for this question. The intermediate steps are highlighted in blue. (c) The reasoning flow within the CoT prompts in (b), which exhibits a linear structure. (d) The underlying complex reasoning graph of this math question. (e) Our approach, ResPrompt (residual connection prompting) for this question. The intermediate steps are highlighted in blue, while residual connections are indicated with colored backgrounds and linked by dashed arrows. Note that phrases with a blue background represent given conditions from the question, while phrases with backgrounds in other colors denote results derived from intermediate steps.
  • Figure 3: ResPrompt's performance according to number of reasoning steps on GSM8K, AQUA-RAT and MathQA on LLaMA2-70B. The curves show the comparison of ResPrompt's reasoning accuracy with CoT based baselines in each step, while the blue bars represent the distribution of data within each reasoning step.
  • Figure 4: Reasoning accuracy with different residual connections implementations.
  • Figure 5: Reasoning accuracy comparison between ResPrompt and CoT across all LLaMA model sizes. CoT is the model with better performance between Short CoT and Long CoT for each dataset.
  • ...and 7 more figures