
First Heuristic Then Rational: Dynamic Use of Heuristics in Language Model Reasoning

Yoichi Aoki, Keito Kudo, Tatsuki Kuribayashi, Shusaku Sone, Masaya Taniguchi, Keisuke Sakaguchi, Kentaro Inui

TL;DR

This work examines how language models perform multi-step reasoning by probing their dynamic use of heuristics. Through arithmetic reasoning tasks and carefully engineered distractor variants, it shows that models rely more on heuristics early in the reasoning process and progressively adopt goal-directed, rational strategies as they approach the answer, implying a limited capacity to look ahead across many future steps. The analysis introduces the distance-to-go measure $d$ and the minimal solution $h^*$ within a state-transition framework to quantify this shift, and evaluates multiple models (e.g., PaLM2, Llama2-13B, GPT-3.5, GPT-4) on GSM8K and artificial datasets. The findings offer both cognitive implications (human-like problem solving with dynamic strategy switching) and engineering guidance for prompting and evaluating LMs on complex, multi-step tasks. Limitations include the restriction to four models and two task types, suggesting directions for broader validation and mechanistic exploration.

Abstract

Multi-step reasoning instruction, such as chain-of-thought prompting, is widely adopted to improve language model (LM) performance. We report on the systematic strategy that LMs employ in such a multi-step reasoning process. Our controlled experiments reveal that LMs rely more heavily on heuristics, such as lexical overlap, in the earlier stages of reasoning, where more reasoning steps remain to reach a goal. Conversely, their reliance on heuristics decreases as LMs progress closer to the final answer through multiple reasoning steps. This suggests that LMs can backtrack only a limited number of future steps and dynamically combine heuristic strategies with rational ones in tasks involving multi-step reasoning.
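The lexical-overlap heuristic mentioned above can be made concrete with a small sketch. The scoring function below is illustrative, not the paper's exact metric: it ranks candidate premises by the fraction of their tokens that also appear in the question, which is the kind of shallow signal a distractor premise can exploit.

```python
# Illustrative sketch of a lexical-overlap heuristic (not the paper's exact
# metric): score each premise by how many of its tokens appear in the question.
def overlap_score(question: str, premise: str) -> float:
    """Fraction of premise tokens that also occur in the question."""
    q_tokens = {tok.strip(".,?!").lower() for tok in question.split()}
    p_tokens = [tok.strip(".,?!").lower() for tok in premise.split()]
    if not p_tokens:
        return 0.0
    return sum(tok in q_tokens for tok in p_tokens) / len(p_tokens)

# Toy example: a purely heuristic reasoner would pick whichever premise
# overlaps most with the question, regardless of logical relevance.
question = "How many apples does Tom have after buying more apples?"
premises = [
    "Tom starts with 3 apples.",
    "Tom buys 2 more apples.",
    "Mary owns a red bicycle.",  # irrelevant; low overlap
]
scores = [overlap_score(question, p) for p in premises]
best = premises[scores.index(max(scores))]
```

A distractor engineered to share many tokens with the question would outscore the logically correct premise under this heuristic, which is exactly the failure mode the controlled experiments probe.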


Paper Structure

This paper contains 48 sections, 1 equation, 5 figures, and 13 tables.

Figures (5)

  • Figure 1: Illustration of the systematic reasoning strategy we discovered in language models. When the goal is distant from the current reasoning step, they tend to rely on heuristics, such as lexical overlap with the question, to take the next reasoning step, leading them in the wrong direction (red path). In contrast, when the goal is within a limited distance, they are more likely to take rational actions (green path) to reach the goal.
  • Figure 2: Overview of the task setting. Given premises and a question, a model answers the question step-by-step (left part). Through each reasoning step $t$ of selecting/paraphrasing relevant premise $p_k \in P$, the available facts $\bm z$ are enriched (reasoning state progresses in the right part). If a reasoning step follows the minimal solution (green path in the right part), the distance to the answer $d$ decreases.
  • Figure 3: The ratio at which a particular distractor is selected (y-axis: $r$) in each reasoning step (x-axis: $d$).
  • Figure 4: Number of cases where a particular distractor is selected (y-axis: $r$) in each reasoning step (x-axis: $d$). The heuristic embedded in the distractor is overlap.
  • Figure 5: Number of cases where a particular distractor is selected (y-axis: $r$) in each reasoning step (x-axis: $d$). The heuristic embedded in the distractor is position.
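The quantity plotted in Figures 3-5, the distractor-selection ratio $r$ at each distance-to-go $d$, can be aggregated from per-step logs with a short sketch. The log format here is invented for illustration: each record pairs the remaining distance $d$ with whether the model selected the distractor at that step.

```python
# Hypothetical aggregation matching the r-vs-d plots: from per-step records
# of (distance-to-go d, distractor selected?), compute the selection ratio r
# at each d. The log below is toy data, not results from the paper.
from collections import defaultdict

def distractor_ratio_by_distance(steps):
    """steps: iterable of (d, selected: bool) pairs -> {d: ratio r}."""
    counts = defaultdict(lambda: [0, 0])  # d -> [num_selected, num_total]
    for d, selected in steps:
        counts[d][0] += int(selected)
        counts[d][1] += 1
    return {d: sel / tot for d, (sel, tot) in counts.items()}

# Toy log showing the paper's qualitative trend: the distractor is chosen
# more often when the goal is far away (large d) than when it is near.
log = [(3, True), (3, True), (3, False),
       (2, True), (2, False),
       (1, False), (1, False)]
ratios = distractor_ratio_by_distance(log)
```

On this toy log, `ratios` is higher at `d = 3` than at `d = 1`, mirroring the "first heuristic, then rational" pattern the figures report.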