Break the Chain: Large Language Models Can be Shortcut Reasoners

Mengru Ding, Hanmeng Liu, Zhizhang Fu, Jian Song, Wenbo Xie, Yue Zhang

TL;DR

Chain-of-Thought (CoT) prompting often improves reasoning, but at a high token cost and with limited applicability. This work introduces Break the Chain, which combines heuristic shortcuts with zero-shot prompts to achieve fast, token-efficient reasoning, and pairs it with the ShortcutQA dataset for benchmarking heuristic reasoning. Across OpenAI and open-source LLMs, Break the Chain matches or surpasses few-shot CoT while substantially reducing token usage, and larger models benefit more from the approach. ShortcutQA provides a robust, task-diverse benchmark for evaluating and advancing efficient reasoning in AI systems.

Abstract

Recent advancements in Chain-of-Thought (CoT) reasoning utilize complex modules but are hampered by high token consumption, limited applicability, and challenges in reproducibility. This paper conducts a critical evaluation of CoT prompting, extending beyond arithmetic to include complex logical and commonsense reasoning tasks, areas where standard CoT methods fall short. We propose the integration of human-like heuristics and shortcuts into language models (LMs) through "break the chain" strategies. These strategies disrupt traditional CoT processes using controlled variables to assess their efficacy. Additionally, we develop innovative zero-shot prompting strategies that encourage the use of shortcuts, enabling LMs to quickly exploit reasoning clues and bypass detailed procedural steps. Our comprehensive experiments across various LMs, both commercial and open-source, reveal that LMs maintain effective performance with "break the chain" strategies. We also introduce ShortcutQA, a dataset specifically designed to evaluate reasoning through shortcuts, compiled from competitive tests optimized for heuristic reasoning tasks such as forward/backward reasoning and simplification. Our analysis confirms that ShortcutQA not only poses a robust challenge to LMs but also serves as an essential benchmark for enhancing reasoning efficiency in AI.

Paper Structure

This paper contains 23 sections, 2 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: ChatGPT responses to Chain-of-Thought and "Break the Chain". Our "Break the Chain" method significantly simplifies the reasoning process.
  • Figure 2: Performance comparison of different token limits on the mathematical reasoning task from ShortcutQA.
  • Figure 3: Relationship between CoT Chain Length and Accuracy.
  • Figure 4: Our evaluation pipeline.
  • Figure 5: The Impact of Model Size on CoT's Relative Outperformance over Other Prompts across Datasets.