Implicit Reasoning in Transformers is Reasoning through Shortcuts

Tianhe Lin; Jian Xie; Siyu Yuan; Deqing Yang

TL;DR

This paper asks why implicit reasoning in transformers rarely exhibits advanced, stepwise capabilities, and whether internal stepwise reasoning can emerge without explicit chain-of-thought (CoT). Using a synthetic multi-step arithmetic dataset, activation patching, and a GPT-2 model with RoPE, it shows that stepwise reasoning emerges when training data follows a fixed pattern, but that generalization collapses when premise order is unfixed, because the model falls back on shortcuts such as number-chaining. The authors extend the analysis to state-of-the-art LLMs and find that they rely on similar shortcuts under unfixed patterns, which points to a fundamental limitation of current implicit reasoning. The work argues for training regimes and architectures that promote genuine variable tracking and order-agnostic reasoning rather than shortcut-based strategies.

Abstract

Test-time compute is emerging as a new paradigm for enhancing language models' complex multi-step reasoning capabilities, as demonstrated by the success of OpenAI's o1 and o3, as well as DeepSeek's R1. Compared to explicit reasoning in test-time compute, implicit reasoning is more inference-efficient, requiring fewer generated tokens. However, why does the advanced reasoning capability fail to emerge in the implicit reasoning style? In this work, we train GPT-2 from scratch on a curated multi-step mathematical reasoning dataset and conduct analytical experiments to investigate how language models perform implicit reasoning in multi-step tasks. Our findings reveal: 1) Language models can perform step-by-step reasoning and achieve high accuracy in both in-domain and out-of-domain tests via implicit reasoning. However, this capability only emerges when trained on fixed-pattern data. 2) Conversely, implicit reasoning abilities emerging from training on unfixed-pattern data tend to overfit a specific pattern and fail to generalize further. Notably, this limitation is also observed in state-of-the-art large language models. These findings suggest that language models acquire implicit reasoning through shortcut learning, enabling strong performance on tasks with similar patterns while lacking generalization.
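
As a rough illustration of the fixed-pattern versus unfixed-pattern distinction, the sketch below generates a toy multi-step arithmetic problem with premises either in dependency order or shuffled. The problem format, the function names `make_problem` and `solve`, the digit range, and the restriction to `+` and `-` are all assumptions made for this example; they are not taken from the authors' dataset. The solver tracks variables by name, so it is insensitive to premise order, which is the behavior the paper reports that implicitly reasoning models fail to learn.

```python
import random


def make_problem(num_steps, shuffle_premises, rng):
    """Build one toy problem (hypothetical format, not the paper's)."""
    names = rng.sample("abcdefghijklmnopqrstuvwxyz", num_steps + 1)
    values = {names[0]: rng.randint(0, 9)}
    premises = [f"{names[0]} = {values[names[0]]}"]
    # Each new variable depends on the previous one: cur = prev (+|-) operand.
    for prev, cur in zip(names, names[1:]):
        operand = rng.randint(0, 9)
        op = rng.choice("+-")
        values[cur] = values[prev] + operand if op == "+" else values[prev] - operand
        premises.append(f"{cur} = {prev} {op} {operand}")
    if shuffle_premises:
        # "Unfixed pattern": premises no longer appear in dependency order.
        rng.shuffle(premises)
    question = f"{names[-1]} = ?"
    return ", ".join(premises) + ", " + question, values[names[-1]]


def solve(problem):
    """Order-agnostic reference solver: resolve variables by name, not position."""
    *premises, question = problem.split(", ")
    target = question.split(" = ")[0]
    pending = [p.split(" = ") for p in premises]
    values = {}
    # Sweep the premises until the queried variable is resolved.
    while target not in values:
        for lhs, rhs in pending:
            tokens = rhs.split()
            if len(tokens) == 1:
                values[lhs] = int(tokens[0])
            elif tokens[0] in values:
                delta = int(tokens[2])
                values[lhs] = values[tokens[0]] + (delta if tokens[1] == "+" else -delta)
    return values[target]


fixed_text, fixed_answer = make_problem(3, False, random.Random(0))
shuffled_text, shuffled_answer = make_problem(3, True, random.Random(0))
print(fixed_text)
print(shuffled_text)
```

With the same seed, both calls describe the same underlying problem; only the surface order of the premises differs, so a solver that tracks variables returns the same answer for both.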
