Implicit Reasoning in Transformers is Reasoning through Shortcuts
Tianhe Lin, Jian Xie, Siyu Yuan, Deqing Yang
TL;DR
This paper investigates why implicit reasoning in transformers fails to develop the advanced, stepwise capabilities seen with explicit chain-of-thought (CoT), and whether stepwise reasoning can emerge internally without it. Using a synthetic multi-step arithmetic dataset, activation patching, and a GPT-2 model with rotary position embeddings (RoPE), the authors show that stepwise reasoning emerges when training data follows a fixed pattern, but generalization collapses when premise order is not fixed, because the model learns shortcuts such as number-chaining. They extend the investigation to state-of-the-art LLMs and find that these models also rely on shortcuts under unfixed patterns, which points to a fundamental limitation of current implicit reasoning. The work highlights the need for training regimes and architectures that promote genuine variable tracking and robust, order-agnostic reasoning rather than shortcut-based strategies.
Abstract
Test-time compute is emerging as a new paradigm for enhancing language models' complex multi-step reasoning capabilities, as demonstrated by the success of OpenAI's o1 and o3, as well as DeepSeek's R1. Compared to explicit reasoning in test-time compute, implicit reasoning is more inference-efficient, requiring fewer generated tokens. However, why does the advanced reasoning capability fail to emerge in the implicit reasoning style? In this work, we train GPT-2 from scratch on a curated multi-step mathematical reasoning dataset and conduct analytical experiments to investigate how language models perform implicit reasoning in multi-step tasks. Our findings reveal: 1) Language models can perform step-by-step reasoning and achieve high accuracy in both in-domain and out-of-domain tests via implicit reasoning. However, this capability only emerges when trained on fixed-pattern data. 2) Conversely, implicit reasoning abilities emerging from training on unfixed-pattern data tend to overfit a specific pattern and fail to generalize further. Notably, this limitation is also observed in state-of-the-art large language models. These findings suggest that language models acquire implicit reasoning through shortcut learning, enabling strong performance on tasks with similar patterns while lacking generalization.
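To make the fixed-pattern versus unfixed-pattern distinction concrete, the following is a minimal sketch of how such multi-step arithmetic problems might be generated. It follows the TL;DR's reading that "pattern" refers to premise order. It is not the authors' dataset code: the modulus `MOD`, the textual format of the premises, and the helper names `make_problem` and `solve` are all assumptions made for illustration.

```python
import random

MOD = 23  # assumed modulus, chosen only for illustration


def make_problem(num_steps, rng, shuffle_premises=False):
    """Build a chain of variable assignments plus a query on the last variable.

    With shuffle_premises=False the premises appear in dependency order
    (a "fixed pattern"); with True their order is randomized ("unfixed").
    """
    names = rng.sample("abcdefghijklmnopqrstuvwxyz", num_steps)
    premises, prev, value = [], None, None
    for name in names:
        operand = rng.randrange(MOD)
        op = rng.choice("+-")
        if prev is None:
            # First premise combines two literal numbers.
            start = rng.randrange(MOD)
            left, base = str(start), start
        else:
            # Later premises depend on the previously defined variable.
            left, base = prev, value
        value = (base + operand) % MOD if op == "+" else (base - operand) % MOD
        premises.append(f"{name} = {left} {op} {operand}")
        prev = name
    if shuffle_premises:
        rng.shuffle(premises)
    return premises, f"{prev} = ?", value


def solve(premises, question):
    """Order-agnostic reference solver that tracks variable bindings."""
    defs = {}
    for premise in premises:
        name, expr = premise.split(" = ")
        defs[name] = expr.split()  # [left, op, right]

    def value_of(token):
        if token.isdigit():
            return int(token) % MOD
        left, op, right = defs[token]
        a, b = value_of(left), value_of(right)
        return (a + b) % MOD if op == "+" else (a - b) % MOD

    return value_of(question.split(" = ")[0])


if __name__ == "__main__":
    rng = random.Random(0)
    for shuffled in (False, True):
        premises, question, answer = make_problem(3, rng, shuffle_premises=shuffled)
        print("; ".join(premises), ";", question, "->", answer)
```

The reference solver returns the same answer whatever the premise order, because it resolves each variable through its definition. A strategy that instead combines numbers in the order they appear has no such guarantee once premises are shuffled, which is the kind of gap the abstract attributes to shortcut learning.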
