LLMs are Single-threaded Reasoners: Demystifying the Working Mechanism of Soft Thinking

Junhong Wu; Jinliang Lu; Zixuan Ren; Gangqiang Hu; Zhi Wu; Dai Dai; Hua Wu

TL;DR

This paper investigates Soft Thinking in large language models, revealing that, contrary to claims of parallel reasoning, LLMs tend to follow a greedy, single-threaded path driven by the top soft token. It introduces Stochastic Soft Thinking, leveraging Dirichlet sampling and the Gumbel-Softmax trick to inject controllable randomness, which substantially improves reasoning performance across eight benchmarks and enhances exploration potential for RL. The work provides theoretical justification via Luce’s Choice Axiom and demonstrates stronger exploration than conventional CoT, while outlining practical limitations and directions for future research in RL-enabled, continuous-space reasoning. Overall, it deepens understanding of latent reasoning dynamics and offers a concrete decoding approach to unlock the benefits of Soft Thinking.

Abstract

Human cognition naturally engages with abstract and fluid concepts, whereas existing reasoning models often rely on generating discrete tokens, potentially constraining their expressive capabilities. Recent advancements aim to address this limitation by enabling large language models (LLMs) to generate soft, abstract tokens, thus facilitating reasoning within a continuous concept space. In this paper, we investigate the Soft Thinking capabilities of various LLMs through a systematic analysis of their internal behavior using a suite of probing techniques. Contrary to the prevailing belief that Soft Thinking supports parallel exploration of diverse reasoning paths, our findings reveal that LLMs behave as single-threaded reasoners: they predominantly rely on the token with the highest probability in the soft input to predict the next step. This behavior induces a greedy feedback loop that suppresses alternative reasoning paths and undermines the benefits of transmitting richer information via Soft Tokens. To address this Greedy Pitfall, we propose Stochastic Soft Thinking, which injects randomness into the construction of soft tokens. Our experiments demonstrate that incorporating randomness, particularly with the Gumbel-Softmax trick, can alleviate the limitations of vanilla approaches and unleash the potential of Soft Thinking, resulting in superior performance across eight reasoning benchmarks. We further demonstrate that Stochastic Soft Thinking exhibits stronger exploration potential than conventional Chain-of-Thought (CoT) reasoning. Our findings deepen the understanding of continuous reasoning and establish the foundation for future work on improving Soft Thinking with Reinforcement Learning.
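To make the contrast between vanilla and stochastic soft tokens concrete, the following is a minimal sketch in plain Python, not the paper's implementation. It assumes that a soft token is the probability-weighted mixture of token embeddings and that the Gumbel-Softmax trick is applied by adding Gumbel(0, 1) noise to the log-probabilities before a temperature-scaled softmax. The toy embeddings, the logits, the temperature `tau`, and the helper names `softmax`, `gumbel_softmax`, and `soft_token` are all illustrative assumptions.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(probs, tau=1.0, rng=random):
    """Add Gumbel(0, 1) noise to log-probabilities, then renormalise.

    Assumption: this is the standard Gumbel-Softmax relaxation; the
    paper's exact parameterisation may differ.
    """
    noisy = []
    for p in probs:
        # Clamp the uniform sample so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append(math.log(max(p, 1e-12)) + gumbel)
    return softmax(noisy, temperature=tau)


def soft_token(weights, embeddings):
    """Weighted mixture of token embeddings (the assumed soft input)."""
    dim = len(embeddings[0])
    return [sum(w * e[d] for w, e in zip(weights, embeddings))
            for d in range(dim)]


# Hypothetical 3-token vocabulary with 2-dimensional embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
probs = softmax([2.0, 1.0, 0.5])
rng = random.Random(0)

# Vanilla soft token: the same top token dominates on every call.
vanilla = soft_token(probs, embeddings)

# Stochastic soft token: the noise can promote a different top token.
stochastic = soft_token(gumbel_softmax(probs, tau=0.5, rng=rng), embeddings)

# Count how often each token ends up as the top soft token under noise.
counts = [0, 0, 0]
for _ in range(2000):
    weights = gumbel_softmax(probs, tau=0.5, rng=rng)
    counts[weights.index(max(weights))] += 1
```

In this sketch the vanilla soft token always has the same dominant component, which is the setting in which the abstract's greedy feedback loop can arise. Under Gumbel noise the identity of the top token varies from draw to draw, roughly in proportion to the original probabilities, so lower-ranked tokens occasionally lead the mixture.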
