LLMs are Single-threaded Reasoners: Demystifying the Working Mechanism of Soft Thinking

Junhong Wu, Jinliang Lu, Zixuan Ren, Gangqiang Hu, Zhi Wu, Dai Dai, Hua Wu

TL;DR

This paper investigates Soft Thinking in large language models, revealing that, contrary to claims of parallel reasoning, LLMs tend to follow a greedy, single-threaded path driven by the top soft token. It introduces Stochastic Soft Thinking, leveraging Dirichlet sampling and the Gumbel-Softmax trick to inject controllable randomness, which substantially improves reasoning performance across eight benchmarks and enhances exploration potential for RL. The work provides theoretical justification via Luce’s Choice Axiom and demonstrates stronger exploration than conventional CoT, while outlining practical limitations and directions for future research in RL-enabled, continuous-space reasoning. Overall, it deepens understanding of latent reasoning dynamics and offers a concrete decoding approach to unlock the benefits of Soft Thinking.

Abstract

Human cognition naturally engages with abstract and fluid concepts, whereas existing reasoning models often rely on generating discrete tokens, potentially constraining their expressive capabilities. Recent advancements aim to address this limitation by enabling large language models (LLMs) to generate soft, abstract tokens, thus facilitating reasoning within a continuous concept space. In this paper, we investigate the Soft Thinking capabilities of various LLMs through a systematic analysis of their internal behavior using a suite of probing techniques. Contrary to the prevailing belief that Soft Thinking supports parallel exploration of diverse reasoning paths, our findings reveal that LLMs behave as single-threaded reasoners: they predominantly rely on the token with the highest probability in the soft input to predict the next step. This behavior induces a greedy feedback loop that suppresses alternative reasoning paths and undermines the benefits of transmitting richer information via Soft Tokens. To address this Greedy Pitfall, we propose Stochastic Soft Thinking, which introduces stochasticity to break the feedback loop. Our experiments demonstrate that incorporating randomness, particularly with the Gumbel-Softmax trick, can alleviate the limitations of vanilla approaches and unleash the potential of Soft Thinking, resulting in superior performance across eight reasoning benchmarks. We further demonstrate that Stochastic Soft Thinking exhibits stronger exploration potential than conventional chain-of-thought (CoT) reasoning. Our findings deepen the understanding of continuous reasoning and establish the foundation for future work on improving Soft Thinking with Reinforcement Learning.
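To make the contrast between a vanilla Soft Token and a stochastic one concrete, here is a minimal sketch in plain Python. It uses the standard Gumbel-Softmax formulation, not necessarily the authors' exact procedure. The temperature `tau`, the helper names, the toy vocabulary, and the choice to feed the model a probability-weighted average of token embeddings are all illustrative assumptions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(probs, tau=1.0, rng=random):
    """Add Gumbel(0, 1) noise to log-probabilities, then renormalise.

    tau is a hypothetical temperature knob: small values push the result
    towards one-hot, large values flatten it.
    """
    noisy = []
    for p in probs:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((math.log(max(p, 1e-12)) + gumbel) / tau)
    return softmax(noisy)


def mix_embeddings(weights, embedding_table):
    """Weighted average of embedding rows: the assumed soft-token input."""
    dim = len(embedding_table[0])
    return [
        sum(w * row[d] for w, row in zip(weights, embedding_table))
        for d in range(dim)
    ]


# Toy 3-token vocabulary with 2-dimensional embeddings (made-up numbers).
probs = [0.6, 0.3, 0.1]
embedding_table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

rng = random.Random(0)
# Vanilla Soft Token: the next-token distribution itself is the mixing weight.
vanilla_input = mix_embeddings(probs, embedding_table)
# Stochastic Soft Token: the distribution is perturbed before mixing.
stochastic_token = gumbel_softmax(probs, tau=0.5, rng=rng)
stochastic_input = mix_embeddings(stochastic_token, embedding_table)
```

In the vanilla case the top-1 token is always the same for a given distribution. With Gumbel noise, token i ends up on top with probability equal to its original probability (a property of the Gumbel-max trick), so repeated samples can put different tokens in the leading position while the output remains a full distribution over the vocabulary.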

Paper Structure

This paper contains 39 sections, 10 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Left: Soft Thinking replaces the discrete token $t$ with the Soft Token $st$ (defined as the probability distribution over vocabulary). Right: Soft Thinking predominantly explores branches associated with the top-1 token. In contrast, paths stemming from non-top-1 tokens are typically terminated in the next step.
  • Figure 2: An example illustrating the probability distribution of the vanilla Soft Thinking method.
  • Figure 3: Output entropy/token probability vs. JS-divergence between the next-token prediction probabilities yielded by different inputs. The prediction from the soft input is nearly identical to the prediction from the 1st token input, but completely different from the prediction from the 2nd token input.
  • Figure 4: An illustration for Stochastic Soft Thinking, which incorporates random sampling techniques to construct a Stochastic Soft Token.
  • Figure 5: Softness vs. randomness for Stochastic Soft Tokens.
  • ...and 1 more figures
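
Figure 3 compares next-token distributions using the Jensen-Shannon (JS) divergence. As a rough illustration of that metric, the sketch below computes it in nats for three hypothetical distributions. The numbers are invented for the example and are not results from the paper, and the variable names are assumptions.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Symmetric JS divergence, bounded above by ln(2) in nats."""
    mid = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mid) + 0.5 * kl_divergence(q, mid)


# Hypothetical next-token predictions over a 3-token vocabulary.
pred_from_soft = [0.70, 0.20, 0.10]  # model fed the full soft input
pred_from_top1 = [0.72, 0.18, 0.10]  # model fed only the 1st token
pred_from_top2 = [0.05, 0.15, 0.80]  # model fed only the 2nd token

js_soft_vs_top1 = js_divergence(pred_from_soft, pred_from_top1)
js_soft_vs_top2 = js_divergence(pred_from_soft, pred_from_top2)
```

A small `js_soft_vs_top1` together with a large `js_soft_vs_top2` is the pattern the caption describes: the soft input behaves almost like the top-1 token alone.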