Diffuse Thinking: Exploring Diffusion Language Models as Efficient Thought Proposers for Reasoning

Chenyang Shao, Sijian Ren, Fengli Xu, Yong Li

TL;DR

This work addresses the computational bottleneck of autoregressive reasoning in large language models by introducing Diffuse Thinking (DT), a collaborative framework in which a diffusion language model (DLM) generates diverse reasoning proposals in parallel and a large language model (LLM) evaluates them and selects the best one. The authors give a formal treatment of DLMs and LLMs, analyze their computational and parallel-time complexities, and present an interaction design that decouples proposal generation from evaluation. Empirical results on four logical and mathematical benchmarks show that DT improves both accuracy and throughput, with additional gains from targeted fine-tuning for thought proposal. The results suggest that parallel diffusion-based proposal generation, coupled with semantic evaluation by an LLM, can yield efficient and scalable reasoning. The code is released as open source to support reproducibility and broader adoption of the approach.

Abstract

In recent years, large language models (LLMs) have advanced remarkably, with the test-time scaling law consistently enhancing their reasoning capabilities. Through systematic evaluation and exploration of a diverse spectrum of intermediate thoughts, LLMs demonstrate the potential to generate deliberate reasoning steps, thereby substantially enhancing reasoning accuracy. However, the autoregressive generation paradigm of LLMs causes reasoning performance to scale sub-optimally with test-time computation, often requiring excessive computational overhead to propose thoughts while yielding only marginal performance gains. In contrast, diffusion language models (DLMs) can efficiently produce diverse samples through parallel denoising in a single forward pass, inspiring us to leverage them for proposing intermediate thoughts, thereby alleviating the computational burden associated with autoregressive generation while maintaining quality. In this work, we propose an efficient collaborative reasoning framework, leveraging DLMs to generate candidate thoughts and LLMs to evaluate their quality. Experiments across diverse benchmarks demonstrate that our framework achieves strong performance in complex reasoning tasks, offering a promising direction for future research. Our code is open source at https://anonymous.4open.science/r/Diffuse-Thinking-EC60.
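
As a rough illustration of the propose-then-evaluate split described in the abstract, the following Python sketch shows one reasoning step in which a proposer returns several candidate thoughts and an evaluator scores them. It is not the authors' implementation: the function `diffuse_thinking_step`, the callables `propose` and `evaluate`, and the toy Game of 24 stand-ins are all hypothetical names introduced here, and a real system would call a DLM and an LLM in their place.

```python
from typing import Callable, List, Tuple


def diffuse_thinking_step(
    state: str,
    propose: Callable[[str, int], List[str]],
    evaluate: Callable[[str, str], float],
    num_proposals: int = 5,
) -> Tuple[str, float]:
    """Pick the highest-scoring candidate thought for the current state.

    `propose` stands in for the DLM (assumed to return candidates in one batch);
    `evaluate` stands in for the LLM scorer. Both are hypothetical interfaces.
    """
    candidates = propose(state, num_proposals)
    if not candidates:
        raise ValueError("proposer returned no candidates")
    # Score every candidate, then keep the one with the highest score.
    scored = [(evaluate(state, thought), thought) for thought in candidates]
    best_score, best_thought = max(scored, key=lambda pair: pair[0])
    return best_thought, best_score


# Toy stand-ins for a Game of 24 state; these are NOT real models.
def toy_propose(state: str, n: int) -> List[str]:
    pool = ["13 - 9 = 4", "10 - 4 = 6", "4 + 9 = 13", "13 * 10 = 130"]
    return pool[:n]


def toy_evaluate(state: str, thought: str) -> float:
    # Hypothetical heuristic: prefer intermediate results close to 24.
    result = int(thought.split("=")[1])
    return -abs(24 - result)


if __name__ == "__main__":
    print(diffuse_thinking_step("4 9 10 13", toy_propose, toy_evaluate, 4))
```

Keeping the proposer and evaluator as separate callables mirrors the decoupling the framework relies on: the number of proposals can be varied without touching the evaluation logic.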

Paper Structure

This paper contains 50 sections, 57 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Comparison of generation in DLM and LLM: DLMs generate in parallel, producing multiple tokens simultaneously, while LLMs generate sequentially, one token at a time.
  • Figure 2: Our proposed framework. The DLM efficiently generates multiple reasoning thoughts in parallel, exemplified by the Game of 24. These thoughts are then evaluated by the LLM, which selects the most promising proposal.
  • Figure 3: Proposal accuracy scaling with varying proposal quantities (1 to 10) across different benchmarks.
  • Figure 4: Performance improvements on different benchmarks after fine-tuning.

Theorems & Definitions (1)

  • Definition 1: Per-step information loss in diffusion