Search or Accelerate: Confidence-Switched Position Beam Search for Diffusion Language Models

Mingyu Cao, Alvaro Correia, Christos Louizos, Shiwei Liu, Lu Yin

TL;DR

This work tackles the decoding bottleneck in diffusion language models by rethinking the unmasking order. It introduces Position Beam Search (PBS), which explores alternative unmasking sequences, and SOAR, an adaptive, training-free decoder that switches between PBS and parallel decoding based on per-step confidence to balance quality and speed. Empirical results on GSM8K, MBPP, and HumanEval with Dream-7B and LLaDA-8B show consistent accuracy improvements, with Dream-7B-Base gaining up to several percentage points on reasoning tasks while maintaining practical inference speed. The approach is robust to different unmasking metrics, sequence lengths, and variable-length decoding, making it a versatile, plug-in decoding strategy for diffusion language models.

Abstract

Diffusion Language Models (DLMs) generate text by iteratively denoising a masked sequence, repeatedly deciding which positions to commit at each step. Standard decoding follows a greedy rule: unmask the most confident positions, yet this local choice can lock the model into a suboptimal unmasking order, especially on reasoning-heavy prompts. We present SOAR, a training-free decoding algorithm that adapts its behavior to the model's uncertainty. When confidence is low, SOAR briefly widens the search over alternative unmasking decisions to avoid premature commitments; when confidence is high, it collapses the search and decodes many positions in parallel to reduce the number of denoising iterations. Across mathematical reasoning and code generation benchmarks (GSM8K, MBPP, HumanEval) on Dream-7B and LLaDA-8B, SOAR improves generation quality while maintaining competitive inference speed, offering a practical way to balance quality and efficiency in DLM decoding.
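
The confidence switch described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `soar_step`, the dictionary input format, the default threshold of 0.9, and the rule "branch over the top `beam_width` positions" are all assumptions made for illustration. It only shows the general idea of committing many positions in parallel when confidence is high and widening the search when it is low.

```python
from typing import Dict, List, Tuple


def soar_step(
    confidence: Dict[int, float],
    threshold: float = 0.9,   # assumed value, not taken from the paper
    beam_width: int = 2,      # assumed value, not taken from the paper
) -> Tuple[str, List[List[int]]]:
    """Decide which masked positions to unmask at one denoising step.

    `confidence` maps each still-masked position to the model's confidence
    in its top predicted token (hypothetical input format).
    Returns (mode, candidates); each candidate is a list of positions
    that one beam entry would unmask.
    """
    if not confidence:
        return "done", []

    confident = sorted(p for p, c in confidence.items() if c >= threshold)
    if confident:
        # High confidence: commit every confident position in a single step.
        return "parallel", [confident]

    # Low confidence: branch over the most confident positions,
    # one alternative unmasking choice per beam candidate.
    ranked = sorted(confidence, key=lambda p: (-confidence[p], p))
    return "search", [[p] for p in ranked[:beam_width]]


if __name__ == "__main__":
    print(soar_step({3: 0.95, 5: 0.97, 8: 0.40}))
    print(soar_step({3: 0.50, 5: 0.70, 8: 0.40}))
```

In the first call two positions clear the threshold, so the sketch returns a single parallel candidate; in the second none do, so it returns one single-position candidate per beam slot.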

Paper Structure (21 sections, 15 equations, 8 figures, 3 tables)

This paper contains 21 sections, 15 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Accuracy and average confidence on GSM8K with Dream-7B-Base. All questions are divided into 6 bins by their average decoding confidence, with $n$ indicating the sample size in each bin. The red solid line represents the trend line.
  • Figure 2: Overview of SOAR. At $t=0$, there are two sequences in the beam. Based on the confidence of the decoded tokens, one sequence undergoes parallel decoding (green arrow), while the other undergoes position search (blue arrow). After reordering, the sequence with the highest average confidence is the one obtained through parallel decoding, so the beam size is reduced to 1.
  • Figure 3: Pareto frontier on Dream-7B-Base. Solid arrows indicate adding parallel decoding (acceleration), and dashed arrows indicate adding position beam search. This plot is generated using the average accuracy and average speedup from Table \ref{tab:main_results}. For better visual presentation, the Y-axis range 49–58 is compressed.
  • Figure 4: The threshold study on Dream-7B: (a) Cumulative distribution of token confidence scores on GSM8K; (b) Trade-off between accuracy and speedup under varying confidence thresholds on MBPP; (c) Trade-off between accuracy and speedup under varying confidence thresholds on GSM8K.
  • Figure 5: The beam size study on Dream-7B: Trade-off between accuracy and speedup under increasing beam size on HumanEval.
  • ...and 3 more figures
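
The reordering step described in the Figure 2 caption can be sketched as follows. This is a hedged illustration rather than the paper's code: the `Candidate` dataclass, its field names, and the function `reorder_beam` are hypothetical. It assumes candidates are ranked by average confidence and that the beam collapses to a single sequence whenever the top-ranked candidate came from parallel decoding, as the caption describes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str              # hypothetical label for a partially decoded sequence
    avg_confidence: float  # mean confidence over its decoded tokens
    mode: str              # "parallel" or "search": how the last step was produced


def reorder_beam(candidates: List[Candidate], beam_width: int) -> List[Candidate]:
    """Rank candidates by average confidence and pick the next beam.

    Assumption: if the best candidate was produced by parallel decoding,
    the beam shrinks to that single sequence; otherwise the top
    `beam_width` candidates are kept.
    """
    ranked = sorted(candidates, key=lambda c: c.avg_confidence, reverse=True)
    if ranked and ranked[0].mode == "parallel":
        return ranked[:1]
    return ranked[:beam_width]


if __name__ == "__main__":
    beam = [
        Candidate("a", 0.82, "search"),
        Candidate("b", 0.95, "parallel"),
    ]
    print([c.name for c in reorder_beam(beam, beam_width=2)])
```

With the example beam, the parallel-decoded candidate ranks first, so only that sequence is kept, which mirrors the beam-size reduction shown in the figure.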

Theorems & Definitions (2)

  • Definition 4.1
  • Definition 4.2