Beyond Static Cutoffs: One-Shot Dynamic Thresholding for Diffusion Language Models

Jucheng Shen, Yeonju Ro

TL;DR

The paper addresses inefficiencies in diffusion language model decoding caused by static confidence thresholds. It introduces One-Shot Dynamic Thresholding (OSDT), which calibrates thresholds on a single sequence and reuses them for other inputs within the same task, exploiting the observation that confidence trajectories are nearly identical across inputs of a dataset. On GPQA, GSM8K, and HumanEval, OSDT raises throughput by 24% to 50% tokens/s (+24% on GSM8K at the best accuracy, +45% on GPQA with comparable accuracy, +50% on HumanEval with a modest accuracy gap), improving the accuracy-throughput trade-off over static-threshold baselines. The work suggests broader potential for reusable task-level confidence profiles to drive more efficient diffusion decoding without additional training.

Abstract

Masked diffusion language models (MDLMs) are becoming competitive with their autoregressive counterparts but typically decode with fixed steps and sequential unmasking. To accelerate decoding, recent work such as Fast-dLLM enables parallel decoding via a static global confidence threshold, yet we observe strong block- and step-wise confidence fluctuations and, within a dataset, near-identical confidence trajectories across inputs as measured by cosine similarity. Motivated by these observations, we introduce One-Shot Dynamic Thresholding (OSDT), which calibrates thresholds on a single sequence and applies them to subsequent inputs with negligible overhead. On GPQA, GSM8K, and HumanEval, OSDT attains superior accuracy-throughput trade-offs (+24% tokens/s on GSM8K at the best accuracy, +45% on GPQA with comparable accuracy, and +50% on HumanEval with a modest accuracy gap). Beyond these results, our findings suggest broader opportunities to leverage reusable task-level confidence signatures for more general-purpose algorithmic and systems innovations in diffusion decoding.
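
The calibrate-once, reuse-everywhere idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names, the `margin` and `cap` parameters, and the rule of scaling each block's mean calibration confidence are all assumptions made for illustration, and they are not meant to correspond to the paper's own hyperparameters. The fallback that always unmasks the single most confident token is likewise an assumption, added so that every decoding step makes progress.

```python
from statistics import mean


def calibrate_block_thresholds(calib_confidences, margin=0.1, cap=0.9):
    """Derive one threshold per block from a single calibration sequence.

    calib_confidences: list of blocks, each a non-empty list of token
    confidences recorded while decoding the calibration input.
    margin and cap are hypothetical knobs, not the paper's hyperparameters.
    """
    # Assumed rule: relax each block's mean confidence by `margin`,
    # and never let a threshold exceed `cap`.
    return [min(cap, mean(block) * (1.0 - margin)) for block in calib_confidences]


def select_tokens_to_unmask(confidences, threshold):
    """Return indices of masked positions to unmask in parallel this step."""
    chosen = [i for i, c in enumerate(confidences) if c >= threshold]
    if not chosen:
        # Assumed fallback: unmask the most confident token so decoding advances.
        chosen = [max(range(len(confidences)), key=confidences.__getitem__)]
    return chosen


# One-shot calibration on a single sequence (made-up confidences).
thresholds = calibrate_block_thresholds([[0.5, 0.7], [0.9, 1.0]])

# Reuse the block-0 threshold on a new input's first block.
print(select_tokens_to_unmask([0.2, 0.95, 0.6], thresholds[0]))
```

Because the thresholds are computed once and then only looked up, the per-input cost in this sketch is a list index, which is consistent with the negligible overhead claimed in the abstract.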

Paper Structure

This paper contains 11 sections, 5 figures, 1 table, 1 algorithm.

Figures (5)

  • Figure 1: Step-block mean token confidence. Across GPQA, GSM8K, and HumanEval, confidence starts low, peaks mid-process, and drops near the final steps. These structured rise-then-fall (inverted-U) dynamics highlight the limits of static thresholding.
  • Figure 2: Pairwise cosine similarity of step-block mean token confidence. Confidence trajectories are nearly identical across inputs of the same dataset, suggesting a single calibration run can generalize to the entire benchmark.
  • Figure 3: GPQA hyperparameter sweep. Accuracy peaks near 30% but varies only slightly across settings, while throughput is strongly influenced by $\epsilon$ and $\kappa$. Step-block mode provides finer adaptation and better trade-offs in high-accuracy regions.
  • Figure 4: GSM8K hyperparameter sweep. Structured reasoning tasks benefit most from block-level thresholds, which achieve higher accuracy (up to 76%) while maintaining strong throughput. Step-block offers little advantage here, confirming block mode suffices for GSM8K.
  • Figure 5: HumanEval hyperparameter sweep. Code generation shows a sharper accuracy–throughput trade-off: aggressive settings yield large speedups but accuracy drops quickly. Block-level thresholds dominate the Pareto frontier, offering simpler yet more efficient schedules.
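
The pairwise comparison behind Figure 2 can be sketched as follows. This is an illustrative reconstruction under assumptions: each input is represented by a vector of step-block mean confidences of equal length, the function names are hypothetical, and the numbers in the usage example are made up rather than taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length confidence trajectories."""
    if len(a) != len(b):
        raise ValueError("trajectories must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pairwise_cosine_similarity(trajectories):
    """Return the full n x n similarity matrix for a list of trajectories."""
    n = len(trajectories)
    return [
        [cosine_similarity(trajectories[i], trajectories[j]) for j in range(n)]
        for i in range(n)
    ]


# Made-up step-block mean confidences for three inputs.
trajectories = [
    [0.2, 0.8, 0.4],
    [0.1, 0.4, 0.2],  # same shape as the first, half the magnitude
    [0.8, 0.2, 0.4],  # different shape
]
for row in pairwise_cosine_similarity(trajectories):
    print([round(v, 3) for v in row])
```

Cosine similarity ignores overall scale, so two inputs whose confidence follows the same shape score close to 1 even when one is uniformly less confident. That property is what makes it a reasonable measure of whether a single calibration run captures the trajectory shape for a whole dataset.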