Beyond Static Cutoffs: One-Shot Dynamic Thresholding for Diffusion Language Models
Jucheng Shen, Yeonju Ro
TL;DR
The paper addresses inefficiencies in diffusion-based language decoding caused by static confidence thresholds. It introduces One-Shot Dynamic Thresholding (OSDT), which calibrates thresholds on a single sequence and reuses them for other inputs within the same task, relying on the observation that confidence trajectories are nearly identical across inputs of a dataset. On GPQA, GSM8K, and HumanEval, OSDT improves the accuracy-throughput trade-off over static-threshold baselines: +24% tokens/s on GSM8K at the best accuracy, +45% on GPQA with comparable accuracy, and +50% on HumanEval with a modest accuracy gap. The work suggests that reusable task-level confidence profiles could drive more efficient diffusion decoding without additional training.
Abstract
Masked diffusion language models (MDLMs) are becoming competitive with their autoregressive counterparts but typically decode with a fixed number of steps and sequential unmasking. To accelerate decoding, recent work such as Fast-dLLM enables parallel decoding via a static global confidence threshold. However, we observe strong block- and step-wise confidence fluctuations and, within a dataset, near-identical confidence trajectories across inputs as measured by cosine similarity. Motivated by these observations, we introduce One-Shot Dynamic Thresholding (OSDT), which calibrates thresholds on a single sequence and applies them to subsequent inputs with negligible overhead. On GPQA, GSM8K, and HumanEval, OSDT attains superior accuracy-throughput trade-offs (+24% tokens/s on GSM8K at the best accuracy, +45% on GPQA with comparable accuracy, and +50% on HumanEval with a modest accuracy gap). Beyond these results, our findings suggest broader opportunities to leverage reusable task-level confidence signatures for more general-purpose algorithmic and systems innovations in diffusion decoding.
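To make the two ideas in the abstract concrete, the following is a minimal sketch: comparing confidence trajectories by cosine similarity, and deriving per-step thresholds from one calibration sequence for reuse on later inputs. The abstract does not state the calibration rule, so the rule used here (calibration confidence minus a slack, clipped to a range) is purely an assumption for illustration. All function names and the parameters `slack`, `floor`, and `cap` are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length confidence trajectories."""
    if len(a) != len(b):
        raise ValueError("trajectories must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # similarity is undefined for a zero vector; report 0
    return dot / (norm_a * norm_b)


def calibrate_thresholds(
    calib_conf: Sequence[float],
    slack: float = 0.1,   # hypothetical margin below the observed confidence
    floor: float = 0.5,   # hypothetical lower bound on any threshold
    cap: float = 0.9,     # hypothetical upper bound on any threshold
) -> List[float]:
    """One-shot calibration (assumed rule, not the paper's): the threshold at
    each step is the calibration sequence's confidence at that step minus
    `slack`, clipped to [floor, cap]."""
    return [min(cap, max(floor, c - slack)) for c in calib_conf]


def select_tokens(step_conf: Sequence[float], threshold: float) -> List[int]:
    """Indices of masked positions whose confidence meets the threshold.
    If none qualify, fall back to the single most confident position so
    that decoding always makes progress."""
    chosen = [i for i, c in enumerate(step_conf) if c >= threshold]
    if not chosen and step_conf:
        chosen = [max(range(len(step_conf)), key=lambda i: step_conf[i])]
    return chosen
```

In use, `calibrate_thresholds` would run once on the per-step confidences recorded while decoding a single sequence, and each later input would call `select_tokens` with the threshold for its current step instead of a single global cutoff. A high `cosine_similarity` between the calibration trajectory and a new input's trajectory is the kind of evidence the abstract cites for why such reuse can work.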
