Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking
Heli Ben-Hamu, Itai Gat, Daniel Severo, Niklas Nolte, Brian Karrer
TL;DR
This work tackles the inefficiency of sampling from discrete masked diffusion models (MDMs) by introducing EB-Sampler, an entropy-bounded, adaptive unmasking strategy that unmasks a variable number of tokens per function evaluation. Grounded in an error decomposition that separates model error from joint dependence error, EB-Sampler uses a thresholded entropy bound to decide both which tokens and how many to unmask, enabling efficient, near-autoregressive generation without retraining. Empirical results across code, math reasoning, and logic puzzles show 2-3x speedups on state-of-the-art MDMs (LLaDA 8B, Dream 7B) while preserving performance, with notable gains on the MBPP, GSM8K, and MATH benchmarks and solid performance on maze navigation and Sudoku. The method is a flexible, drop-in alternative to fixed Top-$k$ sampling and belongs to a broader family of adaptive samplers with a principled error budget, which suggests avenues for learned, data-driven improvements and for revisiting token histories to obtain further speedups.
Abstract
Recent masked diffusion models (MDMs) have shown competitive performance compared to autoregressive models (ARMs) for language modeling. While most literature has focused on performance-enhancing sampling procedures, efficient sampling from MDMs has been scarcely explored. We observe that a given sequence of partially masked tokens often determines the values of multiple unknown tokens deterministically, meaning that a single prediction of a masked model holds additional information that standard sampling procedures leave unused. Based on this observation, we introduce EB-Sampler, a simple drop-in replacement for existing samplers that uses an Entropy Bounded unmasking procedure to dynamically unmask multiple tokens in one function evaluation with a predefined approximate error tolerance. We formulate the EB-Sampler as part of a broad family of adaptive samplers for which we provide an error analysis that motivates our algorithmic choices. EB-Sampler accelerates sampling from current state-of-the-art MDMs by roughly 2-3x on standard coding and math reasoning benchmarks without loss in performance. We also validate that the same procedure works well on smaller reasoning tasks, including maze navigation and Sudoku, which ARMs often struggle with.
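The entropy-bounded selection step can be illustrated with a small sketch. It is not the authors' implementation: the function names, the `gamma` budget parameter, and the exact acceptance rule (rank masked positions by entropy, always take the most certain one, then keep adding positions while the cumulative entropy stays within the budget) are assumptions made for illustration. The abstract states only that unmasking is bounded by entropy under a predefined tolerance.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_positions_to_unmask(masked_probs, gamma):
    """Choose which masked positions to unmask from a single model prediction.

    masked_probs: dict mapping a masked position to its predicted
        distribution over the vocabulary (a list of probabilities).
    gamma: entropy budget, a hypothetical stand-in for the error tolerance.

    Assumed rule (not taken from the paper): the lowest-entropy position is
    always unmasked so that sampling makes progress, and further positions
    are added in order of increasing entropy while the total stays <= gamma.
    """
    ranked = sorted(masked_probs, key=lambda pos: entropy(masked_probs[pos]))
    chosen, budget_used = [], 0.0
    for pos in ranked:
        h = entropy(masked_probs[pos])
        if chosen and budget_used + h > gamma:
            break
        chosen.append(pos)
        budget_used += h
    return chosen


# Toy prediction over a 3-token vocabulary at four masked positions.
toy_probs = {
    0: [1.0, 0.0, 0.0],     # fully determined by the context
    1: [0.98, 0.01, 0.01],  # nearly determined
    2: [0.4, 0.3, 0.3],     # genuinely uncertain
    3: [0.99, 0.01, 0.0],   # nearly determined
}
positions = select_positions_to_unmask(toy_probs, gamma=0.2)
```

In this toy case the three near-deterministic positions are unmasked together and the uncertain one is left for a later step. With `gamma=0.0` only the fully determined position is taken, so the number of tokens unmasked per evaluation adapts to the prediction instead of being fixed in advance as in Top-$k$ sampling.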
