Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking

Heli Ben-Hamu, Itai Gat, Daniel Severo, Niklas Nolte, Brian Karrer

TL;DR

This work tackles the inefficiency of sampling from discrete Masked Diffusion Models (MDMs) by introducing EB-Sampler, an entropy-bounded, adaptive unmasking strategy that dynamically unmasks multiple tokens per function evaluation. Grounded in an error decomposition that separates model error from joint dependence error, EB-Sampler uses a thresholded entropy bound to decide both which tokens and how many to unmask, enabling efficient, near-autoregressive generation without retraining. Empirical results across code, math reasoning, and logic puzzles show 2-3x speedups on state-of-the-art MDMs (LLaDa 8B, Dream 7B) while preserving performance, with notable gains on the MBPP, GSM8K, and MATH benchmarks and solid performance on maze navigation and Sudoku. The method offers a flexible, drop-in alternative to fixed Top-$k$ sampling and highlights a broader family of adaptive samplers with a principled error budget, suggesting avenues for learned, data-driven improvements and for revisiting token histories to obtain further speedups.

Abstract

Recent masked diffusion models (MDMs) have shown competitive performance compared to autoregressive models (ARMs) for language modeling. While most literature has focused on performance-enhancing sampling procedures, efficient sampling from MDMs has been scarcely explored. We make the observation that often a given sequence of partially masked tokens determines the values of multiple unknown tokens deterministically, meaning that a single prediction of a masked model holds additional information unused by standard sampling procedures. Based on this observation, we introduce EB-Sampler, a simple drop-in replacement for existing samplers, utilizing an Entropy Bounded unmasking procedure that dynamically unmasks multiple tokens in one function evaluation with a predefined approximate error tolerance. We formulate the EB-Sampler as part of a broad family of adaptive samplers for which we provide an error analysis that motivates our algorithmic choices. EB-Sampler accelerates sampling from current state-of-the-art MDMs by roughly 2-3x on standard coding and math reasoning benchmarks without loss in performance. We also validate that the same procedure works well on smaller reasoning tasks including maze navigation and Sudoku, tasks ARMs often struggle with.
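The unmasking rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions not stated in this section: the error proxy is taken to be the per-position predictive entropy, and the bound is taken to be "sum of the selected entropies minus the largest selected entropy is at most a threshold". The names `eb_select`, `token_entropy`, and `gamma` are hypothetical.

```python
import numpy as np


def token_entropy(probs):
    """Shannon entropy (in nats) of each row of a (positions, vocab) array."""
    safe = np.clip(probs, 1e-12, 1.0)  # avoid log(0); zero-probability terms contribute 0
    return -(probs * np.log(safe)).sum(axis=-1)


def eb_select(probs, masked, gamma):
    """Pick which masked positions to unmask in one step (illustrative only).

    probs  : (L, V) array of per-position predictive distributions
    masked : (L,) boolean array, True where the token is still masked
    gamma  : non-negative entropy budget (assumed form of the error tolerance)
    """
    idx = np.flatnonzero(masked)
    if idx.size == 0:
        return idx
    h = token_entropy(probs[idx])
    order = np.argsort(h, kind="stable")  # most confident positions first
    h_sorted = h[order]
    # With ascending entropies, the largest entropy in a prefix is its last
    # element, so (sum - max) for the first k tokens is cumsum[k-1] - h_sorted[k-1].
    bound = np.cumsum(h_sorted) - h_sorted
    # bound is non-decreasing and bound[0] == 0, so at least one token is chosen.
    k = max(1, int(np.searchsorted(bound, gamma, side="right")))
    return idx[order[:k]]


if __name__ == "__main__":
    probs = np.array([[1.0, 0.0], [0.5, 0.5], [1.0, 0.0], [0.9, 0.1]])
    masked = np.array([True, True, True, True])
    print(eb_select(probs, masked, gamma=0.1))
```

Under these assumptions, a small `gamma` unmasks only the positions the model is nearly certain about plus one more, while a larger `gamma` unmasks more tokens per function evaluation at the cost of a looser error budget. A fixed Top-$k$ sampler would instead always return the first `k` entries of `order`, regardless of how uncertain they are.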

Paper Structure

This paper contains 33 sections, 18 equations, 12 figures, 6 tables, 1 algorithm.

Figures (12)

  • Figure 1: Illustration of an unmasking step with EB-Sampler. At each step, EB-Sampler determines which tokens to unmask by ordering them according to an error proxy, and then chooses how many tokens to unmask independently by bounding their joint dependence via the model's prediction entropies.
  • Figure 2: Performance of greedy sampling with various unmasking criteria from LLaDa 8B Base model.
  • Figure 3: Efficiency-accuracy tradeoff of Top-$k$ (NFE) sampling on MBPP.
  • Figure 4: Python code implementation of a single sampling step for common Top-$k$ approaches and for EB-Sampler.
  • Figure 5: pass@1 accuracy vs. NFE with generate_until logic on code and math reasoning tasks.
  • ...and 7 more figures