Lookahead Unmasking Elicits Accurate Decoding in Diffusion Language Models

Sanghyun Lee; Seungryong Kim; Jongho Park; Dongmin Park

TL;DR

This paper addresses the problem that decoding in Diffusion Language Models (DLMs) is highly sensitive to the unmasking order, so that greedy, locally focused strategies can produce irrecoverable errors. It introduces LookUM, an unsupervised, path-based framework that couples a path generator with an uncertainty-based verifier to perform lookahead unmasking, guided by sequence-level certainty rather than local confidences. Across six benchmarks covering mathematics, coding, and planning, LookUM yields consistent gains with only 2–4 additional inference paths, and its benefits are complementary to RL tuning without requiring an external reward model. The approach is robust across base and RL-tuned LLaDA models, scales with additional inference compute, and establishes uncertainty-driven path selection as a practical, general mechanism for improving diffusion language models. The work thus broadens inference-time optimization for discrete diffusion and points to future work on leveraging further intrinsic signals for verification.

Abstract

Masked Diffusion Models (MDMs), used as language models, generate text by iteratively unmasking tokens, yet their performance depends crucially on the inference-time order of unmasking. Prevailing heuristics, such as confidence-based sampling, are myopic: they optimize locally, fail to leverage extra test-time compute, and let early decoding mistakes cascade. We propose Lookahead Unmasking (LookUM), which addresses these concerns by reformulating sampling as path selection over all possible unmasking orders, without the need for an external reward model. Our framework couples (i) a path generator that proposes paths by sampling from pools of unmasking sets with (ii) a verifier that computes the uncertainty of the proposed paths and performs importance sampling to select the final paths. Empirically, erroneous unmasking measurably inflates sequence-level uncertainty, and our method exploits this to avoid error-prone trajectories. We validate our framework on six benchmarks spanning mathematics, planning, and coding, and demonstrate consistent performance improvements. LookUM requires only two to three paths to achieve peak performance, demonstrating remarkably efficient path selection. The consistent improvements on both LLaDA and post-trained LLaDA 1.5 are particularly striking: base LLaDA with LookUM rivals the performance of RL-tuned LLaDA 1.5, while LookUM further enhances LLaDA 1.5 itself, showing that uncertainty-based verification provides benefits orthogonal to reinforcement learning and underscoring the versatility of our framework. Code will be publicly released.
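The verifier step described in the abstract (score each proposed path by its uncertainty, then select paths by importance sampling) can be illustrated with a small toy sketch. This is not the authors' implementation. It assumes, purely for illustration, that sequence-level uncertainty is the mean token entropy of the model's predictive distributions and that importance weights are a softmax over negative uncertainty. The function names, the `temperature` parameter, and the hand-written probability tables are all hypothetical.

```python
import math
import random


def token_entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sequence_uncertainty(per_position_probs):
    """Assumed proxy for sequence-level uncertainty: mean token entropy."""
    entropies = [token_entropy(p) for p in per_position_probs]
    return sum(entropies) / len(entropies)


def importance_weights(uncertainties, temperature=1.0):
    """Softmax over negative uncertainty: lower uncertainty gets more weight."""
    scores = [-u / temperature for u in uncertainties]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_paths(candidates, uncertainties, k, rng, temperature=1.0):
    """Resample k paths (with replacement) in proportion to their weights."""
    weights = importance_weights(uncertainties, temperature)
    return rng.choices(candidates, weights=weights, k=k)


# Toy data: predictive distributions that two candidate unmasking paths
# would leave behind. In a real system these would come from the model.
confident = [[0.97, 0.01, 0.01, 0.01], [0.90, 0.05, 0.03, 0.02]]
uncertain = [[0.25, 0.25, 0.25, 0.25], [0.40, 0.30, 0.20, 0.10]]

candidates = ["path_a", "path_b"]
uncertainties = [sequence_uncertainty(confident), sequence_uncertainty(uncertain)]
weights = importance_weights(uncertainties, temperature=0.1)
chosen = select_paths(candidates, uncertainties, k=2, rng=random.Random(0), temperature=0.1)
```

In this sketch a lower `temperature` concentrates the selection on the least uncertain path, while a higher one keeps more diversity among the surviving paths. How the paper actually defines uncertainty and weights its paths is not specified in the abstract.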
