UnMaskFork: Test-Time Scaling for Masked Diffusion via Deterministic Action Branching

Kou Misaki; Takuya Akiba

TL;DR

UnMaskFork (UMF) addresses the inefficiency of applying stochastic test-time scaling to Masked Diffusion Language Models by reframing unmasking as a deterministic, tree-based search. Using Monte Carlo Tree Search over discrete actions that select among multiple MDLMs and inference configurations, UMF achieves diverse, high-quality unmasking trajectories while aggressively caching deterministic rollouts to maximize compute efficiency under a fixed NFE budget. Empirically, UMF consistently surpasses Best-of-N and diffusion-tree baselines on coding benchmarks and also scales to mathematical reasoning tasks, illustrating the method's generality beyond code generation. The work highlights the value of deterministic, multi-model exploration and caching for non-autoregressive diffusion models, with implications for safety, energy use, and practical deployment where inference-time compute is a critical resource.

Abstract

Test-time scaling strategies have effectively leveraged inference-time compute to enhance the reasoning abilities of Autoregressive Large Language Models. In this work, we demonstrate that Masked Diffusion Language Models (MDLMs) are inherently amenable to advanced search strategies, owing to their iterative and non-autoregressive generation process. To leverage this, we propose UnMaskFork (UMF), a framework that formulates the unmasking trajectory as a search tree and employs Monte Carlo Tree Search to optimize the generation path. In contrast to standard scaling methods relying on stochastic sampling, UMF explores the search space through deterministic partial unmasking actions performed by multiple MDLMs. Our empirical evaluation demonstrates that UMF consistently outperforms existing test-time scaling baselines on complex coding benchmarks, while also exhibiting strong scalability on mathematical reasoning tasks.
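As a rough illustration of why deterministic branching pairs well with caching, the toy sketch below treats each action as a (model, number of tokens to unmask) pair and memoizes the resulting state transition. Everything here is a hypothetical stand-in and not the authors' implementation: `TOY_MODELS`, `CachedUnmasker`, and the left-to-right unmasking rule are all assumptions. A real MDLM would pick positions by a remasking strategy, and UMF additionally uses Monte Carlo Tree Search to choose which branch to expand, which is not shown.

```python
from typing import Dict, Optional, Tuple

# A partially masked sequence; None marks a position that is still masked.
State = Tuple[Optional[str], ...]

# Hypothetical stand-ins for two MDLMs: each "predicts" a fixed token per position.
TOY_MODELS: Dict[str, Tuple[str, ...]] = {
    "model_a": ("x", "=", "1", "+", "2"),
    "model_b": ("y", "=", "3", "*", "4"),
}


class CachedUnmasker:
    """Applies deterministic unmask actions and memoizes every transition."""

    def __init__(self) -> None:
        self.cache: Dict[Tuple[State, str, int], State] = {}
        self.nfe = 0  # number of (simulated) model forward passes actually spent

    def step(self, state: State, model: str, k: int) -> State:
        """Unmask up to k masked positions of `state` using `model`."""
        key = (state, model, k)
        if key in self.cache:
            # Same state and same action always yield the same child: no new cost.
            return self.cache[key]
        self.nfe += 1  # a cache miss costs one forward pass
        tokens = list(state)
        masked = [i for i, tok in enumerate(tokens) if tok is None]
        # Simplification: unmask the leftmost k masked positions.
        for i in masked[:k]:
            tokens[i] = TOY_MODELS[model][i]
        child = tuple(tokens)
        self.cache[key] = child
        return child


unmasker = CachedUnmasker()
root: State = (None,) * 5

child_a = unmasker.step(root, "model_a", 2)   # branch 1: model_a unmasks 2 tokens
child_b = unmasker.step(root, "model_b", 2)   # branch 2: model_b unmasks 2 tokens
repeat_a = unmasker.step(root, "model_a", 2)  # revisits branch 1, served from cache
nfe_after_repeat = unmasker.nfe

mixed = unmasker.step(child_a, "model_b", 3)  # a different model finishes branch 1
```

In this sketch, branch diversity comes only from the choice of action, so revisiting a branch is free. With stochastic sampling, the same (state, action) pair could yield a different child on each call, and a cache of this kind would not apply.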

Paper Structure (33 sections, 1 equation, 8 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Preliminaries
Partially-masked state and mask ratio
MDLM prediction and unmask transition
Action as an inference configuration
Inference budget (NFE)
Remasking Strategies ($g_a$)
Methods
Motivation
UnMaskFork
Design of the Action Set
Motivation and Analysis
Inference as Adaptive Kernel Selection
Budget Efficiency: Deterministic vs. Stochastic Diversity
...and 18 more sections

Figures (8)

Figure 1: Conceptual diagram of UnMaskFork. Nodes generated during rollouts (dotted lines) and their evaluation results are cached to be reused in subsequent expansion steps, minimizing redundant computations.
Figure 2: Scaling plots (Pass@1) on LiveCodeBench, HumanEval+, and MBPP+.
Figure 3: Text generated by unmasking up to the starred node in Figure 4. Darker colors indicate tokens unmasked earlier. Blue highlights tokens unmasked by Dream-Coder, and orange by LLaDA. The unmasking process proceeded in the order of the numbers shown in the legend.
Figure 4: Example of the UMF search tree for LiveCodeBench at NFE=12288. "D" denotes unmasking by Dream-Coder, and "L" denotes unmasking by LLaDA. The starred node represents the node used for submission, which is correct for this problem.
Figure 5: Scaling plots (Pass@1) of LLaDA baselines on LiveCodeBench, HumanEval+, and MBPP+.
...and 3 more figures
