Can I Have Your Order? Monte-Carlo Tree Search for Slot Filling Ordering in Diffusion Language Models
Joshua Ong Jun Leang, Yu Zhao, Mihaela Cătălina Stoian, Wenda Li, Shay B. Cohen, Eleonora Giunchiglia
TL;DR
The paper addresses the sensitivity of Masked Diffusion Models (MDMs) to slot infilling order by introducing McDiffuSE, a training-free framework that uses Monte Carlo Tree Search (MCTS) to plan slot orderings with lookahead and rollout-based value estimation. By using model confidences as priors and balancing immediate slot quality against long-term trajectory coherence, McDiffuSE improves on average by 3.2% over autoregressive models (ARMs) and by 8.0% over baseline plan-and-infill decoding, with gains of 19.5% on MBPP and 4.9% on MATH500. The analysis shows that although most decisions are near-sequential, a subset of non-sequential orderings is essential to the performance gains, and that broader exploration (a larger exploration constant), rather than simply more simulations, is needed to overcome the model's confidence priors and discover effective orders. These findings establish MCTS-based slot planning as an effective strategy for improving generation quality in MDMs, particularly on coding and structured reasoning tasks. The approach also reduces token usage while maintaining or improving accuracy, which suggests practical benefits for efficient, coherent long-form generation.
Abstract
While plan-and-infill decoding in Masked Diffusion Models (MDMs) shows promise for mathematical and code reasoning, performance remains highly sensitive to slot infilling order, often yielding substantial output variance. We introduce McDiffuSE, a framework that formulates slot selection as decision making and optimises infilling orders through Monte Carlo Tree Search (MCTS). McDiffuSE uses look-ahead simulations to evaluate partial completions before commitment, systematically exploring the combinatorial space of generation orders. Experiments show an average improvement of 3.2% over autoregressive baselines and 8.0% over baseline plan-and-infill, with notable gains of 19.5% on MBPP and 4.9% on MATH500. Our analysis reveals that while McDiffuSE predominantly follows sequential ordering, incorporating non-sequential generation is essential for maximising performance. We observe that larger exploration constants, rather than increased simulations, are necessary to overcome model confidence biases and discover effective orderings. These findings establish MCTS-based planning as an effective approach for enhancing generation quality in MDMs.
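The role of the exploration constant can be illustrated with a small selection-step sketch. The abstract does not give the scoring rule McDiffuSE uses, so the code below assumes a generic AlphaZero-style PUCT score; the function names, the statistics layout, and all numbers are hypothetical and chosen only for illustration. Slot 0 stands for the sequential next slot, which the model's confidence prior strongly favours, and slot 2 for an unvisited non-sequential alternative.

```python
import math


def puct_score(q, prior, parent_visits, child_visits, c_explore):
    """Generic PUCT score (an assumption, not necessarily the paper's rule):
    mean rollout value plus a prior-weighted exploration bonus."""
    bonus = c_explore * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + bonus


def select_slot(stats, c_explore):
    """Return the candidate slot with the highest PUCT score.

    `stats` maps a slot index to a dict with hypothetical fields:
    "q" (mean rollout value), "prior" (model confidence), "visits".
    """
    parent_visits = sum(s["visits"] for s in stats.values())
    return max(
        stats,
        key=lambda slot: puct_score(
            stats[slot]["q"],
            stats[slot]["prior"],
            parent_visits,
            stats[slot]["visits"],
            c_explore,
        ),
    )


# Illustrative statistics only: slot 0 has a high prior and many visits,
# slot 2 has a low prior and has never been simulated.
stats = {
    0: {"q": 0.5, "prior": 0.9, "visits": 20},
    2: {"q": 0.0, "prior": 0.1, "visits": 0},
}

low_c_choice = select_slot(stats, c_explore=0.5)   # search stays on slot 0
high_c_choice = select_slot(stats, c_explore=4.0)  # search tries slot 2
```

With these made-up numbers, the small constant keeps the search on the high-prior sequential slot, while the larger constant makes the bonus for the unvisited slot large enough for it to be tried. This matches the qualitative observation above about overcoming confidence bias, but only under the assumed scoring rule.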
