Can I Have Your Order? Monte-Carlo Tree Search for Slot Filling Ordering in Diffusion Language Models

Joshua Ong Jun Leang, Yu Zhao, Mihaela Cătălina Stoian, Wenda Li, Shay B. Cohen, Eleonora Giunchiglia

TL;DR

The paper addresses the sensitivity of diffusion language models to slot infilling order by introducing McDiffuSE, a training-free framework that uses Monte Carlo Tree Search to plan slot orderings with lookahead and rollout-based value estimation. By using model confidences as priors and balancing immediate slot quality against long-term trajectory coherence, McDiffuSE improves on autoregressive models (ARMs) by $3.2\%$ on average and on plan-and-infill baselines by $8.0\%$, with the largest gains on MBPP ($19.5\%$) and MATH500 ($4.9\%$). The analysis shows that most decisions are near-sequential, but a subset of non-sequential orderings is crucial to the improvement, and that exploration breadth (a larger exploration constant), not merely a larger simulation budget, is needed to overcome the priors and discover effective orders. These findings establish MCTS-based slot planning as an effective strategy for improving generation quality in Masked Diffusion Models, particularly on coding and structured reasoning tasks. The approach also reduces token usage while maintaining or improving accuracy, indicating practical benefits for efficient, coherent long-form generation.

Abstract

While plan-and-infill decoding in Masked Diffusion Models (MDMs) shows promise for mathematical and code reasoning, performance remains highly sensitive to slot infilling order, often yielding substantial output variance. We introduce McDiffuSE, a framework that formulates slot selection as decision making and optimises infilling orders through Monte Carlo Tree Search (MCTS). McDiffuSE uses look-ahead simulations to evaluate partial completions before commitment, systematically exploring the combinatorial space of generation orders. Experiments show an average improvement of 3.2% over autoregressive baselines and 8.0% over baseline plan-and-infill, with notable gains of 19.5% on MBPP and 4.9% on MATH500. Our analysis reveals that while McDiffuSE predominantly follows sequential ordering, incorporating non-sequential generation is essential for maximising performance. We observe that larger exploration constants, rather than increased simulations, are necessary to overcome model confidence biases and discover effective orderings. These findings establish MCTS-based planning as an effective approach for enhancing generation quality in MDMs.
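The interplay between model-confidence priors and the exploration constant can be illustrated with a small toy example. The sketch below is not the paper's implementation: it assumes a PUCT-style selection rule (value estimate plus a prior-weighted exploration bonus), a single decision about which slot to fill first, and fixed, made-up rollout rewards. The names `puct_score`, `plan_first_slot`, `priors`, and `toy_reward` are hypothetical and chosen only for illustration.

```python
import math


def puct_score(q, prior, parent_visits, child_visits, c):
    """Assumed PUCT-style score: value estimate plus prior-weighted exploration bonus."""
    return q + c * prior * math.sqrt(parent_visits + 1) / (1 + child_visits)


def plan_first_slot(priors, toy_reward, c, n_sim):
    """Pick the first slot to fill after n_sim simulated look-aheads.

    priors     -- hypothetical model confidence for each candidate slot
    toy_reward -- made-up, deterministic rollout reward for starting with each slot
    c          -- exploration constant
    n_sim      -- simulation budget
    """
    slots = list(priors)
    q = {s: 0.0 for s in slots}  # running mean of rollout rewards per slot
    n = {s: 0 for s in slots}    # visit count per slot
    for _ in range(n_sim):
        total = sum(n.values())
        chosen = max(slots, key=lambda s: puct_score(q[s], priors[s], total, n[s], c))
        reward = toy_reward[chosen]  # stands in for a look-ahead rollout
        n[chosen] += 1
        q[chosen] += (reward - q[chosen]) / n[chosen]  # incremental mean update
    # Commit to the slot with the highest estimated value.
    return max(slots, key=lambda s: q[s])


# Illustrative numbers only: the prior prefers slot 0, but starting with slot 1 pays off more.
priors = {0: 0.8, 1: 0.2}
toy_reward = {0: 0.5, 1: 1.0}

print(plan_first_slot(priors, toy_reward, c=0.1, n_sim=50))   # 0: never tries slot 1
print(plan_first_slot(priors, toy_reward, c=0.1, n_sim=500))  # 0: more simulations do not help
print(plan_first_slot(priors, toy_reward, c=2.0, n_sim=50))   # 1: larger c overrides the prior
```

In this toy setting, a small exploration constant keeps the search locked onto the slot the prior favours regardless of the simulation budget, while a larger constant lets the low-prior slot be tried once and then win on its value estimate. This mirrors, in a highly simplified form, the abstract's observation about exploration constants versus simulation counts.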

Paper Structure (44 sections, 38 equations, 8 figures, 6 tables, 5 algorithms)


Figures (8)

  • Figure 1: Overview of McDiffuSE. We formulate slot selection as a sequential decision-making process optimised via Monte Carlo Tree Search. As illustrated in the Statistics box, the model's greedy prior ($P(a=1\mid s_0)=0.37$) favours immediately generating the function definition (i.e., slot $2$: "def get_max_length(words):"). However, through look-ahead simulations, the search algorithm discovers that starting with the syntax declaration (i.e., slot $1$: "```python") yields a higher long-term Q-value (i.e., $Q(s_0, a=1)=1.20$ for slot $1$ vs. $Q(s_0, a=2)=0.88$ for slot $2$), allowing the model to override the biased local prior and ensure global coherence.
  • Figure 2: Relationship between generation sequentiality and accuracy on the MBPP dataset. Each dot represents a sample plotted by its accuracy and sequentiality rate. Darker dots denote higher density, reflecting multiple instances with identical sequentiality. Solid lines denote average accuracy trends computed by binning sequentiality rates for ReFusion and McDiffuSE, while the dashed line indicates the overall accuracy of the sequential (left-to-right) baseline.
  • Figure 3: Impact of exploration constant ($c$) and simulation budget ($N_{sim}$) on task performance.
  • Figure 4: Comparison of ReFusion and McDiffuSE on a coding prompt from MBPP. Superscripts denote the infilling slot order and colours indicate the specific generation step.
  • Figure 5: Comparison of token reduction across models. McDiffuSE significantly reduces the number of tokens used when generating responses, indicating that its outputs are more compact and coherent.
  • ...and 3 more figures

Theorems & Definitions (5)

  • Example 3.1
  • Example 3.2
  • Example 3.3
  • Example 3.4
  • Example 3.5