Guided Star-Shaped Masked Diffusion

Viacheslav Meshchaninov, Egor Shibaev, Artem Makoian, Ivan Klimov, Danil Sheshenya, Andrei Malinin, Nikita Balagansky, Daniil Gavrilov, Aibek Alanov, Dmitry Vetrov

TL;DR

Guided Star-Shaped Masked Diffusion introduces a star-shaped forward process that enables token revision in discrete diffusion models, and pairs it with a lightweight, error-targeted predictor that selectively remasks likely erroneous tokens. By predicting a full clean hypothesis $\hat{x}_0$ and refining it through targeted remasking, the method achieves substantial quality gains in few-step generation while remaining compatible with pre-trained masked diffusion language models. The approach is validated on text and code generation, where it shows strong improvements over the MDLM and ReMDM baselines, including on large-scale instruction-tuned models. Practical implications include improved efficiency in constrained-generation scenarios and applicability to both natural language and programming tasks; the released code supports reproducibility.
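A minimal sketch of an unguided star-shaped sampling step, assuming a linear schedule in which each token is masked independently with probability t at time t. The mask id `MASK`, the `denoise` callable, and the helper names `star_step` and `star_sample` are hypothetical stand-ins, not the authors' released code. What it illustrates: each step re-noises the full hypothesis $\hat{x}_0$ rather than the current $x_t$, so tokens decoded earlier can be masked again and revised.

```python
import random
from typing import Callable, List

MASK = -1  # hypothetical mask-token id used only in this sketch
Denoiser = Callable[[List[int]], List[int]]  # maps x_t to a full hypothesis x0_hat


def star_step(x_t: List[int], t_next: float, denoise: Denoiser,
              rng: random.Random) -> List[int]:
    """One unguided star-shaped step (sketch, assumed linear schedule).

    1. Predict a complete clean hypothesis x0_hat from the noisy sequence.
    2. Re-noise x0_hat directly to level t_next, masking each position
       independently with probability t_next.
    Since step 2 starts from x0_hat, previously visible tokens may be remasked.
    """
    x0_hat = denoise(x_t)
    return [MASK if rng.random() < t_next else tok for tok in x0_hat]


def star_sample(length: int, num_steps: int, denoise: Denoiser,
                seed: int = 0) -> List[int]:
    """Run star-shaped steps on a uniform time grid from t = 1 down to t = 0."""
    rng = random.Random(seed)
    x = [MASK] * length  # start from the fully masked sequence
    for i in range(num_steps, 0, -1):
        t_next = (i - 1) / num_steps  # the last step uses t_next = 0: nothing is masked
        x = star_step(x, t_next, denoise, rng)
    return x


if __name__ == "__main__":
    # Hypothetical stand-in for a pre-trained denoiser: fills each masked
    # position with its index and copies visible tokens unchanged.
    toy = lambda x: [i if tok == MASK else tok for i, tok in enumerate(x)]
    print(star_sample(length=8, num_steps=4, denoise=toy))  # → [0, 1, 2, 3, 4, 5, 6, 7]
```

With a real model, `denoise` would return the argmax or a sample from the per-position predictive distribution; the toy version only shows that the loop ends with no masked positions.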

Abstract

The performance of pre-trained masked diffusion models is often constrained by their sampling procedure, which makes decisions irreversible and struggles in low-step generation regimes. We introduce a novel sampling algorithm that works with pre-trained models and, after lightweight fine-tuning of a single layer, significantly improves sample quality and efficiency. Our method reformulates the generation process using a star-shaped paradigm, which inherently allows for error correction. To make this process effective, we augment it with a learnable re-masking scheduler that identifies and revises likely errors. This approach yields a substantial quality boost, particularly when using a small number of sampling steps. We extensively ablate the key components of our approach and show its usability in different scenarios. In comprehensive experiments on text and code generation, our sampling algorithm outperforms or matches existing methods.
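The learnable re-masking scheduler replaces uniform random remasking with a choice of which positions to remask. One plausible reading, sketched below, is to remask the round(t·L) positions that a lightweight error predictor scores as most likely wrong, assuming a linear schedule where the masked fraction at time t is t. The function name `guided_remask`, the id `MASK`, and the `error_scores` input (standing in for the predictor's per-token output) are hypothetical, and whether the authors use deterministic top-k selection or sampling is an assumption here.

```python
from typing import List, Sequence

MASK = -1  # hypothetical mask-token id used only in this sketch


def guided_remask(x0_hat: Sequence[int], error_scores: Sequence[float],
                  t_next: float) -> List[int]:
    """Remask the positions judged most likely to be incorrect (sketch).

    error_scores[i] is assumed to come from a lightweight error predictor,
    with higher values meaning "more likely wrong". The number of masked
    positions follows an assumed linear schedule: round(t_next * L).
    """
    if len(x0_hat) != len(error_scores):
        raise ValueError("x0_hat and error_scores must have the same length")
    k = round(t_next * len(x0_hat))
    # Rank positions from most to least suspicious; ties are broken by position.
    ranked = sorted(range(len(x0_hat)), key=lambda i: (-error_scores[i], i))
    to_mask = set(ranked[:k])
    return [MASK if i in to_mask else tok for i, tok in enumerate(x0_hat)]


if __name__ == "__main__":
    # Positions 1 and 3 carry the highest error scores, so they are remasked.
    print(guided_remask([10, 11, 12, 13], [0.1, 0.9, 0.2, 0.8], t_next=0.5))  # → [10, -1, 12, -1]
```

Compared with random remasking, this spends the fixed masking budget on the tokens the predictor flags, which is the intuition behind targeting likely errors.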

Paper Structure

This paper contains 41 sections, 14 equations, 10 figures, 4 tables, and 2 algorithms.

Figures (10)

  • Figure 1: Analysis of the star-shaped (Star) sampler's dynamics. (Left) Perplexity and (Right) step-to-step similarity over the generation trajectory for three configurations: MDLM, Star, and our hybrid approach (Star+), which switches from MDLM to Star at step 90 (dotted line).
  • Figure 2: The impact of the star-shaped sampler's activation time ($t_{on}$) on generation quality. We plot the final MAUVE score for a hybrid sampler that switches from MDLM to Star at time $t_{on}$.
  • Figure 3: Performance comparison in few-step generation regimes. The guided sampler (G-Star+) consistently outperforms the unguided Star+.
  • Figure 4: Performance of ReMDM as a function of the hyperparameter $\eta$. Results are shown for three different remasking schedules. The dashed line indicates the performance of the baseline MDLM sampler. The plots reveal a high sensitivity to the choice of $\eta$, with suboptimal values often performing worse than the baseline.
  • Figure 5: Performance as a function of the refinement loop size. Increasing the refinement budget generally improves quality (lower PPL, higher MAUVE) but reduces diversity. Our guided G-Star-loop demonstrates a much steeper rate of improvement, achieving higher quality with fewer steps. The MAUVE score eventually peaks and declines as the loss of diversity outweighs quality gains.
  • ...and 5 more figures
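Figures 1 and 2 describe a hybrid sampler (Star+) that runs ordinary MDLM steps first and switches to star-shaped steps at $t_{on}$. A minimal sketch of that switch, assuming diffusion time runs from 1 down to 0 on a uniform grid, $t_{on}$ is expressed on that same scale, and each token is masked with probability t at time t (so an MDLM step reveals a masked token with probability (t - s)/t). The names `mdlm_step`, `star_step`, `hybrid_sample`, `MASK`, and `denoise` are hypothetical, not the authors' code.

```python
import random
from typing import Callable, List

MASK = -1  # hypothetical mask-token id used only in this sketch
Denoiser = Callable[[List[int]], List[int]]  # maps x_t to a full hypothesis x0_hat


def mdlm_step(x_t: List[int], t: float, s: float, denoise: Denoiser,
              rng: random.Random) -> List[int]:
    """MDLM-style reverse step: visible tokens stay frozen; each masked
    token is revealed from x0_hat with probability (t - s) / t."""
    x0_hat = denoise(x_t)
    reveal = (t - s) / t
    return [x0_hat[i] if tok == MASK and rng.random() < reveal else tok
            for i, tok in enumerate(x_t)]


def star_step(x_t: List[int], s: float, denoise: Denoiser,
              rng: random.Random) -> List[int]:
    """Star-shaped step: re-noise the full hypothesis x0_hat to level s,
    so any token (including previously visible ones) may be remasked."""
    x0_hat = denoise(x_t)
    return [MASK if rng.random() < s else tok for tok in x0_hat]


def hybrid_sample(length: int, num_steps: int, t_on: float,
                  denoise: Denoiser, seed: int = 0) -> List[int]:
    """Star+-style hybrid (sketch): MDLM steps while t > t_on, star-shaped after."""
    rng = random.Random(seed)
    x = [MASK] * length
    for i in range(num_steps, 0, -1):
        t, s = i / num_steps, (i - 1) / num_steps
        if t > t_on:
            x = mdlm_step(x, t, s, denoise, rng)
        else:
            x = star_step(x, s, denoise, rng)
    return x


if __name__ == "__main__":
    # Hypothetical toy denoiser: fills masked positions with their index.
    toy = lambda x: [i if tok == MASK else tok for i, tok in enumerate(x)]
    print(hybrid_sample(length=8, num_steps=10, t_on=0.3, denoise=toy))
```

The two limiting cases are easy to read off: `t_on=0.0` never triggers a star-shaped step (plain MDLM), and `t_on=1.0` uses star-shaped steps throughout.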