Learn from Your Mistakes: Self-Correcting Masked Diffusion Models
Yair Schiff, Omer Belhasin, Roy Uziel, Guanghan Wang, Marianne Arriola, Gilad Turok, Michael Elad, Volodymyr Kuleshov
TL;DR
ProSeCo (Progressive Self-Correction) addresses error accumulation in masked diffusion models (MDMs) by jointly training a single model to both unmask tokens and correct its own outputs. During generation, corrective refinement steps are interleaved with standard unmasking steps, so the whole sequence, including already decoded tokens, can be refined iteratively, improving quality at higher parallelization. Experiments on math, coding, molecule, and unconditional text generation show 2–3x faster sampling with maintained or improved accuracy (up to ~1.3x on benchmarks), and additional inference-time compute further boosts performance. The work also reports improved guided sampling, preserved output diversity in unconditional generation, and guidelines for budgeting correction steps at inference.
Abstract
Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models, enabling parallel token generation while achieving competitive performance. Despite these advantages, MDMs face a fundamental limitation: once tokens are unmasked, they remain fixed, so errors accumulate and ultimately degrade sample quality. We address this by proposing a framework that trains a single model to perform both unmasking and correction. By reusing outputs from the MDM denoising network as inputs for corrector training, we train the model to recover from its own potential mistakes. During generation, we apply additional corrective refinement steps between unmasking steps to revise already decoded tokens and improve outputs. We name our training and sampling method Progressive Self-Correction (ProSeCo) for its ability to iteratively refine an entire sequence, including already generated tokens. We conduct extensive experimental validation across multiple conditional and unconditional tasks, demonstrating that ProSeCo yields better quality-efficiency trade-offs (up to ~2–3x faster sampling) and enables inference-time compute scaling to further increase sample quality beyond standard MDMs (up to ~1.3x improvement on benchmarks).
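The interleaving of unmasking and corrective steps described above can be illustrated with a minimal toy sketch. This is not the authors' implementation: the names `sample_with_correction`, `make_toy_predictor`, `MASK`, `TARGET`, and `VOCAB_SIZE` are hypothetical, and the "network" is a stand-in whose error rate is simply assumed to equal the fraction of positions still masked.

```python
import random
from typing import Callable, List

MASK = -1  # hypothetical id for the mask token (assumption, not from the paper)

Predictor = Callable[[List[int]], List[int]]


def sample_with_correction(
    predict: Predictor,
    length: int,
    tokens_per_step: int,
    corrector_steps: int,
    rng: random.Random,
) -> List[int]:
    """Interleave parallel unmasking with whole-sequence corrective passes."""
    seq = [MASK] * length
    while MASK in seq:
        # Unmasking step: commit predictions at a few masked positions at once.
        proposal = predict(seq)
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        for i in rng.sample(masked, min(tokens_per_step, len(masked))):
            seq[i] = proposal[i]
        # Corrective steps: re-predict every decoded position, so earlier
        # choices can be overwritten; still-masked positions stay masked.
        for _ in range(corrector_steps):
            proposal = predict(seq)
            seq = [tok if tok == MASK else new for tok, new in zip(seq, proposal)]
    return seq


TARGET = [3, 1, 4, 1, 5, 9, 2, 6]  # toy "correct" sequence (illustrative only)
VOCAB_SIZE = 10


def make_toy_predictor(rng: random.Random) -> Predictor:
    """Toy stand-in for a network: less reliable when more context is masked."""
    def predict(seq: List[int]) -> List[int]:
        error_rate = seq.count(MASK) / len(seq)
        return [
            rng.randrange(VOCAB_SIZE) if rng.random() < error_rate else tok
            for tok in TARGET
        ]
    return predict


rng = random.Random(0)
# Standard unmasking only: committed tokens are never revisited.
plain = sample_with_correction(make_toy_predictor(rng), len(TARGET), 4, 0, rng)
# One corrective pass after every unmasking step.
corrected = sample_with_correction(make_toy_predictor(rng), len(TARGET), 4, 1, rng)
```

In this toy setup the final corrective pass sees a sequence with no masks, so the stand-in predictor makes no errors and `corrected` equals `TARGET`, whereas `plain` keeps whatever tokens were committed while most of the context was still masked. A real corrector is a learned model and carries no such guarantee; the sketch only shows the control flow of interleaved correction.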
