Demystifying MaskGIT Sampler and Beyond: Adaptive Order Selection in Masked Diffusion
Satoshi Hayakawa, Yuhta Takida, Masaaki Imaizumi, Hiromi Wakaki, Yuki Mitsufuji
TL;DR
The paper addresses the inefficiency of sampling in discrete masked diffusion, focusing on the MaskGIT sampler. It shows that MaskGIT implicitly performs temperature sampling and proposes the moment sampler, an asymptotically equivalent alternative that follows a choose-then-sample (CTS) scheme: unmasking positions are selected first, and tokens are sampled afterward. This makes the sampler easier to interpret. Two practical improvements to CTS-based samplers are introduced: partial caching for transformers and a hybrid exploration-exploitation strategy for adaptive unmasking. The authors validate their theory on image and language tasks, reporting that the moment sampler closely mirrors MaskGIT in performance and that the proposed techniques improve sampling efficiency. Overall, the work advances both the theoretical understanding and the practical efficiency of masked diffusion samplers across modalities.
Abstract
Masked diffusion models have shown promising performance in generating high-quality samples in a wide range of domains, but accelerating their sampling process remains relatively underexplored. To investigate efficient samplers for masked diffusion, this paper theoretically analyzes the MaskGIT sampler for image modeling, revealing its implicit temperature sampling mechanism. Through this analysis, we introduce the "moment sampler," an asymptotically equivalent but more tractable and interpretable alternative to MaskGIT, which employs a "choose-then-sample" approach by selecting unmasking positions before sampling tokens. In addition, we improve the efficiency of choose-then-sample algorithms through two key innovations: a partial caching technique for transformers that approximates longer sampling trajectories without proportional computational cost, and a hybrid approach formalizing the exploration-exploitation trade-off in adaptive unmasking. Experiments in image and text domains demonstrate our theory as well as the efficiency of our proposed methods, advancing both theoretical understanding and practical implementation of masked diffusion samplers.
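To make the "choose-then-sample" idea concrete, the following is a minimal sketch of a single unmasking step, not the paper's moment sampler. It assumes the model has already produced a per-position token distribution `probs`, and it uses the maximum token probability as a stand-in confidence score for choosing positions. The function name `choose_then_sample_step`, its parameters, and the scoring rule are all hypothetical and chosen only for illustration.

```python
import numpy as np

def choose_then_sample_step(probs, masked, k, rng):
    """One generic choose-then-sample step (illustrative assumption only).

    probs  : (L, V) array, per-position token distributions from the model
    masked : (L,) boolean array, True where the token is still masked
    k      : number of positions to unmask in this step
    rng    : numpy.random.Generator
    """
    probs = np.asarray(probs, dtype=float)
    masked = np.asarray(masked, dtype=bool)

    # Step 1 (choose): score masked positions and pick the top k.
    # Max probability is an assumed heuristic, not the paper's criterion.
    scores = np.where(masked, probs.max(axis=1), -np.inf)
    k = min(k, int(masked.sum()))
    chosen = np.argsort(-scores)[:k]

    # Step 2 (sample): draw a token at each chosen position
    # from that position's distribution.
    vocab_size = probs.shape[1]
    tokens = np.array(
        [rng.choice(vocab_size, p=probs[i]) for i in chosen], dtype=int
    )
    return chosen, tokens
```

The point of the sketch is only the ordering of the two steps: positions are fixed before any token is drawn, so the selection rule and the token distribution can be analyzed separately. Samplers in this family differ in how they score positions and in which distribution they sample from.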
