The Cosine Schedule is Fisher-Rao-Optimal for Masked Discrete Diffusion Models
Leo Zhang, Saifuddin Syed
TL;DR
The paper studies how to select discretisation schedules for sampling from masked discrete diffusion models through an information-geometric lens. By computing the Fisher-Rao metric along the forward probability path, the authors show that the optimal schedule corresponds to a geodesic, and derive a closed-form cosine-based schedule that reduces to the cosine schedule when $\alpha_1=0$. This gives a theoretical justification for cosine-like step spacings in the masked discrete diffusion setting and connects sampling efficiency to Riemannian geometry. The work also lays groundwork for exploring alternative metrics and their effect on discretisation strategies, while acknowledging limitations arising from the gap between the true probability path and the dynamics of an approximate sampler. Future work includes empirical validation and extending the framework to other information-geometric notions of distance.
Abstract
In this work, we study the problem of choosing the discretisation schedule for sampling from masked discrete diffusion models in terms of the information geometry of the induced probability path. Specifically, we show that the optimal schedule under the Fisher-Rao geometry recovers the widely used cosine schedule.
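As an illustrative sketch (not the authors' code), a Fisher-Rao-equidistant schedule can be computed under the following assumptions, which are ours: the unmasked probability $\alpha_t$ starts at $\alpha_0 = 1$ and ends at $\alpha_1$, and each token's masked/unmasked state is Bernoulli with parameter $\alpha_t$. The Fisher-Rao distance between two Bernoulli parameters is proportional to the difference in $\arccos\sqrt{\alpha}$, so a constant-speed geodesic spaces this angle evenly. The function name `fisher_rao_schedule` and its arguments are hypothetical.

```python
import math


def fisher_rao_schedule(num_steps, alpha_end=0.0):
    """Return num_steps + 1 values of alpha, equally spaced in Fisher-Rao arc length.

    Assumptions (ours, for illustration): alpha decreases from 1 to alpha_end,
    and each token is unmasked with Bernoulli probability alpha.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if not 0.0 <= alpha_end < 1.0:
        raise ValueError("alpha_end must lie in [0, 1)")

    # On the Bernoulli family, theta = arccos(sqrt(alpha)) is an arc-length
    # coordinate (up to a constant factor), so equal steps in theta are
    # equal steps in Fisher-Rao distance.
    theta_end = math.acos(math.sqrt(alpha_end))
    return [math.cos(theta_end * i / num_steps) ** 2 for i in range(num_steps + 1)]


if __name__ == "__main__":
    # With alpha_end = 0 the grid is cos^2(pi * i / (2 * num_steps)).
    for i, alpha in enumerate(fisher_rao_schedule(8)):
        print(f"step {i}: alpha = {alpha:.4f}")
```

With `alpha_end=0.0` this yields a squared-cosine grid, while a non-zero `alpha_end` compresses the same angular spacing into a shorter arc. The paper's exact parameterisation and conventions may differ from this sketch.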
