Hierarchical Schedule Optimization for Fast and Robust Diffusion Model Sampling
Aihua Zhu, Rui Su, Qinglin Zhao, Li Feng, Meng Shen, Shibo He
TL;DR
Diffusion models suffer from slow sampling, which motivates training-free schedule optimization. The authors introduce Hierarchical-Schedule-Optimizer (HSO), a bi-level framework that combines a low-dimensional global search for initialization with a fast local schedule refinement, guided by the Midpoint Error Proxy (MEP) and the Spacing-Penalized Fitness (SPF). Empirical results show state-of-the-art performance in the extremely low-NFE regime (e.g., FID 11.94 at NFE=5 on LAION-Aesthetics) with under 8 seconds of optimization on CPU, and strong generalization across solvers and models. The work offers a practical and efficient strategy for diffusion model acceleration, supported by ablations and robustness tests.
Abstract
Diffusion probabilistic models have set a new standard for generative fidelity but are hindered by a slow iterative sampling process. A powerful training-free strategy to accelerate this process is schedule optimization, which seeks the distribution of timesteps that maximizes sample quality for a fixed, small Number of Function Evaluations (NFE). A successful schedule optimization method must satisfy four core principles: effectiveness, adaptivity, practical robustness, and computational efficiency. Existing paradigms struggle to satisfy these principles simultaneously. To overcome this limitation, we propose the Hierarchical-Schedule-Optimizer (HSO), a novel and efficient bi-level optimization framework. HSO reframes the search for a globally optimal schedule as a more tractable problem by iteratively alternating between two synergistic levels: an upper-level global search for an optimal initialization strategy and a lower-level local optimization for schedule refinement. This process is guided by two key innovations: the Midpoint Error Proxy (MEP), a solver-agnostic and numerically stable objective for effective local optimization, and the Spacing-Penalized Fitness (SPF) function, which ensures practical robustness by penalizing pathologically close timesteps. Extensive experiments show that HSO sets a new state of the art for training-free sampling in the extremely low-NFE regime. For instance, with an NFE of just 5, HSO achieves an FID of 11.94 on LAION-Aesthetics with Stable Diffusion v2.1. Crucially, this performance is attained not through costly retraining but with a one-time optimization cost of less than 8 seconds, presenting a highly practical and efficient paradigm for diffusion model acceleration.
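To make the bi-level structure concrete, the following is a minimal toy sketch in Python, not the authors' implementation. Everything in it is an assumption made for illustration: the one-parameter power-law family `init_schedule`, the stand-in objective `proxy_error` (which is not the paper's MEP), the hinge-style `spacing_penalty` (which only mimics the idea behind SPF), and the simple grid search that replaces the alternating upper-level search described in the abstract.

```python
import numpy as np

def init_schedule(rho, nfe, t_max=1.0, t_min=1e-3):
    """Hypothetical low-dimensional family: a power-law schedule with one knob, rho."""
    i = np.linspace(0.0, 1.0, nfe + 1)
    a, b = t_max ** (1.0 / rho), t_min ** (1.0 / rho)
    return (a + i * (b - a)) ** rho  # decreasing from t_max to t_min

def proxy_error(ts):
    """Toy stand-in for a local error objective (assumption, NOT the paper's MEP)."""
    mid = 0.5 * (ts[:-1] + ts[1:])
    return float(np.sum((ts[:-1] - ts[1:]) ** 2 / mid))

def spacing_penalty(ts, min_gap=1e-3, weight=1e3):
    """Hinge penalty on gaps below min_gap (assumption, only in the spirit of SPF)."""
    gaps = ts[:-1] - ts[1:]
    return float(weight * np.sum(np.maximum(0.0, min_gap - gaps)))

def fitness(ts):
    return proxy_error(ts) + spacing_penalty(ts)

def refine(ts, iters=200, step=0.05, seed=0):
    """Lower level: perturb one interior timestep at a time, keep only improvements."""
    rng = np.random.default_rng(seed)
    best, best_f = ts.copy(), fitness(ts)
    for _ in range(iters):
        cand = best.copy()
        k = rng.integers(1, len(ts) - 1)   # interior index; endpoints stay fixed
        lo, hi = cand[k + 1], cand[k - 1]  # neighbours bound the move
        cand[k] = np.clip(cand[k] + step * (hi - lo) * rng.normal(), lo, hi)
        f = fitness(cand)
        if f < best_f:
            best, best_f = cand, f
    return best, best_f

def hierarchical_search(nfe=5, rhos=np.linspace(1.0, 7.0, 13)):
    """Upper level: try each initialization parameter, refine locally, keep the best."""
    results = []
    for rho in rhos:
        ts, f = refine(init_schedule(rho, nfe))
        results.append((f, rho, ts))
    return min(results, key=lambda r: r[0])

best_f, best_rho, best_ts = hierarchical_search(nfe=5)
```

The point of the sketch is the division of labour: the upper level searches over a single scalar, so it stays cheap, while the lower level adjusts individual timesteps starting from whatever initialization the upper level proposes. In a real setting the toy objective would be replaced by an error estimate computed from the diffusion model and solver.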
