Tortoise and Hare Guidance: Accelerating Diffusion Model Inference with Multirate Integration
Yunghee Lee, Byeonghyun Pak, Junwha Hong, Hoseong Kim
TL;DR
Tortoise and Hare Guidance (THG) is a training-free, multirate formulation of diffusion model inference that exploits an asymmetry between the noise-estimate term and the classifier-free guidance (CFG) term: the guidance term is more robust to numerical approximation. THG integrates the noise estimate on the original fine-grained timestep grid (the tortoise equation) and the guidance term on a coarse grid (the hare equation), reducing the number of function evaluations (NFE) by up to 30% with negligible loss in fidelity across backbones such as Stable Diffusion 1.5, Stable Diffusion 3.5 Large, and AudioLDM 2. An error-bound-aware timestep sampler and a guidance-scale scheduler stabilize extrapolation over large spans. The approach requires no retraining and is supported by an error-bound analysis indicating that the guidance term tolerates coarser integration, offering a principled path toward faster, real-time diffusion-based generation.
Abstract
In this paper, we propose Tortoise and Hare Guidance (THG), a training-free strategy that accelerates diffusion sampling while maintaining high-fidelity generation. By reformulating the classifier-free guidance (CFG) ODE as a multirate system of ODEs, we show that the noise estimate and the additional guidance term exhibit markedly different sensitivity to numerical error. Our error-bound analysis shows that the additional guidance branch is more robust to approximation, revealing substantial redundancy that conventional solvers fail to exploit. Building on this insight, THG significantly reduces the computation spent on the additional guidance: the noise estimate is integrated with the tortoise equation on the original, fine-grained timestep grid, while the additional guidance is integrated with the hare equation only on a coarse grid. We also introduce (i) an error-bound-aware timestep sampler that adaptively selects step sizes and (ii) a guidance-scale scheduler that stabilizes large extrapolation spans. THG reduces the number of function evaluations (NFE) by up to 30% with virtually no loss in generation fidelity ($\Delta$ImageReward $\leq 0.032$) and outperforms state-of-the-art CFG-based training-free accelerators under identical computation budgets. Our findings highlight the potential of multirate formulations for diffusion solvers, paving the way for real-time high-quality image synthesis without any model retraining. The source code is available at https://github.com/yhlee-add/THG.
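To make the source of the NFE savings concrete, the following is a minimal toy sketch of the multirate idea, not the released implementation. It assumes the common CFG decomposition $\epsilon_{\text{cfg}} = \epsilon_c + (w-1)(\epsilon_c - \epsilon_u)$, evaluates the conditional estimate at every fine step, and refreshes the guidance difference only every `guidance_every` steps, reusing the stale value in between. The function names, the fixed refresh interval, the plain reuse (in place of THG's extrapolation, timestep sampler, and guidance-scale scheduler), and the scalar toy model are all illustrative assumptions.

```python
def multirate_cfg_sample(x, timesteps, eps_cond, eps_uncond, step, w, guidance_every):
    """Toy multirate CFG loop (hypothetical helper, not the THG code).

    The conditional noise estimate is evaluated on every fine step.
    The guidance difference (eps_c - eps_u) is recomputed only every
    `guidance_every` steps and reused on the steps in between.
    Returns the final sample and the number of function evaluations.
    """
    nfe = 0
    delta = 0.0
    for i, (t, t_next) in enumerate(zip(timesteps[:-1], timesteps[1:])):
        e_c = eps_cond(x, t)            # fine grid: one evaluation per step
        nfe += 1
        if i % guidance_every == 0:     # coarse grid: refresh the guidance term
            delta = e_c - eps_uncond(x, t)
            nfe += 1
        eps = e_c + (w - 1.0) * delta   # assumed CFG combination
        x = step(x, eps, t, t_next)
    return x, nfe


# Toy scalar "model" and Euler update, used only to exercise the loop.
def toy_eps_cond(x, t):
    return 0.5 * x

def toy_eps_uncond(x, t):
    return 0.4 * x

def euler_step(x, eps, t, t_next):
    return x + (t_next - t) * eps

timesteps = [1.0 - i / 10 for i in range(11)]   # 10 fine steps from t=1 to t=0

# Standard CFG: guidance refreshed at every step.
x_full, nfe_full = multirate_cfg_sample(
    1.0, timesteps, toy_eps_cond, toy_eps_uncond, euler_step, w=7.5, guidance_every=1)

# Multirate variant: guidance refreshed at every second step.
x_multi, nfe_multi = multirate_cfg_sample(
    1.0, timesteps, toy_eps_cond, toy_eps_uncond, euler_step, w=7.5, guidance_every=2)

print(nfe_full, nfe_multi)   # 20 15
```

In this toy setting, with $N$ fine steps and a refresh every $k$ steps, the cost drops from $2N$ to $N + \lceil N/k \rceil$ evaluations, so the example goes from 20 to 15. The savings reported in the abstract depend on the actual coarse grid chosen by the timestep sampler, which this sketch does not model.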
