Tortoise and Hare Guidance: Accelerating Diffusion Model Inference with Multirate Integration

Yunghee Lee, Byeonghyun Pak, Junwha Hong, Hoseong Kim

TL;DR

Tortoise and Hare Guidance (THG) is a training-free, multirate formulation of diffusion model inference that exploits an asymmetry between the noise-estimate term and the CFG guidance term. By integrating the noise estimate on a fine-grained grid (tortoise) and the guidance term on a coarse grid (hare), THG reduces the number of function evaluations by up to 30% with negligible loss in fidelity, across backbones such as Stable Diffusion 1.5, Stable Diffusion 3.5 Large, and AudioLDM 2. An error-bound-aware timestep sampler and a guidance-scale scheduler stabilize extrapolation over large spans, and no retraining is required. The approach is model-agnostic and backed by an error-bound analysis showing that the guidance term tolerates coarser integration than the noise estimate, offering a principled path toward faster, and eventually real-time, diffusion-based generation.

Abstract

In this paper, we propose Tortoise and Hare Guidance (THG), a training-free strategy that accelerates diffusion sampling while maintaining high-fidelity generation. We demonstrate that the noise estimate and the additional guidance term exhibit markedly different sensitivity to numerical error by reformulating the classifier-free guidance (CFG) ODE as a multirate system of ODEs. Our error-bound analysis shows that the additional guidance branch is more robust to approximation, revealing substantial redundancy that conventional solvers fail to exploit. Building on this insight, THG significantly reduces the computation of the additional guidance: the noise estimate is integrated with the tortoise equation on the original, fine-grained timestep grid, while the additional guidance is integrated with the hare equation only on a coarse grid. We also introduce (i) an error-bound-aware timestep sampler that adaptively selects step sizes and (ii) a guidance-scale scheduler that stabilizes large extrapolation spans. THG reduces the number of function evaluations (NFE) by up to 30% with virtually no loss in generation fidelity ($\Delta$ImageReward $\leq 0.032$) and outperforms state-of-the-art CFG-based training-free accelerators under identical computation budgets. Our findings highlight the potential of multirate formulations for diffusion solvers, paving the way for real-time high-quality image synthesis without any model retraining. The source code is available at https://github.com/yhlee-add/THG.
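
To make the fine-grid/coarse-grid idea concrete, here is a minimal sketch, not the authors' implementation. It assumes the common CFG combination $\hat{\epsilon}_c + (\omega - 1)\,\Delta\hat{\epsilon}_c$ with $\Delta\hat{\epsilon}_c = \hat{\epsilon}_c - \hat{\epsilon}_\varnothing$, uses plain Euler steps, and simply reuses the last guidance term between coarse-grid points instead of integrating a separate hare equation. The functions `eps_cond` and `eps_uncond`, the parameter `hare_every`, and the toy dynamics are all hypothetical stand-ins for illustration.

```python
import numpy as np

def eps_cond(x, t):
    # Hypothetical stand-in for the conditional noise estimate.
    return np.sin(t) * x

def eps_uncond(x, t):
    # Hypothetical stand-in for the unconditional noise estimate.
    return 0.9 * np.sin(t) * x

def sample(x, timesteps, omega=2.5, hare_every=1):
    """Euler integration of dx/dt = eps_c + (omega - 1) * delta.

    The noise estimate eps_c is evaluated at every step (fine grid).
    The guidance term delta = eps_c - eps_uncond is refreshed only every
    `hare_every` steps (coarse grid) and reused in between.
    Returns the final state and the number of function evaluations.
    """
    nfe = 0
    delta = np.zeros_like(x)
    for i in range(len(timesteps) - 1):
        t, t_next = timesteps[i], timesteps[i + 1]
        e_c = eps_cond(x, t)
        nfe += 1
        if i % hare_every == 0:
            # Coarse-grid point: pay for the extra unconditional evaluation.
            delta = e_c - eps_uncond(x, t)
            nfe += 1
        x = x + (t_next - t) * (e_c + (omega - 1.0) * delta)
    return x, nfe

ts = np.linspace(1.0, 0.0, 21)   # 20 fine-grid steps
x0 = np.ones(4)
x_full, nfe_full = sample(x0, ts, hare_every=1)    # guidance at every step
x_multi, nfe_multi = sample(x0, ts, hare_every=2)  # guidance at every other step
print(nfe_full, nfe_multi)  # 40 30
```

In this toy setting, refreshing the guidance term on every other step removes a quarter of the evaluations while the final state changes only slightly. The error-bound-aware timestep sampler and the guidance-scale scheduler described in the abstract are not modeled here.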

Paper Structure

This paper contains 42 sections, 1 theorem, 26 equations, 8 figures, 8 tables, 3 algorithms.

Key Result

Theorem 1

Assume the local integration error of an ODE using a solver of order $p$ and timestep size $\Delta t$ is given by [equation missing in source] for sufficiently small $\Delta t$. Then the error of using the same solver repeatedly for $m$ steps is given by [equation missing in source].
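
The displayed bounds of the theorem are not reproduced on this page, so the following is only a numerical illustration of the standard notion it builds on: a solver of order $p$ has a one-step (local) error that scales like $\Delta t^{p+1}$. The sketch assumes explicit Euler ($p = 1$) on the toy ODE $\mathrm{d}x/\mathrm{d}t = -x$; the ODE, the step sizes, and the function names are illustrative choices, not taken from the paper.

```python
import math

def euler_step(x, dt):
    # One explicit Euler step for dx/dt = -x (solver order p = 1).
    return x + dt * (-x)

def local_error(dt, x0=1.0):
    # Error of a single step against the exact solution x0 * exp(-dt).
    return abs(euler_step(x0, dt) - x0 * math.exp(-dt))

def error_after_m_steps(dt, m, x0=1.0):
    # Error after applying the same solver m times with step size dt.
    x = x0
    for _ in range(m):
        x = euler_step(x, dt)
    return abs(x - x0 * math.exp(-m * dt))

# Halving dt should shrink the local error by about 2**(p + 1) = 4.
ratio = local_error(0.02) / local_error(0.01)
print(ratio)

# Four small steps versus one large step covering the same span.
err_small_steps = error_after_m_steps(0.01, 4)
err_one_big_step = local_error(0.04)
print(err_small_steps, err_one_big_step)
```

On this example the accumulated error of four small steps stays close to four times the single-step error and well below the error of one step of size $4\Delta t$, which is the kind of trade-off a multirate scheme has to weigh when it coarsens one branch.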

Figures (8)

  • Figure 1: Conceptual illustration of Tortoise and Hare Guidance. We decompose the standard diffusion ODE into a tortoise branch (Eq. \ref{eq:tort_bh}), which is numerically sensitive and thus integrated on a fine-grained grid, and a hare branch (Eq. \ref{eq:hare_bh}), which is comparatively less sensitive and can be integrated with larger step sizes. Our multirate scheme evaluates each branch at different timestep grids, skipping unnecessary evaluations, thereby boosting inference efficiency without sacrificing sample quality.
  • Figure 2: Time-derivative norms of the noise estimate ${\hat{\epsilon}_c} (x_t)$ and additional guidance ${\Delta{\hat{\epsilon}_c}}(x_t)$. We plot the L2 norms of the time derivatives $\frac{\mathrm{d}}{\mathrm{d} t}{\hat{\epsilon}_c}(x_t)$ and $\frac{\mathrm{d}}{\mathrm{d} t}{\Delta{\hat{\epsilon}_c}}(x_t)$ across diffusion timesteps for Stable Diffusion 1.5 and 3.5 Large. The results confirm that the noise estimate exhibits greater temporal sensitivity compared to the guidance term. Shaded areas denote two standard deviations over multiple prompts.
  • Figure 3: Approximation error bounds of the tortoise $x^\mathsf{T}_{t}$ and the hare $x^\mathsf{H}_{t}$. We show the per-timestep error bound of the tortoise and the hare terms across sampling steps. The consistently higher bounds for the tortoise curve indicate that the noise estimate is more sensitive to timestep resolution than the additional guidance. Shaded areas denote two standard deviations over multiple prompts.
  • Figure 4: Comparison of visual results for prompts from the COCO 2014 dataset.
  • Figure 5: Generated images using $\omega=2.5$ for the prompts "A group of zebras grazing in the grass.", "A yellow commuter train traveling past some houses.", "A couple of men standing on a field playing baseball.", and "Zoo scene of children at zoo near giraffes, attempting to pet or feed them." from the COCO 2014 dataset.
  • ...and 3 more figures
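
The quantity plotted in Figure 2, the L2 norm of a time derivative along the sampling trajectory, could be estimated roughly as follows. This is a sketch under the assumption that the derivative is approximated by finite differences between consecutive timesteps; the page does not say how the authors compute it, and the function name `time_derivative_norms` is hypothetical.

```python
import numpy as np

def time_derivative_norms(values, timesteps):
    """Finite-difference estimate of ||d/dt v(t)||_2 between consecutive timesteps.

    values:    array of shape (T, ...) holding v at each of T timesteps
    timesteps: array of shape (T,) with the matching times
    Returns an array of shape (T - 1,).
    """
    values = np.asarray(values, dtype=float)
    timesteps = np.asarray(timesteps, dtype=float)
    diffs = np.diff(values, axis=0)        # v(t_{i+1}) - v(t_i)
    dts = np.diff(timesteps)               # t_{i+1} - t_i
    flat = diffs.reshape(len(dts), -1)     # flatten all non-time axes
    return np.linalg.norm(flat, axis=1) / np.abs(dts)

# Toy trajectory: the value jumps by (3, 4) over the first step, then stays put.
vals = [[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]]
ts = [1.0, 0.5, 0.0]
norms = time_derivative_norms(vals, ts)
print(norms)
```

Applying the same function to recorded values of the noise estimate and of the additional guidance term would give two curves comparable in spirit to those in Figure 2.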

Theorems & Definitions (2)

  • Theorem 1
  • proof