
Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model

Anirud Aggarwal, Abhinav Shrivastava, Matthew Gwilliam

TL;DR

This work proposes Evolutionary Caching to Accelerate Diffusion models (ECAD), a genetic algorithm that learns efficient, per-model caching schedules forming a Pareto frontier from only a small set of calibration prompts. The results establish ECAD as a scalable and generalizable approach for accelerating diffusion inference.

Abstract

Diffusion-based image generation models excel at producing high-quality synthetic content, but suffer from slow and computationally expensive inference. Prior work has attempted to mitigate this by caching and reusing features within diffusion transformers across inference steps. These methods, however, often rely on rigid heuristics that result in limited acceleration or poor generalization across architectures. We propose Evolutionary Caching to Accelerate Diffusion models (ECAD), a genetic algorithm that learns efficient, per-model caching schedules forming a Pareto frontier, using only a small set of calibration prompts. ECAD requires no modifications to network parameters or reference images. It offers significant inference speedups, enables fine-grained control over the quality-latency trade-off, and adapts seamlessly to different diffusion models. Notably, ECAD's learned schedules can generalize effectively to resolutions and model variants not seen during calibration. We evaluate ECAD on PixArt-alpha, PixArt-Sigma, and FLUX.1-dev using multiple metrics (FID, CLIP, Image Reward) across diverse benchmarks (COCO, MJHQ-30k, PartiPrompts), demonstrating consistent improvements over previous approaches. On PixArt-alpha, ECAD identifies a schedule that outperforms the previous state-of-the-art method by 4.47 COCO FID while increasing inference speedup from 2.35x to 2.58x. Our results establish ECAD as a scalable and generalizable approach for accelerating diffusion inference. Our project page and code are available here: https://research.aniaggarwal.com/ecad
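
To make the "Pareto frontier" framing in the abstract concrete, the following is a minimal sketch of how a set of candidate caching schedules could be filtered down to its non-dominated members. It is not the authors' implementation: the `Candidate` type, the field names, and all numeric values are hypothetical and chosen only for illustration, with both objectives treated as "lower is better".

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    latency: float       # hypothetical relative latency, lower is better
    quality_loss: float  # hypothetical quality penalty, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a.latency <= b.latency and a.quality_loss <= b.quality_loss
    strictly_better = a.latency < b.latency or a.quality_loss < b.quality_loss
    return no_worse and strictly_better


def pareto_frontier(cands: List[Candidate]) -> List[Candidate]:
    """Keep candidates that no other candidate dominates, sorted by latency."""
    front = [c for c in cands if not any(dominates(o, c) for o in cands)]
    return sorted(front, key=lambda c: c.latency)


# Made-up numbers, purely illustrative.
candidates = [
    Candidate("no_cache", 1.00, 0.00),
    Candidate("A", 0.60, 0.10),
    Candidate("B", 0.65, 0.20),  # dominated by A
    Candidate("C", 0.40, 0.30),
    Candidate("D", 0.45, 0.35),  # dominated by C
]
frontier = pareto_frontier(candidates)
```

Each surviving candidate represents a different point on the quality-latency trade-off, which is what allows a user to pick a schedule matching a latency budget.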

Paper Structure

This paper contains 15 sections, 2 equations, 7 figures, 4 tables, 1 algorithm.

Figures (7)

  • Figure 1: We conceptualize diffusion caching as a Pareto optimization problem over image quality and inference time and propose ECAD to discover such Pareto frontiers using a genetic algorithm. Left: performance progression over generations for FLUX.1-dev. Right: example $1024{\times}1024$ results with corresponding speedups.
  • Figure 2: In the context of a transformer-based diffusion model, we describe how the transformer architecture allows for caching of attention and feedforward results separately (left). We then give a toy illustration of how our method might transition from one generation to the next, prioritizing mating for schedules with the best quality-speed trade-offs (right).
  • Figure 3: PartiPrompt Pareto frontiers at $256\times256$ for PixArt-$\alpha$ (left) and FLUX.1-dev (right).
  • Figure 4: Qualitative results comparing our "fast" schedule for PixArt-$\alpha$ $256{\times}256$ with ToCa; see Figure \ref{fig:qualitative_flux_256_supp} for FLUX.1-dev. "..." represents omitted text; see Appendix \ref{sec:full_prompts_alpha_256} for full prompts.
  • Figure 5: Our "fast" schedule for PixArt-$\alpha$ (left) and FLUX.1-dev (right). Reds are cached components and grays are recomputed (for PixArt-$\alpha$, from left to right: self-attention, cross-attention, and feedforward). See Appendix \ref{sec:visualizing_ecad_schedules} for more details.
  • ...and 2 more figures
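
The captions for Figures 1 and 2 describe caching attention and feedforward outputs separately and mating the schedules with the best quality-speed trade-offs. The toy sketch below illustrates that general idea under strong assumptions and is not the authors' algorithm: a schedule is a flat tuple of booleans over hypothetical (step, block, component) slots, the two objectives are invented stand-ins rather than real image-quality or timing measurements, and the single-point crossover, mutation rate, and selection rule are generic choices.

```python
import random
from typing import List, Tuple

Schedule = Tuple[bool, ...]  # True = reuse the cached output for that slot

# Toy sizes (assumed); components stand for attention and feedforward.
STEPS, BLOCKS, COMPONENTS = 4, 3, 2
N_GENES = STEPS * BLOCKS * COMPONENTS


def objectives(s: Schedule) -> Tuple[float, float]:
    """Invented proxies, both minimized: (compute cost, quality penalty)."""
    cost = 1.0 - sum(s) / N_GENES  # fraction of slots that are recomputed
    penalty = 0.0
    for i, cached in enumerate(s):
        step = i // (BLOCKS * COMPONENTS)
        if cached:
            # Assumption for illustration: caching at early steps hurts more.
            penalty += (STEPS - step) / STEPS
    return cost, penalty / N_GENES


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(pop: List[Schedule]) -> List[Schedule]:
    scored = [(s, objectives(s)) for s in pop]
    return [s for s, f in scored if not any(dominates(g, f) for _, g in scored)]


def crossover(a: Schedule, b: Schedule, rng: random.Random) -> Schedule:
    cut = rng.randrange(1, N_GENES)  # single-point crossover
    return a[:cut] + b[cut:]


def mutate(s: Schedule, rng: random.Random, rate: float = 0.05) -> Schedule:
    return tuple((not g) if rng.random() < rate else g for g in s)


def evolve(generations: int = 10, pop_size: int = 20, seed: int = 0) -> List[Schedule]:
    rng = random.Random(seed)
    pop = [tuple(rng.random() < 0.5 for _ in range(N_GENES)) for _ in range(pop_size)]
    for _ in range(generations):
        parents = non_dominated(pop)  # only the best trade-offs get to mate
        children = []
        while len(children) < pop_size:
            a, b = rng.choice(parents), rng.choice(parents)
            children.append(mutate(crossover(a, b, rng), rng))
        # Keep the parents (elitism) and drop exact duplicates.
        pop = list(dict.fromkeys(parents + children))
    return non_dominated(pop)


front = evolve()
```

In a real setting the `objectives` function would be replaced by generating images from calibration prompts and measuring quality and compute, which is the expensive part this sketch deliberately omits.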