SODA: Sensitivity-Oriented Dynamic Acceleration for Diffusion Transformer

Tong Shao, Yusen Fu, Guoying Sun, Jingde Kong, Zhuotao Tian, Jingyong Su

TL;DR

SODA, a Sensitivity-Oriented Dynamic Acceleration method, adaptively performs caching and pruning based on fine-grained sensitivity and achieves state-of-the-art generation fidelity under controllable acceleration ratios.

Abstract

Diffusion Transformers have become a dominant paradigm in visual generation, yet their low inference efficiency remains a key bottleneck hindering further advancement. Among common training-free techniques, caching offers high acceleration efficiency but often compromises fidelity, whereas pruning shows the opposite trade-off. Integrating caching with pruning achieves a balance between acceleration and generation quality. However, existing methods typically employ fixed and heuristic schemes to configure caching and pruning strategies. While they roughly follow the overall sensitivity trend of generation models to acceleration, they fail to capture fine-grained and complex variations, inevitably skipping highly sensitive computations and leading to quality degradation. Furthermore, such manually designed strategies exhibit poor generalization. To address these issues, we propose SODA, a Sensitivity-Oriented Dynamic Acceleration method that adaptively performs caching and pruning based on fine-grained sensitivity. SODA builds an offline sensitivity error modeling framework across timesteps, layers, and modules to capture the sensitivity to different acceleration operations. The cache intervals are optimized via dynamic programming with sensitivity error as the cost function, minimizing the impact of caching on model sensitivity. During pruning and cache reuse, SODA adaptively determines the pruning timing and rate to preserve computations of highly sensitive tokens, significantly enhancing generation fidelity. Extensive experiments on DiT-XL/2, PixArt-$\alpha$, and OpenSora demonstrate that SODA achieves state-of-the-art generation fidelity under controllable acceleration ratios. Our code is released publicly at: https://github.com/leaves162/SODA.
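The idea of choosing cache intervals by dynamic programming, with sensitivity error as the cost, can be illustrated with a minimal sketch. It is not SODA's implementation: it assumes a hypothetical per-timestep sensitivity score `sens[t]` and a hypothetical cost in which reusing a cache computed at step `s` at a later step `t` costs `sens[t] * (t - s)`. SODA's actual sensitivity error is modeled offline across timesteps, layers, and modules and is not reproduced here. The function names `segment_cost` and `schedule_full_steps` are illustrative only. The budget `num_full` (number of full computations, step 0 always included) fixes the acceleration ratio, and the DP places those computations to minimize the total error.

```python
from typing import List, Sequence, Tuple


def segment_cost(sens: Sequence[float], s: int, e: int) -> float:
    """Hypothetical error of a full computation at step s whose cache is reused at steps s+1..e-1.

    Staleness grows with distance from s and is weighted by each step's sensitivity.
    """
    return sum(sens[t] * (t - s) for t in range(s + 1, e))


def schedule_full_steps(sens: Sequence[float], num_full: int) -> Tuple[List[int], float]:
    """Pick num_full full-computation steps (step 0 always included) minimizing total cache error."""
    T = len(sens)
    if not 1 <= num_full <= T:
        raise ValueError("num_full must be in [1, len(sens)]")
    # cost[s][e]: error of the segment that starts with a full step at s and ends before e.
    cost = [[segment_cost(sens, s, e) for e in range(T + 1)] for s in range(T)]
    INF = float("inf")
    # dp[k][e]: min total error covering steps [0, e) with k segments (k full steps).
    dp = [[INF] * (T + 1) for _ in range(num_full + 1)]
    parent = [[-1] * (T + 1) for _ in range(num_full + 1)]
    for e in range(1, T + 1):
        dp[1][e] = cost[0][e]
    for k in range(2, num_full + 1):
        for e in range(k, T + 1):
            # The k-th full step s must leave room for k-1 earlier segments in [0, s).
            for s in range(k - 1, e):
                cand = dp[k - 1][s] + cost[s][e]
                if cand < dp[k][e]:
                    dp[k][e], parent[k][e] = cand, s
    # Walk back through the segment boundaries to recover the full-computation steps.
    steps, e = [0], T
    for k in range(num_full, 1, -1):
        e = parent[k][e]
        steps.append(e)
    return sorted(steps), dp[num_full][T]
```

For example, `schedule_full_steps([0, 0, 0, 10, 0, 0], 2)` returns `([0, 3], 0)`: the second full computation lands on the highly sensitive step 3 instead of being placed at a fixed interval, which is the kind of behavior a heuristic uniform schedule would miss.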

Paper Structure (52 sections, 12 equations, 14 figures, 13 tables, 2 algorithms)

Figures (14)

  • Figure 1: SODA enables sensitivity awareness and adaptive acceleration-strategy decisions. (a) Examples of model-internal sensitivity to acceleration: the sensitivity of different timesteps, layers, and modules to acceleration is highly complex and dynamic. (b) Correlation between acceleration strategies and model sensitivity: heuristic methods such as DuCa (zou2024accelerating1) fail to stay consistent with this complex sensitivity. (c) Acceleration versus generation quality: SODA significantly outperforms the baselines by alleviating quality degradation under acceleration.
  • Figure 2: Model architecture of SODA, a Sensitivity-Oriented Dynamic Acceleration method. (1) Offline Fine-grained Sensitivity Modeling (OFS): defines an error that measures the fine-grained sensitivity of timesteps, layers, and modules before inference. (2) Dynamic Caching Scheduling Optimization (DCS): employs dynamic programming to identify the combination of cache intervals with the minimal sensitivity impact. (3) Unified Adaptive Strategy Formulation (UAS): adaptively schedules pruning timing and rate, guided by sensitivity errors.
  • Figure 3: Comparison of cumulative sensitivity error and generation fidelity. Compared with ToCa and DuCa, our DCS module in SODA effectively reduces sensitivity error under the same acceleration efficiency. When combined with the UAS module, generation fidelity is further improved.
  • Figure 4: Qualitative comparison of acceleration. (a) DiT-XL/2 (DDIM). (b) PixArt-$\alpha$. (c) OpenSora. Red dashed boxes highlight the regions to compare. SODA achieves higher generation fidelity than the baselines under the same or higher acceleration efficiency. More visualizations are provided in Appendix C.
  • Figure 5: VBench detailed metric comparison. We normalize metrics by setting the highest value to 100%, and visualize 9 selected metrics along with acceleration efficiency.
  • ...and 9 more figures
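The UAS behavior summarized in the Figure 2 caption (adaptively choosing pruning timing and rate from sensitivity) can be pictured with a toy decision rule. Everything below is an assumption for illustration and is not taken from the paper: the per-token scores `token_sens`, the step-level error `step_err`, the thresholds `err_threshold` and `max_share`, and the function name `adaptive_prune` are all hypothetical. Timing: a step whose error exceeds the threshold is not pruned at all. Rate: otherwise, the least sensitive tokens are pruned until their share of total sensitivity would exceed `max_share`, so the pruning rate varies with how sensitivity is distributed across tokens.

```python
from typing import List, Sequence, Tuple


def adaptive_prune(
    token_sens: Sequence[float],
    step_err: float,
    err_threshold: float,
    max_share: float,
) -> Tuple[List[int], float]:
    """Hypothetical rule: return (indices of tokens to recompute, fraction of tokens pruned)."""
    n = len(token_sens)
    # Timing: a step whose sensitivity error exceeds the threshold is computed in full.
    if step_err > err_threshold:
        return list(range(n)), 0.0
    total = sum(token_sens)
    # Rate: prune least-sensitive tokens first, stopping before their share of
    # total sensitivity would exceed max_share. If every score is zero, all are pruned.
    pruned, acc = set(), 0.0
    for i in sorted(range(n), key=lambda j: token_sens[j]):
        share = (acc + token_sens[i]) / total if total > 0 else 0.0
        if share > max_share:
            break
        acc += token_sens[i]
        pruned.add(i)
    keep = [i for i in range(n) if i not in pruned]
    return keep, len(pruned) / n if n else 0.0
```

For example, `adaptive_prune([0.1, 0.5, 0.05, 0.35], step_err=0.2, err_threshold=0.5, max_share=0.2)` returns `([1, 3], 0.5)`: the two low-sensitivity tokens are pruned and the two high-sensitivity tokens are recomputed. Raising `step_err` to 0.9 returns all four indices with a pruning rate of 0.0.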