TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times

Jintao Zhang, Kaiwen Zheng, Kai Jiang, Haoxu Wang, Ion Stoica, Joseph E. Gonzalez, Jianfei Chen, Jun Zhu

TL;DR

TurboDiffusion targets the main bottleneck of diffusion-based video generation, its slow inference, by combining four acceleration techniques: low-bit SageAttention and trainable Sparse-Linear Attention (SLA) to speed up attention, rCM for step distillation, and W8A8 quantization of weights and activations to speed up linear layers and compress the model. These are complemented by additional engineering optimizations. On Wan2.2-I2V-A14B-720P and Wan2.1-T2V models, it delivers 100–200× end-to-end speedups on a single RTX 5090 while maintaining comparable video quality, and the code is released in a ready-to-use GitHub repository. The results show that diffusion-based video generation can run at practical speeds, narrowing the gap to real-time use. Future work aims to extend the approach to autoregressive video diffusion and other generation paradigms.
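To make the idea of "low-bit attention" concrete, the sketch below computes the Q·Kᵀ scores from INT8-quantized Q and K, in the general spirit of SageAttention: K is first smoothed by subtracting its per-channel mean over tokens (this shifts every score in a query row by the same constant, so the softmax is unchanged), and the softmax and the P·V product stay in floating point. It is a simplified, pure-Python illustration under stated assumptions, not SageAttention's GPU kernel: the single per-tensor scale, the function names (`quantize_int8`, `attention_int8_qk`, etc.), and the toy sizes are all hypothetical choices made for readability.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_int8(m: Matrix) -> Tuple[List[List[int]], float]:
    """Symmetric per-tensor INT8 quantization (a simplification; real kernels use finer-grained scales)."""
    amax = max(abs(v) for row in m for v in row)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in m]
    return q, scale


def softmax(row: List[float]) -> List[float]:
    mx = max(row)  # subtract the max for numerical stability
    e = [math.exp(v - mx) for v in row]
    total = sum(e)
    return [x / total for x in e]


def weighted_sum(p: List[float], v: Matrix) -> List[float]:
    """P @ V for a single query row, kept in floating point."""
    return [sum(p[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))]


def attention_fp(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Reference full-precision softmax attention."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        out.append(weighted_sum(softmax(scores), v))
    return out


def attention_int8_qk(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Attention whose Q.K^T is computed from INT8 operands (hypothetical, simplified)."""
    d, n = len(q[0]), len(k)
    # Smooth K: subtract the per-channel mean over tokens. Each score in a query
    # row shifts by the same constant (q_i . mean), so softmax is unchanged, while
    # K's range can shrink a lot when keys share a large per-channel offset.
    mean = [sum(row[c] for row in k) / n for c in range(d)]
    k_smooth = [[row[c] - mean[c] for c in range(d)] for row in k]
    q_int, q_scale = quantize_int8(q)
    k_int, k_scale = quantize_int8(k_smooth)
    out = []
    for qi in q_int:
        # Integer dot products (INT32 accumulation on real hardware), then dequantize.
        scores = [sum(a * b for a, b in zip(qi, kj)) * q_scale * k_scale / math.sqrt(d)
                  for kj in k_int]
        out.append(weighted_sum(softmax(scores), v))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    q = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(4)]
    k = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(6)]
    v = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(6)]
    ref, approx = attention_fp(q, k, v), attention_int8_qk(q, k, v)
    err = max(abs(a - b) for r1, r2 in zip(ref, approx) for a, b in zip(r1, r2))
    print(f"max abs error vs. full precision: {err:.5f}")
```

Keeping P·V in floating point reflects the common design choice of quantizing only the score computation, where INT8 matrix units give the largest speedup, while leaving the softmax probabilities at higher precision.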

Abstract

We introduce TurboDiffusion, a video generation acceleration framework that can speed up end-to-end diffusion generation by 100-200x while maintaining video quality. TurboDiffusion mainly relies on several components for acceleration: (1) Attention acceleration: TurboDiffusion uses low-bit SageAttention and trainable Sparse-Linear Attention (SLA) to speed up attention computation. (2) Step distillation: TurboDiffusion adopts rCM for efficient step distillation. (3) W8A8 quantization: TurboDiffusion quantizes model parameters and activations to 8 bits to accelerate linear layers and compress the model. In addition, TurboDiffusion incorporates several other engineering optimizations. We conduct experiments on the Wan2.2-I2V-14B-720P, Wan2.1-T2V-1.3B-480P, Wan2.1-T2V-14B-720P, and Wan2.1-T2V-14B-480P models. Experimental results show that TurboDiffusion achieves 100-200x speedup for video generation even on a single RTX 5090 GPU, while maintaining comparable video quality. The GitHub repository, which includes model checkpoints and easy-to-use code, is available at https://github.com/thu-ml/TurboDiffusion.
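As a rough illustration of what W8A8 means for a linear layer, the sketch below quantizes both the weights and the activations to signed 8-bit integers, accumulates the matrix product in integers, and rescales the result back to floating point. The granularity (one symmetric scale per activation row/token and one per weight output channel) and the helper names `quantize_rows`, `w8a8_linear`, and `fp_linear` are illustrative assumptions, not a description of TurboDiffusion's actual kernels.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_rows(m: Matrix) -> Tuple[List[List[int]], List[float]]:
    """Symmetric INT8 quantization with one scale per row (hypothetical helper)."""
    q_rows, scales = [], []
    for row in m:
        amax = max(abs(v) for v in row)
        scale = amax / 127.0 if amax > 0 else 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def w8a8_linear(x: Matrix, w: Matrix) -> Matrix:
    """Approximate x @ w.T using INT8 activations and INT8 weights.

    x: [tokens, in_features], w: [out_features, in_features].
    """
    qx, sx = quantize_rows(x)  # per-token activation scales
    qw, sw = quantize_rows(w)  # per-output-channel weight scales
    out = []
    for i, xrow in enumerate(qx):
        out_row = []
        for j, wrow in enumerate(qw):
            acc = sum(a * b for a, b in zip(xrow, wrow))  # integer accumulate (INT32 on hardware)
            out_row.append(acc * sx[i] * sw[j])           # dequantize with both scales
        out.append(out_row)
    return out


def fp_linear(x: Matrix, w: Matrix) -> Matrix:
    """Full-precision reference: x @ w.T."""
    return [[sum(a * b for a, b in zip(xr, wr)) for wr in w] for xr in x]


if __name__ == "__main__":
    x = [[0.5, -1.2, 3.3, 0.0], [2.0, 0.1, -0.7, 1.5]]
    w = [[0.2, -0.4, 0.1, 0.9], [-1.0, 0.3, 0.05, 0.2], [0.7, 0.7, -0.7, 0.0]]
    ref, approx = fp_linear(x, w), w8a8_linear(x, w)
    err = max(abs(a - b) for r1, r2 in zip(ref, approx) for a, b in zip(r1, r2))
    print(f"max abs error vs. full precision: {err:.4f}")
```

Because each output element needs only its row scale and its column scale, dequantization is a cheap elementwise rescale after the integer matmul; storing the weights as INT8 also roughly halves the model's memory footprint relative to 16-bit weights.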
