Modulated Diffusion: Accelerating Generative Modeling with Modulated Quantization
Weizhi Gao, Zhichao Hou, Junqi Yin, Feiyi Wang, Linyu Peng, Xiaorui Liu
TL;DR
Diffusion models incur high computational cost due to iterative sampling. The authors propose MoDiff, a training-free framework that uses modulated quantization and error compensation to exploit temporal redundancy across diffusion steps, enabling aggressive activation quantization (down to 3 bits) without sacrificing fidelity. Theoretical results bound the quantization error and show that error compensation exponentially suppresses error accumulation, while extensive experiments on CIFAR-10, LSUN, and larger models show substantial compute savings (often more than 10x in GBops, giga bit-operations) with preserved or improved generation quality across multiple samplers. MoDiff is agnostic to both the solver and the quantization method, making it a versatile complement to PTQ and caching approaches for efficient diffusion-based generation.
Abstract
Diffusion models have emerged as powerful generative models, but the high computational cost of iterative sampling remains a significant bottleneck. In this work, we present an in-depth study of state-of-the-art acceleration techniques for diffusion models, including caching and quantization, revealing their limitations in computation error and generation quality. To overcome these limits, we introduce Modulated Diffusion (MoDiff), a rigorous and principled framework that accelerates generative modeling through modulated quantization and error compensation. MoDiff not only inherits the advantages of existing caching and quantization methods but also serves as a general framework for accelerating diffusion models. These advantages are supported by theoretical analysis. In addition, extensive experiments on CIFAR-10 and LSUN demonstrate that MoDiff reduces activation precision from 8 bits to 3 bits without performance degradation in post-training quantization (PTQ). Our code implementation is available at https://github.com/WeizhiGao/MoDiff.
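The abstract names two ingredients, modulated quantization and error compensation, without spelling out their mechanics. The toy simulation below is a minimal sketch under our own assumptions, not the MoDiff implementation: it models a single linear layer y = Wx, assumes activations drift slowly from one sampling step to the next, assumes the first step is computed in full precision and cached, and uses a plain symmetric per-tensor 3-bit quantizer. The helper names (`quantize`, `simulate`, and so on) are hypothetical. It compares (a) quantizing each activation directly, (b) quantizing only the step-to-step difference and adding W times it to the cached previous output, and (c) the same scheme with the residual taken against the reconstructed input, so that each step's quantization error is fed back instead of accumulating.

```python
import random

def quantize(v, bits):
    """Symmetric uniform per-tensor fake quantization: round each entry to the
    nearest multiple of scale, with at most 2**(bits-1)-1 steps on each side."""
    peak = max(abs(a) for a in v)
    if peak == 0.0:
        return list(v)
    scale = peak / (2 ** (bits - 1) - 1)
    return [round(a / scale) * scale for a in v]

def matvec(W, v):
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def rel_err(approx, exact):
    num = sum((a - b) ** 2 for a, b in zip(approx, exact)) ** 0.5
    den = sum(b ** 2 for b in exact) ** 0.5
    return num / den

def simulate(bits=3, dim=64, steps=50, drift=0.05, seed=0):
    """Return the final-step relative output error of three hypothetical schemes."""
    rng = random.Random(seed)
    W = [[rng.gauss(0, dim ** -0.5) for _ in range(dim)] for _ in range(dim)]
    # Assumption: activations change slowly across steps (temporal redundancy).
    xs = [[rng.gauss(0, 1) for _ in range(dim)]]
    for _ in range(steps - 1):
        xs.append([a + rng.gauss(0, drift) for a in xs[-1]])

    # Assumption: the first step runs in full precision and its output is cached.
    y_mod = matvec(W, xs[0])
    y_comp = list(y_mod)
    x_hat = list(xs[0])  # input as reconstructed by the compensated path
    for prev, x in zip(xs, xs[1:]):
        # (b) Modulated: quantize only the difference to the previous true input.
        y_mod = add(y_mod, matvec(W, quantize(sub(x, prev), bits)))
        # (c) Modulated + error compensation: the residual is measured against
        # the reconstructed input, so the last step's rounding error is fed back.
        q = quantize(sub(x, x_hat), bits)
        x_hat = add(x_hat, q)
        y_comp = add(y_comp, matvec(W, q))

    exact = matvec(W, xs[-1])
    return {
        # (a) Direct: quantize the whole activation at the final step.
        "direct": rel_err(matvec(W, quantize(xs[-1], bits)), exact),
        "modulated": rel_err(y_mod, exact),
        "compensated": rel_err(y_comp, exact),
    }

if __name__ == "__main__":
    for name, err in simulate().items():
        print(f"{name:>12}: relative error {err:.4f}")
```

Because step-to-step differences have a much smaller range than the activations themselves, the same 3 bits yield a finer step size in (b) and (c). In (b), however, the per-step rounding errors keep adding up, whereas in (c) the error at any step is a single rounding error of the residual. This is only consistent in spirit with the TL;DR's claim that error compensation suppresses error accumulation; the actual bounds and quantizer design are in the paper and the linked repository.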
