OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot

Junhan Zhu, Hesong Wang, Mingluo Su, Zefang Wang, Huan Wang

TL;DR

OBS-Diff tackles the high computational cost of large diffusion models with training-free, one-shot pruning. It revitalizes the Optimal Brain Surgeon (OBS) with a timestep-aware Hessian that accounts for error accumulation across denoising steps, and introduces a group-wise "Module Package" strategy that amortizes calibration cost. The framework supports unstructured, semi-structured, and structured pruning (including MHA heads and FFN neurons) and reports state-of-the-art one-shot pruning results across multiple diffusion models, with inference speedups and minimal quality loss. It offers a practical, hardware-friendly path to deploying large diffusion models more efficiently without retraining.

Abstract

Large-scale text-to-image diffusion models, while powerful, suffer from prohibitive computational cost. Existing one-shot network pruning methods are difficult to apply to them directly because of the iterative denoising nature of diffusion models. To bridge the gap, this paper presents OBS-Diff, a novel one-shot pruning framework that enables accurate and training-free compression of large-scale text-to-image diffusion models. Specifically, (i) OBS-Diff revitalizes the classic Optimal Brain Surgeon (OBS), adapting it to the complex architectures of modern diffusion models and supporting diverse pruning granularity, including unstructured, N:M semi-structured, and structured (MHA heads and FFN neurons) sparsity; (ii) to align the pruning criteria with the iterative dynamics of the diffusion process, we examine the problem from an error-accumulation perspective and propose a novel timestep-aware Hessian construction with a logarithmic-decrease weighting scheme, which assigns greater importance to earlier timesteps to mitigate potential error accumulation; (iii) a computationally efficient group-wise sequential pruning strategy is proposed to amortize the expensive calibration process. Extensive experiments show that OBS-Diff achieves state-of-the-art one-shot pruning for diffusion models, delivering inference acceleration with minimal degradation in visual quality.
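To make points (i) and (ii) concrete, here is a minimal NumPy sketch of a timestep-weighted Hessian followed by one classic OBS pruning step on a single weight row. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the exact weighting formula, so `log_decreasing_weights` is a hypothetical schedule that only reproduces the stated property (weights fall logarithmically, so earlier denoising steps count more). The names `weighted_hessian`, `obs_prune_one`, `w_min`, and `damp` are likewise invented for this example. The OBS saliency and update are the standard ones for a layer-wise quadratic objective.

```python
import numpy as np

def log_decreasing_weights(num_steps, w_min=0.1):
    """Hypothetical schedule: 1.0 at the first denoising step, w_min at the last,
    decreasing logarithmically in the step index (assumed form, not from the paper)."""
    if num_steps < 2:
        return np.ones(num_steps)
    t = np.arange(num_steps)
    return 1.0 - (1.0 - w_min) * np.log1p(t) / np.log(num_steps)

def weighted_hessian(acts_per_step, weights, damp=1e-2):
    """Accumulate H = sum_t weights[t] * X_t X_t^T, with X_t of shape (d_in, n_samples).
    A small diagonal damping term keeps H invertible."""
    d_in = acts_per_step[0].shape[0]
    H = np.zeros((d_in, d_in))
    for X, a in zip(acts_per_step, weights):
        H += a * (X @ X.T)
    H += damp * np.mean(np.diag(H)) * np.eye(d_in)
    return H

def obs_prune_one(w, H_inv):
    """Classic OBS step on one weight row: remove the weight with the smallest
    saliency w_q^2 / [H^-1]_qq and compensate the remaining weights."""
    saliency = w**2 / np.diag(H_inv)
    q = int(np.argmin(saliency))
    w_new = w - (w[q] / H_inv[q, q]) * H_inv[:, q]
    w_new[q] = 0.0  # exact zero; the update already drives this entry to ~0
    return w_new, q

# Usage on synthetic activations: 10 denoising steps, 6 input features.
rng = np.random.default_rng(0)
acts = [rng.standard_normal((6, 32)) for _ in range(10)]
H = weighted_hessian(acts, log_decreasing_weights(10))
row = rng.standard_normal(6)
pruned_row, removed = obs_prune_one(row, np.linalg.inv(H))
```

Because the weights multiply each step's contribution to the Hessian, activations seen at early steps have more influence on both the choice of which weight to remove and the compensation applied to the rest.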

Paper Structure

This paper contains 36 sections, 8 equations, 14 figures, 10 tables.

Figures (14)

  • Figure 1: Qualitative comparison of unstructured pruning methods on the SD3-Medium model [esser2024scaling]. We evaluate Magnitude, DSnoT [zhangdynamic], Wanda [sunsimple], and our method (OBS-Diff) at various sparsity levels (20%, 30%, 40%, and 50%) using the same prompt and negative prompt. All images are generated at a resolution of $512 \times 512$.
  • Figure 2: Illustration of the proposed OBS-Diff framework applied to the MMDiT architecture. Target modules are first partitioned into a predefined number of "Module Packages" and processed sequentially. For each package, hooks capture layer activations during a forward pass with a calibration dataset. This data, combined with weights from a dedicated timestep weighting scheme, is used to construct Hessian matrices. These matrices guide the Optimal Brain Surgeon (OBS) algorithm to simultaneously prune all layers within the current package before proceeding to the next.
  • Figure 3: Effect of the number of prompts in the calibration dataset on ImageReward.
  • Figure 4: Pruning time of different unstructured pruning methods on SD3-Medium (2B) at 50% sparsity.
  • Figure 5: ImageReward vs. sparsity for various unstructured pruning methods on SD3-Medium.
  • ...and 9 more figures
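
The "Module Package" flow described in the Figure 2 caption can be sketched as a toy NumPy loop. This is a hedged illustration, not the authors' code: the model is a hypothetical chain of small linear layers with `tanh` between them, the packages are assumed to be contiguous and near-equal in size, and `magnitude_prune_half` is a deliberately simple stand-in pruner (it is not OBS and ignores the captured activations). All function names are invented for this example. The point is the control flow: one calibration pass per package, then every layer in that package is pruned before moving on.

```python
import numpy as np

def partition_into_packages(layer_names, num_packages):
    """Split an ordered list of layer names into contiguous, near-equal groups
    (contiguity is an assumption of this sketch)."""
    bounds = np.linspace(0, len(layer_names), num_packages + 1).astype(int)
    return [layer_names[a:b] for a, b in zip(bounds[:-1], bounds[1:]) if b > a]

def magnitude_prune_half(W, X):
    """Stand-in pruner (NOT OBS): zero the 50% smallest-magnitude weights; X is unused."""
    W = W.copy()
    idx = np.argsort(np.abs(W), axis=None)[: W.size // 2]
    W.flat[idx] = 0.0
    return W

def prune_in_packages(layers, calib_x, num_packages, prune_fn):
    """layers: dict name -> weight matrix, applied in order as x = tanh(W @ x).
    Returns the number of calibration passes that were needed."""
    names = list(layers)
    passes = 0
    for package in partition_into_packages(names, num_packages):
        # One calibration pass through the current (partly pruned) model,
        # recording the input of every layer in this package, as hooks would.
        captured, x = {}, calib_x
        for name in names:
            if name in package:
                captured[name] = x
            x = np.tanh(layers[name] @ x)
        passes += 1
        # Prune all layers of the package using the activations just captured.
        for name in package:
            layers[name] = prune_fn(layers[name], captured[name])
    return passes

# Usage: 6 layers in 3 packages need 3 calibration passes instead of 6.
rng = np.random.default_rng(0)
toy_layers = {f"layer{i}": rng.standard_normal((4, 4)) for i in range(6)}
n_passes = prune_in_packages(toy_layers, rng.standard_normal((4, 8)), 3, magnitude_prune_half)
```

In this sketch, layers inside the same package see activations computed before their package-mates were pruned, which is the price paid for running fewer calibration passes than a strictly layer-by-layer procedure.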