OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot

Junhan Zhu; Hesong Wang; Mingluo Su; Zefang Wang; Huan Wang

TL;DR

OBS-Diff tackles the high computational cost of large diffusion models by enabling training-free, one-shot pruning. It revitalizes the Optimal Brain Surgeon with a timestep-aware Hessian to account for error accumulation during diffusion and introduces a group-wise Module Package strategy to amortize calibration costs. The framework supports unstructured, semi-structured, and structured pruning (including MHA heads and FFN neurons) and demonstrates state-of-the-art pruning performance across multiple diffusion models, achieving notable speedups with minimal quality loss. This approach offers a practical, hardware-friendly path to deploy large diffusion models more efficiently without retraining.

Abstract

Large-scale text-to-image diffusion models, while powerful, suffer from prohibitive computational cost. Existing one-shot network pruning methods are hard to apply to them directly because of the iterative denoising nature of diffusion models. To bridge this gap, this paper presents OBS-Diff, a novel one-shot pruning framework that enables accurate, training-free compression of large-scale text-to-image diffusion models. Specifically, (i) OBS-Diff revitalizes the classic Optimal Brain Surgeon (OBS), adapting it to the complex architectures of modern diffusion models and supporting diverse pruning granularities, including unstructured, N:M semi-structured, and structured (MHA heads and FFN neurons) sparsity; (ii) to align the pruning criteria with the iterative dynamics of the diffusion process, we examine the problem from an error-accumulation perspective and propose a novel timestep-aware Hessian construction that incorporates a logarithmic-decrease weighting scheme, assigning greater importance to earlier timesteps to mitigate potential error accumulation; (iii) a computationally efficient group-wise sequential pruning strategy is proposed to amortize the expensive calibration process. Extensive experiments show that OBS-Diff achieves state-of-the-art one-shot pruning for diffusion models, delivering inference acceleration with minimal degradation in visual quality.
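To make points (i) and (ii) concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not give the weighting formula, so `log_decreasing_weights` and its `w_max`/`w_min` parameters are hypothetical: they only illustrate a weight that decays logarithmically so that earlier denoising steps count more. The layer-wise Hessian form (a weighted sum of input Gram matrices with diagonal damping) and all function names are likewise assumptions. The pruning step itself uses the classic OBS rule: remove the weight with the smallest saliency w_q^2 / [H^-1]_qq and compensate the remaining weights with -(w_q / [H^-1]_qq) * H^-1[:, q].

```python
import numpy as np


def log_decreasing_weights(num_steps, w_max=1.0, w_min=0.1):
    """Hypothetical weighting: decays logarithmically from w_max at the
    first denoising step to w_min at the last one."""
    if num_steps == 1:
        return np.array([w_max])
    steps = np.arange(num_steps)
    decay = np.log1p(steps) / np.log1p(num_steps - 1)  # 0 at first step, 1 at last
    return w_max - (w_max - w_min) * decay


def timestep_aware_hessian(acts_per_step, weights, damp=1e-2):
    """Assumed form: H = sum_t a_t * X_t^T X_t / n_t, plus diagonal damping.
    acts_per_step holds one (n_samples, d_in) activation array per step."""
    d_in = acts_per_step[0].shape[1]
    H = np.zeros((d_in, d_in))
    for a_t, X_t in zip(weights, acts_per_step):
        H += a_t * (X_t.T @ X_t) / X_t.shape[0]
    H += damp * np.mean(np.diag(H)) * np.eye(d_in)  # keeps H invertible
    return H


def obs_prune_row(w, H, sparsity):
    """Greedy unstructured OBS on a single weight row."""
    w = np.asarray(w, dtype=np.float64).copy()
    Hinv = np.linalg.inv(H)
    pruned = np.zeros(w.size, dtype=bool)
    for _ in range(int(round(sparsity * w.size))):
        active = np.flatnonzero(~pruned)
        saliency = w[active] ** 2 / np.diag(Hinv)[active]
        q = int(active[np.argmin(saliency)])
        col, diag = Hinv[:, q].copy(), Hinv[q, q]
        w -= (w[q] / diag) * col              # compensate the remaining weights
        Hinv -= np.outer(col, col) / diag     # drop q from the inverse Hessian
        Hinv[q, :] = 0.0
        Hinv[:, q] = 0.0
        w[q] = 0.0
        pruned[q] = True
    return w, pruned
```

In this sketch, activations collected at each denoising step would be passed as `acts_per_step`, and every output row of a layer's weight matrix would be pruned against the same Hessian. Structured and N:M variants, as well as the group-wise sequential strategy in (iii), are outside its scope.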
