SlimDiff: Training-Free, Activation-Guided Hands-free Slimming of Diffusion Models

Arani Roy, Shristi Das Biswas, Kaushik Roy

TL;DR

SlimDiff reduces the compute cost of diffusion models with a training-free, activation-aware compression pipeline. It aligns low-rank decompositions with activation statistics that change across denoising timesteps, guided by the Spectral Influence Score and a compact calibration set, SlimSet. The method applies module-aligned MADAC decompositions (Nyström for FFN; whitening-SVD for QK/VO) and a propagation-aware Automatic Rank Allocation that meets a fixed parameter budget, and it comes with theoretical guarantees. Empirically, SlimDiff achieves up to 35% faster inference and roughly 100M fewer parameters while maintaining generation quality and human-preference alignment, using only about 500 calibration prompts.

Abstract

Diffusion models (DMs), lauded for their generative performance, are computationally prohibitive due to their billion-scale parameters and iterative denoising dynamics. Existing efficiency techniques, such as quantization, timestep reduction, or pruning, offer savings in compute, memory, or runtime but are strictly bottlenecked by reliance on fine-tuning or retraining to recover performance. In this work, we introduce SlimDiff, an automated activation-informed structural compression framework that reduces both attention and feedforward dimensionalities in DMs, while being entirely gradient-free. SlimDiff reframes DM compression as a spectral approximation task, where activation covariances across denoising timesteps define low-rank subspaces that guide dynamic pruning under a fixed compression budget. This activation-aware formulation mitigates error accumulation across timesteps by applying module-wise decompositions over functional weight groups: query-key interactions, value-output couplings, and feedforward projections, rather than isolated matrix factorizations, while adaptively allocating sparsity across modules to respect the non-uniform geometry of diffusion trajectories. SlimDiff achieves up to 35% acceleration and ~100M parameter reduction over baselines, with generation quality on par with uncompressed models without any backpropagation. Crucially, our approach requires only about 500 calibration samples, over 70× fewer than prior methods. To our knowledge, this is the first closed-form, activation-guided structural compression of DMs that is entirely training-free, providing both theoretical clarity and practical efficiency.
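
As background for the "spectral approximation" framing, the standard single-matrix argument for activation-aware low-rank approximation can be written out. This is only a sketch under stated assumptions, not the paper's derivation: the symbols $W$, $X$, $C$, $S$, and $r$ are introduced here for illustration, $C$ is assumed positive definite, and SlimDiff itself decomposes functional weight groups (QK, VO, FFN) rather than an isolated matrix as shown here.

```latex
% Assumed setup (illustrative symbols, not taken from the paper):
%   W \in \mathbb{R}^{m \times n}   weight of one linear map
%   X \in \mathbb{R}^{n \times N}   calibration activations (columns are samples)
%   C = X X^\top = S S^\top          activation covariance and its Cholesky factor
\begin{align}
  \lVert W X - \hat{W} X \rVert_F^2
    &= \operatorname{tr}\!\left( (W - \hat{W})\, C \,(W - \hat{W})^\top \right)
     = \lVert (W - \hat{W})\, S \rVert_F^2 . \\
\intertext{Because $S$ is invertible, $\operatorname{rank}(\hat{W} S) = \operatorname{rank}(\hat{W})$,
so by the Eckart--Young theorem the best rank-$r$ choice truncates the SVD
$W S = U \Sigma V^\top$:}
  \hat{W} &= U_r \Sigma_r V_r^\top S^{-1},
  \qquad
  \min_{\operatorname{rank}(\hat{W}) \le r} \lVert W X - \hat{W} X \rVert_F^2
     = \sum_{i > r} \sigma_i^2 .
\end{align}
```

The point of the whitening step is that the discarded singular values $\sigma_i$ measure error on the layer's outputs for the calibration activations, not error on the raw weights.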

Paper Structure

This paper contains 43 sections, 37 equations, 10 figures, 14 tables, 3 algorithms.

Figures (10)

  • Figure 1: SlimDiff compresses diffusion models by sampling a semantic calibration set (SlimSet $\mathcal{S}$) and scoring each module’s alignment with input anisotropy via the Spectral Influence Score, which drives Timestep-Aware Correlation Modeling and an Automatic Rank Allocator under a global budget. Finally, MADAC applies whitening–SVD to $\mathcal{QK}$/$\mathcal{VO}$ and Nyström reduction to $\mathcal{FFN}$.
  • Figure 2: SlimSet coverage. Distribution of distinctiveness scores for LAION-212K (grey) and SlimSet with $J{=}500$ (blue). Quantile-based binning with proportional allocation ensures SlimSet spans the entire corpus range.
  • Figure 3: Visual comparison with contemporaries shows that SlimDiff maintains higher perceptual quality post-compression. Methods that rely on backpropagation (BP) for model slimming are grayed out.
  • Figure 4: Spectral Influence Score distribution across different functional modules.
  • Figure 5: Diversity distribution of input activations across different functional modules.
  • ...and 5 more figures
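
The quantile-based binning with proportional allocation mentioned in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' SlimSet procedure: the function name `quantile_binned_sample`, the number of bins, the uniform sampling within each bin, and the synthetic scores are all assumptions made for the example.

```python
import random


def quantile_binned_sample(scores, budget, num_bins=10, seed=0):
    """Pick `budget` indices so that every score quantile is represented.

    Hypothetical helper: `num_bins` and uniform within-bin sampling are
    assumptions, not details taken from the paper.
    """
    n = len(scores)
    if not 0 < budget <= n:
        raise ValueError("budget must be between 1 and len(scores)")
    rng = random.Random(seed)

    # Rank indices by score, then cut the ranking into equal-count quantile bins.
    order = sorted(range(n), key=lambda i: scores[i])
    edges = [round(b * n / num_bins) for b in range(num_bins + 1)]
    bins = [order[edges[b]:edges[b + 1]] for b in range(num_bins)]
    bins = [b for b in bins if b]  # drop empty bins when n < num_bins

    # Proportional allocation with largest-remainder rounding so the
    # per-bin quotas sum exactly to `budget`.
    exact = [budget * len(b) / n for b in bins]
    quota = [int(x) for x in exact]
    by_remainder = sorted(range(len(bins)),
                          key=lambda k: exact[k] - quota[k], reverse=True)
    for k in by_remainder[:budget - sum(quota)]:
        quota[k] += 1

    # Sample uniformly without replacement inside each bin.
    picked = []
    for members, q in zip(bins, quota):
        picked.extend(rng.sample(members, q))
    return sorted(picked)


# Usage with synthetic scores standing in for distinctiveness scores.
demo_rng = random.Random(1)
demo_scores = [demo_rng.random() for _ in range(2000)]
subset = quantile_binned_sample(demo_scores, budget=50, num_bins=10)
```

Because the bins are defined by quantiles rather than fixed score ranges, sparse tails of the score distribution still receive their share of the budget, which is the coverage property the caption describes.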