Which Layer Causes Distribution Deviation? Entropy-Guided Adaptive Pruning for Diffusion and Flow Models

Changlin Li, Jiawei Zhang, Zeyi Shi, Zongxin Yang, Zhihui Li, Xiaojun Chang

TL;DR

This work tackles parameter redundancy in large diffusion and flow models by introducing EntPruner, an entropy-guided progressive pruning framework. It uses Conditional Entropy Deviation (CED) to measure how much removing each block shifts the output distribution, and employs zero-shot NAS proxies (NTK condition number and ZiCo) to schedule pruning during training, enabling stable, progressive compression. Experiments with DiT and SiT backbones on ImageNet and three downstream datasets show up to 2.22× inference speedup while maintaining generation quality, surpassing prior pruning baselines. The approach makes diffusion transformers more practical to deploy in resource-constrained settings.
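ZiCo, one of the two zero-shot proxies named above, scores a network from gradient statistics alone, without training it to convergence. The sketch below is a minimal pure-Python illustration, assuming the ZiCo formulation as we understand it from its original paper: for each layer, sum the per-parameter ratio of mean absolute gradient to gradient standard deviation across batches, take the log, and sum over layers. The function name `zico_score` and the nested-list gradient layout are hypothetical choices for illustration, not EntPruner's code.

```python
import math
import statistics


def zico_score(grads):
    """ZiCo-style zero-shot score from per-batch gradients (sketch).

    grads: list of layers; each layer is a list of parameters; each parameter
    is a list of gradient values, one per batch (hypothetical layout).
    """
    total = 0.0
    for layer in grads:
        layer_sum = 0.0
        for per_batch in layer:
            mean_abs = statistics.fmean(abs(g) for g in per_batch)
            std = statistics.pstdev(per_batch)
            if std > 0:  # skip constant gradients to avoid division by zero
                layer_sum += mean_abs / std
        if layer_sum > 0:  # log is undefined for empty or degenerate layers
            total += math.log(layer_sum)
    return total


# One layer with two parameters: ratios 2/1 and 2/2, so the score is log(3).
print(round(zico_score([[[1.0, 3.0], [2.0, -2.0]]]), 4))  # 1.0986
```

Higher scores indicate gradients that are both large and consistent across batches, which ZiCo treats as a sign of a more trainable architecture.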

Abstract

Large-scale vision generative models, including diffusion and flow models, have demonstrated remarkable performance in visual generation tasks. However, transferring these pre-trained models to downstream tasks often leaves significant parameter redundancy. In this paper, we propose EntPruner, an entropy-guided automatic progressive pruning framework for diffusion and flow models. First, we introduce entropy-guided pruning, a block-level importance assessment strategy specifically designed for generative models. Unlike discriminative models, generative models must preserve the diversity and condition fidelity of their output distribution. Because the importance of each module can vary significantly across downstream tasks, EntPruner prioritizes pruning less important blocks, using the data-dependent Conditional Entropy Deviation (CED) as a guiding metric. CED quantifies how far the generated distribution diverges from the learned conditional data distribution after a block is removed. Second, we propose a zero-shot adaptive pruning framework that automatically determines when and how much to prune during training. This dynamic strategy avoids the pitfalls of one-shot pruning, mitigating mode collapse and preserving model performance. Extensive experiments on DiT and SiT models demonstrate the effectiveness of EntPruner, achieving up to 2.22× inference speedup while maintaining competitive generation quality on ImageNet and three downstream datasets.
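To make the sign of a block-level CED score concrete, here is a minimal sketch. The estimator is an assumption, not the paper's definition: the conditional entropy H(x | c) is approximated by averaging, over conditions c, the differential entropy of generated outputs under a diagonal-Gaussian fit, and the signed CED of a block is the pruned-model estimate minus the full-model estimate. All function names are hypothetical.

```python
import math
import statistics


def gaussian_entropy(samples):
    """Differential entropy of samples under a diagonal-Gaussian assumption.

    samples: list of equal-length feature vectors (e.g. flattened outputs).
    """
    h = 0.0
    for d in range(len(samples[0])):
        var = statistics.pvariance([s[d] for s in samples])
        h += 0.5 * math.log(2 * math.pi * math.e * max(var, 1e-12))  # floor avoids log(0)
    return h


def conditional_entropy(outputs_by_condition):
    """Average per-condition entropy: an estimate of H(x | c) with uniform p(c)."""
    return statistics.fmean(gaussian_entropy(s) for s in outputs_by_condition.values())


def signed_ced(full_outputs, pruned_outputs):
    """Hypothetical signed CED for one removed block.

    > 0: entropy rose after removal (drift toward noise);
    < 0: entropy fell (mode collapse / oversimplified outputs).
    """
    return conditional_entropy(pruned_outputs) - conditional_entropy(full_outputs)


# Toy 1-D outputs per class: removal shrinks each class's variance from 1 to 0.01.
full = {"cat": [[0.0], [2.0]], "dog": [[1.0], [3.0]]}
collapsed = {"cat": [[1.0], [1.2]], "dog": [[2.0], [2.2]]}
print(round(signed_ced(full, collapsed), 2))  # -2.3
```

The negative value here corresponds to the mode-collapse direction described in the Figure 2 caption; under this sketch, a block whose removal leaves the per-class spread unchanged would score near zero.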

Paper Structure

This paper contains 19 sections, 14 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: (a) Generated results of a series of flow models pruned by EntPruner. EntPruner achieves up to 2.22× speedup while maintaining generation quality. Base model: SiT-XL/2 with ODE solver, CFG=4.0, 250 sampling steps. (b) Effect of removing different numbers of blocks in SiT. The strong correlation between Conditional Entropy Deviation (CED) and loss confirms CED's effectiveness in quantifying block importance in flow models.
  • Figure 2: Signed CED for each block in SiT-XL/2 and DiT-XL/2. The sign of CED reveals the type of distributional degradation: positive values indicate drift toward randomness/noise (increased entropy), while negative values indicate mode collapse or oversimplified solutions (decreased entropy). Critical blocks exhibit large absolute CED.
  • Figure 3: Entropy-guided automatic pruning framework. First, we employ CED to evaluate and rank the expressiveness of individual blocks, where darker regions in the heatmap indicate stronger interactions with the overall network. At each pruning stage, the pruning schedule explores candidate subnetworks under different pruning ratios, selects the optimal one using zero-shot proxies, and inherits parameters from the previous stage.
  • Figure 4: Qualitative comparison of flow models pruned by different methods. Base model: SiT-XL/2 at a 35% pruning rate. Datasets are Flowers (columns 1-2), CUB (columns 3-4), and ArtBench (columns 5-6). EntPruner consistently generates finer details.
  • Figure 5: Tradeoff between computational cost (MACs) and generative quality (FID). Rightmost point: full model.
  • ...and 4 more figures
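The two steps in the Figure 2 and Figure 3 captions (rank blocks by CED, then choose a pruning ratio per stage with a zero-shot proxy) can be sketched as follows. This is a simplified illustration under stated assumptions: blocks with the smallest |CED| are treated as least important, the proxy is any callable where higher is better (the paper uses the NTK condition number and ZiCo; a lower-is-better proxy such as the condition number would typically be negated first), the toy proxy and CED values in the example are made up, and the training between stages is omitted. Each stage prunes only from the blocks kept by the previous stage, so surviving blocks inherit their parameters.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def rank_blocks_by_ced(ced: Dict[int, float]) -> List[int]:
    """Block indices from least to most important (smallest |CED| first)."""
    return sorted(ced, key=lambda b: abs(ced[b]))


def select_stage(
    kept: Sequence[int],
    ced: Dict[int, float],
    ratios: Sequence[float],
    proxy: Callable[[Tuple[int, ...]], float],
) -> Tuple[int, ...]:
    """One pruning stage: build a candidate subnetwork per ratio by dropping the
    lowest-|CED| blocks, score it with the proxy, and keep the best candidate.
    Candidates are subsets of `kept`, so surviving blocks inherit their weights."""
    order = rank_blocks_by_ced({b: ced[b] for b in kept})
    best, best_score = tuple(kept), float("-inf")
    for r in ratios:
        n_remove = int(r * len(kept))  # floor of the requested fraction
        removed = set(order[:n_remove])
        candidate = tuple(b for b in kept if b not in removed)
        score = proxy(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best


CED = {0: 0.9, 1: -0.05, 2: 0.02, 3: -0.6, 4: 0.1, 5: 0.01}  # made-up values


def toy_proxy(sub: Tuple[int, ...]) -> float:
    # Hypothetical stand-in for a zero-shot proxy: retained |CED| minus a per-block cost.
    return sum(abs(CED[b]) for b in sub) - 0.08 * len(sub)


stage1 = select_stage(tuple(range(6)), CED, (0.0, 0.25, 0.5, 0.75), toy_proxy)
print(stage1)  # (0, 3, 4)
```

Calling `select_stage` again on `(0, 3, 4)` returns it unchanged, because every further removal lowers the toy score; in this sketch that is how the schedule decides that no more pruning is worthwhile.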