Isomorphic Pruning for Vision Models

Gongfan Fang, Xinyin Ma, Michael Bi Mi, Xinchao Wang

TL;DR

Isomorphic Pruning addresses the incompatibility of comparing heterogeneous sub-structures in modern vision models by decomposing networks into isomorphic sub-structures based on topology and pruning within each group. The method models sub-structures as graphs, uses graph isomorphism to cluster similar motifs, and applies a chosen importance criterion (e.g., Magnitude or Taylor) to rank and prune within groups, enabling reliable, architecture-agnostic pruning. Empirically, it yields competitive or superior accuracy with reduced MACs and parameters across ConvNext, ResNet, MobileNet-v2, and Vision Transformers on ImageNet-1K, while providing actionable latency and memory benefits on real hardware. The approach demonstrates the practicality of topology-aware pruning for heterogeneous vision models and offers a flexible framework compatible with multiple pruning criteria and architectures, accompanied by open-source code.
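The group-then-rank idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the string `topology_key` that stands in for a graph-isomorphism class, and the toy importance scores are all assumptions made for illustration. The sketch contrasts a single global ranking with ranking inside each group.

```python
from collections import defaultdict

# Toy data (hypothetical): (name, topology_key, importance score).
# The topology_key is an assumed stand-in for "belongs to the same
# isomorphism class"; the scores are made up to mimic different scales.
SUBSTRUCTURES = [
    ("head0", "attn_head", 0.001),
    ("head1", "attn_head", 0.004),
    ("head2", "attn_head", 0.002),
    ("head3", "attn_head", 0.003),
    ("mlp0", "mlp_channel", 0.9),
    ("mlp1", "mlp_channel", 1.4),
    ("mlp2", "mlp_channel", 0.7),
    ("mlp3", "mlp_channel", 1.1),
]


def global_prune(substructures, ratio):
    """Rank every sub-structure in one list and drop the lowest-scoring fraction."""
    ranked = sorted(substructures, key=lambda s: s[2])
    k = int(len(ranked) * ratio)
    return sorted(name for name, _, _ in ranked[:k])


def isomorphic_prune(substructures, ratio):
    """Rank sub-structures only against members of the same topology group."""
    groups = defaultdict(list)
    for name, key, score in substructures:
        groups[key].append((score, name))
    pruned = []
    for members in groups.values():
        members.sort()  # ascending by score within the group
        k = int(len(members) * ratio)
        pruned.extend(name for _, name in members[:k])
    return sorted(pruned)


if __name__ == "__main__":
    print("global:    ", global_prune(SUBSTRUCTURES, 0.5))
    print("isomorphic:", isomorphic_prune(SUBSTRUCTURES, 0.5))
```

With these toy scores, the global ranking removes all four attention heads because their scores sit on a smaller scale, whereas the grouped ranking removes the two weakest members of each group.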

Abstract

Structured pruning reduces the computational overhead of deep neural networks by removing redundant sub-structures. However, assessing the relative importance of different sub-structures remains a significant challenge, particularly in advanced vision models featuring novel mechanisms and architectures such as self-attention, depth-wise convolutions, or residual connections. These heterogeneous sub-structures usually exhibit divergent parameter scales, weight distributions, and computational topology, which makes importance comparison considerably harder. To overcome this, we present Isomorphic Pruning, a simple approach that is effective across a range of network architectures, such as Vision Transformers and CNNs, and delivers competitive performance across different model sizes. Isomorphic Pruning originates from the observation that, when evaluated under a pre-defined importance criterion, heterogeneous sub-structures show significant divergence in their importance distributions, whereas isomorphic structures present similar importance patterns. This inspires us to rank and compare each type of sub-structure in isolation for more reliable pruning. Our empirical results on ImageNet-1K demonstrate that Isomorphic Pruning surpasses several pruning baselines designed specifically for Transformers or CNNs. For instance, we improve the accuracy of DeiT-Tiny from 74.52% to 77.50% by pruning an off-the-shelf DeiT-Base model. For ConvNext-Tiny, we improve accuracy from 82.06% to 82.18% while reducing the number of parameters and memory usage. Code is available at https://github.com/VainF/Isomorphic-Pruning.

Paper Structure (37 sections, 5 equations, 6 figures, 7 tables, 1 algorithm)

Figures (6)

  • Figure 1: ImageNet Top-1 accuracy vs. multiply–accumulate operations (MACs) of pruned DeiT [touvron2021deit] and pruned ConvNext [liu2022convnet]. The pruned models, marked with "$\bigstar$", have comparable or better latency and better performance than their counterparts trained from scratch, marked with "$\pentagon$". MACs are plotted on a log scale for better visualization.
  • Figure 2: For a pre-trained network (a), we show three pruning strategies: (b) Local Pruning, which compares parameter importance within each layer; (c) Global Pruning, which performs a global ranking of all parameters; (d) the proposed Isomorphic Pruning, which groups parameters by computational topology and applies importance ranking within groups. Within each group, the importance distributions are more comparable. Details about the distributions can be found in Fig. \ref{fig:vis_isomorphic} of the experiments section.
  • Figure 3: Isomorphic Pruning models sub-structures as graphs and applies isolated ranking and pruning within isomorphic structures. We highlight three removable sub-structures in an MLP and show their corresponding graphs below. Sub-structures 1 & 3 are isomorphic, but 2 & 3 are non-isomorphic due to the additional residual connections.
  • Figure 4: (a-b) Top-1 accuracy and loss of pruned ResNet-50 on the ImageNet-1K validation set without finetuning. We report the mean and standard deviation over 10 experiments. Isomorphic Pruning consistently improves the performance of pruned models. (c) Histogram of Taylor importance scores of DeiT sub-structures. Isomorphic structures are highlighted with the same color. Zoom in for more details about each structure. We visualize the threshold at a 50% pruning ratio for naive global pruning (the black dashed line) and Isomorphic Pruning (the colored dashed lines).
  • Figure 5: The isomorphic groups in a vision transformer block. There are three groups for width pruning, which reduce the dimensions of the embedding, MLP, and attention. One special group operates at the head level and removes entire heads for acceleration. The shapes of intermediate features are highlighted.
  • ...and 1 more figures

Theorems & Definitions (1)

  • Definition: Graph Isomorphism
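In the standard sense, two graphs are isomorphic when some one-to-one mapping between their nodes preserves every edge. The brute-force check below is a minimal sketch of that notion for very small directed graphs; it is not taken from the paper's code. The function name and the toy graphs are hypothetical, loosely mirroring the Figure 3 caption, in which an extra residual connection makes one sub-structure non-isomorphic to the others. Node attributes such as layer type are ignored here, which is a simplifying assumption.

```python
from itertools import permutations


def are_isomorphic(nodes_a, edges_a, nodes_b, edges_b):
    """Brute-force isomorphism test for small directed graphs (factorial cost)."""
    edge_set_a, edge_set_b = set(edges_a), set(edges_b)
    if len(nodes_a) != len(nodes_b) or len(edge_set_a) != len(edge_set_b):
        return False
    for perm in permutations(nodes_b):
        mapping = dict(zip(nodes_a, perm))  # candidate node bijection
        # With equal edge counts, mapping every edge of A onto an edge of B
        # means the edge sets correspond exactly.
        if all((mapping[u], mapping[v]) in edge_set_b for u, v in edge_set_a):
            return True
    return False


# Hypothetical toy graphs: 1 and 3 are plain chains, 2 has a residual edge.
NODES_1, EDGES_1 = ["in", "fc", "out"], [("in", "fc"), ("fc", "out")]
NODES_2, EDGES_2 = ["x", "fc", "y"], [("x", "fc"), ("fc", "y"), ("x", "y")]
NODES_3, EDGES_3 = ["a", "b", "c"], [("a", "b"), ("b", "c")]

if __name__ == "__main__":
    print(are_isomorphic(NODES_1, EDGES_1, NODES_3, EDGES_3))
    print(are_isomorphic(NODES_2, EDGES_2, NODES_3, EDGES_3))
```

Trying every permutation is only practical for graphs with a handful of nodes; a real pruning tool would need a more efficient matching strategy.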