Structurally Prune Anything: Any Architecture, Any Framework, Any Time

Xun Wang, John Rachwan, Stephan Günnemann, Bertrand Charpentier

TL;DR

This work introduces Structurally Prune Anything (SPA), a framework that unifies structured pruning across any architecture, any framework, and any pruning stage by building on a standardized ONNX computational graph. SPA automates coupled-channel detection and group-level importance estimation, which transfers existing pruning criteria into a grouped, structured form and enables prune-train, train-prune-finetune, and train-prune workflows. A key addition, OBSPA, prunes without fine-tuning and even without calibration data. It achieves state-of-the-art results in data-free settings on CIFAR and NLP benchmarks while remaining competitive on ImageNet-scale tasks. Empirically, SPA prunes in a framework-agnostic and architecture-agnostic way, with competitive accuracy and substantial reductions in FLOPs and parameters, and it prunes significantly faster than prior data-free methods.
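As a minimal sketch of how a per-channel criterion might be lifted to group level, assume a group is a list of parameter tensors that share one channel axis (for example, a convolution and the BatchNorm that follows it). The code scores each coupled channel by summing the L1 norms of its slices across all members of the group, then marks the lowest-scoring channels for removal. The function names `group_l1_importance` and `select_channels_to_prune`, the plain-list tensor layout, and sum aggregation are illustrative assumptions, not SPA's actual API.

```python
from typing import List, Sequence

# Hypothetical layout: tensor[c] holds the flattened weights of channel c.
# Every tensor in a group shares the same (coupled) channel axis.
Tensor = List[List[float]]


def group_l1_importance(group: Sequence[Tensor]) -> List[float]:
    """Sum the L1 norm of each coupled channel over every tensor in the group."""
    num_channels = len(group[0])
    if any(len(t) != num_channels for t in group):
        raise ValueError("all tensors in a group must share the channel dimension")
    scores = [0.0] * num_channels
    for tensor in group:
        for c, channel_weights in enumerate(tensor):
            scores[c] += sum(abs(w) for w in channel_weights)
    return scores


def select_channels_to_prune(scores: Sequence[float], ratio: float) -> List[int]:
    """Return the indices of the lowest-scoring channels, `ratio` of the total."""
    k = int(len(scores) * ratio)
    order = sorted(range(len(scores)), key=lambda c: scores[c])
    return sorted(order[:k])


if __name__ == "__main__":
    conv_weight = [[0.5, -0.5], [0.1, 0.0], [1.0, 2.0], [0.05, -0.05]]  # 4 output channels
    bn_weight = [[1.0], [0.2], [0.9], [0.1]]  # one BatchNorm scale per channel
    scores = group_l1_importance([conv_weight, bn_weight])
    print(select_channels_to_prune(scores, ratio=0.5))  # prints [1, 3]
```

Because the score is computed over the whole group, a channel is kept or removed consistently in every coupled tensor; swapping the L1 norm for another per-weight score is how a different criterion would slot into the same structure.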

Abstract

Neural network pruning is a critical technique for improving the efficiency of deep learning models. Unlike unstructured pruning, which only sets individual parameters to zero, structured pruning removes entire channels and therefore yields direct computational and storage savings. However, existing pruning methods are hard to adapt across architectures, frameworks, and pruning criteria because of three sources of diversity: the patterns by which parameters are coupled (such as residual connections and group convolutions), the many deep learning frameworks, and the different stages at which pruning can be performed. To address this, we introduce Structurally Prune Anything (SPA), a versatile structured pruning framework that can prune neural networks of any architecture, from any framework, and at any stage of training. SPA leverages a standardized computational graph and ONNX representation to prune diverse neural network architectures without manual intervention. SPA employs a group-level importance estimation method, which groups dependent computational operators, estimates their importance, and prunes unimportant coupled channels. This allows various existing pruning criteria to be transferred into a structured, grouped form. As a result, SPA supports pruning at any time: before training, after training with fine-tuning, or after training without fine-tuning. For the latter setting, we introduce Optimal Brain SPA (OBSPA), an algorithm that achieves state-of-the-art pruning results while requiring neither fine-tuning nor calibration data. In extensive experiments, SPA achieves pruning performance competitive with the state of the art across various architectures from popular frameworks and at different pruning times.
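For background on the name, OBSPA refers to Optimal Brain Surgeon (OBS). The classic single-weight OBS step is shown below, with $\mathbf{H}$ the Hessian of the loss with respect to the weights and $\mathbf{e}_q$ the $q$-th unit vector: each weight gets a saliency $L_q$, a second-order estimate of the loss increase from setting $w_q = 0$; the weight with the smallest $L_q$ is removed, and $\delta\mathbf{w}$ adjusts the remaining weights to compensate. This is only the textbook criterion; the abstract does not describe how OBSPA extends it to coupled channel groups without calibration data, so the block should not be read as OBSPA itself.

```latex
\begin{aligned}
L_q &= \frac{w_q^{2}}{2\,[\mathbf{H}^{-1}]_{qq}}
  && \text{(saliency of weight } w_q\text{)} \\
\delta\mathbf{w} &= -\frac{w_q}{[\mathbf{H}^{-1}]_{qq}}\,\mathbf{H}^{-1}\mathbf{e}_q
  && \text{(compensating update of the remaining weights)}
\end{aligned}
```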

Paper Structure

This paper contains 28 sections, 14 equations, 9 figures, 13 tables, and 3 algorithms.

Figures (9)

  • Figure 1: SPA overview. The source model can come from any framework, have any structure, and be either trained or untrained. A computational graph stores the dependency information between operators and data. The pruning procedure consists of four steps: coupling channels, grouping channels, importance estimation, and pruning. After pruning, the model can be converted to other frameworks for further use.
  • Figure 2: Comparison of a computational graph and a dependency graph. (a) A computational graph composed of three operators linked by data nodes. Convolution and BatchNorm have parameters, which form the parameter nodes of the computational graph. (b) The Dependency Graph of the same structure, which stores only information about linked operators.
  • Figure 3: Trade-off between accuracy and FLOPs/parameters for VGG-16 on CIFAR-100, with panels for the L1, SNIP, CroP, and GraSP criteria (FLOPs and parameters for each). SPA efficiently implements both structured and grouped versions of train-prune-finetune criteria such as L1 and prune-train criteria such as SNIP, CroP, and GraSP.
  • Figure 4: Trade-off between accuracy and FLOPs/parameters for DistilBERT on the SST-2 sentiment classification task.
  • Figure 5: Example of a group in a residual structure. Four convolutions connected by a residual skip form this structure. All colored blocks form one group; within it, each color marks a coupled channel that must be pruned together.
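The captions for Figures 2 and 5 describe building a graph of operators and data, then finding channels that must be pruned together across a residual skip. A minimal sketch of that coupling step, under stated assumptions, merges tensors whose channel axis must share indices using a union-find structure; each resulting set is one group. The toy graph encoding, the tensor names, the set of channel-preserving ops, and the helpers `coupled_groups` and `affected_params` are all hypothetical illustrations, not SPA's ONNX-based implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical toy encoding: (op name, op type, input tensors, output tensors).
# Only activation tensors are listed; each op's weights are implied by its type.
Graph = List[Tuple[str, str, List[str], List[str]]]

# Assumed ops whose output keeps the channel indices of every input.
CHANNEL_PRESERVING = {"BatchNormalization", "Relu", "Add"}


class UnionFind:
    """Disjoint sets over tensor names; each set is one coupled-channel group."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def coupled_groups(graph: Graph) -> List[List[str]]:
    """Group tensors whose channel axis must be pruned at the same indices."""
    uf = UnionFind()
    for _, op_type, inputs, outputs in graph:
        for tensor in inputs + outputs:
            uf.find(tensor)  # register every tensor, even uncoupled ones
        if op_type in CHANNEL_PRESERVING:
            for tensor in inputs[1:] + outputs:
                uf.union(inputs[0], tensor)
        # A plain Conv maps C_in to C_out channels, so it adds no coupling.
    groups: Dict[str, List[str]] = {}
    for tensor in list(uf.parent):
        groups.setdefault(uf.find(tensor), []).append(tensor)
    return [sorted(members) for members in groups.values()]


def affected_params(graph: Graph, group: List[str]) -> List[str]:
    """List which parameter axes a group touches when its channels are removed."""
    members = set(group)
    hits: List[str] = []
    for name, op_type, inputs, outputs in graph:
        if op_type == "Conv":
            if outputs[0] in members:
                hits.append(f"{name}.out")  # axis 0 of the conv weight (and bias)
            if inputs[0] in members:
                hits.append(f"{name}.in")  # axis 1 of the conv weight
        elif op_type == "BatchNormalization" and inputs[0] in members:
            hits.append(f"{name}.channels")  # scale, bias, running statistics
    return hits


# Residual structure in the spirit of Figure 5: four convolutions and one skip.
residual_block: Graph = [
    ("conv0", "Conv", ["input"], ["x0"]),
    ("conv1", "Conv", ["x0"], ["t1"]),
    ("bn1", "BatchNormalization", ["t1"], ["t2"]),
    ("relu1", "Relu", ["t2"], ["t3"]),
    ("conv2", "Conv", ["t3"], ["t4"]),
    ("add", "Add", ["t4", "x0"], ["t5"]),  # residual skip ties t4 to x0
    ("conv3", "Conv", ["t5"], ["t6"]),
]

if __name__ == "__main__":
    for group in coupled_groups(residual_block):
        print(group, "->", affected_params(residual_block, group))
```

In this toy graph the skip connection merges `x0`, `t4`, and `t5` into one group, so removing a channel there touches the output channels of `conv0` and `conv2` and the input channels of `conv1` and `conv3` at once, consistent with the four-convolution group in Figure 5. The groups that hold only the model input and final output would normally be excluded from pruning, and a grouped or depthwise convolution would also need to couple its input and output, which this sketch omits.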