
ONNXPruner: ONNX-Based General Model Pruning Adapter

Dongdong Ren, Wenbin Li, Tianyu Ding, Lei Wang, Qi Fan, Jing Huo, Hongbing Pan, Yang Gao

TL;DR

ONNXPruner is a general pruning adapter for ONNX models that decouples pruning from the development framework. For each pruned node it constructs a node association tree, and it applies a tree-level evaluation that jointly scores the channels of the pruned node and of its associated nodes. Because models are standardized in ONNX and executed with ONNX Runtime, the pruned models can be deployed across frameworks without extra architectural changes or retraining. The approach defines a node attribute library covering four types and four pruning patterns (one-to-one, one-to-many, many-to-one, many-to-many), each with an explicit scoring formula based on channel-wise norms. Experiments on CIFAR, ImageNet, and PASCAL VOC 2012 show that ONNXPruner can outperform several baselines while improving interoperability and reducing the effort of integrating pruning algorithms into new models and frameworks.
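The tree-level evaluation can be illustrated with a minimal Python sketch. It assumes a pruned Conv node whose output channel c feeds input channel c of one or more associated Conv nodes, and it scores channel c by adding the $\ell_n$ norm of the pruned filter to the $\ell_n$ norms of the matching input-channel slices in the associated nodes. This additive combination, the nested-list weight layout, and all function names (`channel_norm`, `single_node_scores`, `tree_level_scores`, `channels_to_prune`) are illustrative assumptions, not the paper's exact scoring formulas.

```python
from typing import List

Vector = List[float]


def channel_norm(values: Vector, n: float = 2.0) -> float:
    """Plain l_n norm of a flat list of weights."""
    return sum(abs(v) ** n for v in values) ** (1.0 / n)


def single_node_scores(pruned_filters: List[Vector], n: float = 2.0) -> List[float]:
    """One score per output channel, using only the pruned node's own filters."""
    return [channel_norm(f, n) for f in pruned_filters]


def tree_level_scores(pruned_filters: List[Vector],
                      associated_convs: List[List[List[Vector]]],
                      n: float = 2.0) -> List[float]:
    """Hypothetical tree-level score: for output channel c, add the norm of the
    pruned filter c to the norm of input-channel slice c in every associated Conv.
    Each associated Conv is indexed as conv[filter][input_channel] -> kernel weights."""
    scores = single_node_scores(pruned_filters, n)
    for conv in associated_convs:
        for c in range(len(scores)):
            # Gather input channel c across all filters of the associated Conv.
            slice_c = [w for filt in conv for w in filt[c]]
            scores[c] += channel_norm(slice_c, n)
    return scores


def channels_to_prune(scores: List[float], k: int) -> List[int]:
    """Indices of the k lowest-scoring channels, returned in ascending order."""
    lowest = sorted(range(len(scores)), key=lambda i: scores[i])[:k]
    return sorted(lowest)


if __name__ == "__main__":
    pruned = [[1.0, 0.0], [0.5, 0.5], [0.2, 0.1]]                # 3 output filters
    next_conv = [[[0.1], [2.0], [3.0]], [[0.1], [1.0], [1.0]]]   # 2 filters x 3 input channels
    print(channels_to_prune(single_node_scores(pruned), 1))              # [2]
    print(channels_to_prune(tree_level_scores(pruned, [next_conv]), 1))  # [0]
```

In this toy setting, single-node scoring removes channel 2 (the smallest pruned-filter norm), whereas the tree-level score removes channel 0 because the next layer barely uses it. This mirrors the kind of filter-index difference that Figure 5 compares.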

Abstract

Recent advances in model pruning have focused on developing new algorithms and improving results on benchmarks. However, applying these algorithms in practice across various models and platforms remains a significant challenge. To address this challenge, we propose ONNXPruner, a versatile pruning adapter for models in the ONNX format. ONNXPruner streamlines adaptation across diverse deep learning frameworks and hardware platforms. A novel aspect of ONNXPruner is its use of node association trees, which adapt automatically to various model architectures. These trees make the structural relationships between nodes explicit and guide the pruning process, in particular by exposing how pruning one node affects the nodes connected to it. Furthermore, we introduce a tree-level evaluation method. By leveraging node association trees, this method goes beyond traditional single-node evaluation and improves pruning performance without requiring extra operations. Experiments across multiple models and datasets confirm ONNXPruner's strong adaptability and improved efficacy. Our work aims to advance the practical application of model pruning.
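A minimal sketch of building a node association tree, in the spirit of Figure 3, follows. It uses a plain Python stand-in for an ONNX graph (nodes with `op_type`, `inputs`, and `outputs`, loosely mirroring the `op_type`, `input`, and `output` fields of ONNX `NodeProto`) instead of the `onnx` package. The split of operator types into 'next' (channel-preserving, keep exploring) and 'stop' (channel-consuming, end the branch) is a hypothetical attribute table, not the paper's four-type attribute library, and `build_association_tree` and `leaves` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical attribute table: ops that keep the channel dimension are explored
# further ('next'); ops that consume channels terminate the branch ('stop').
NEXT_OPS = {"BatchNormalization", "Relu", "MaxPool", "AveragePool", "Add", "Concat"}
STOP_OPS = {"Conv", "Gemm"}


@dataclass
class Node:
    name: str
    op_type: str
    inputs: List[str]
    outputs: List[str]


@dataclass
class TreeNode:
    node: Node
    attr: str  # 'root', 'next', or 'stop'
    children: List["TreeNode"] = field(default_factory=list)


def build_association_tree(root: Node, graph: List[Node]) -> TreeNode:
    # Index consumers by tensor name once, so each expansion is a dict lookup.
    consumers: Dict[str, List[Node]] = {}
    for n in graph:
        for t in n.inputs:
            consumers.setdefault(t, []).append(n)

    def expand(tree: TreeNode) -> None:
        for t in tree.node.outputs:
            for c in consumers.get(t, []):
                if c.op_type in STOP_OPS:
                    tree.children.append(TreeNode(c, "stop"))
                elif c.op_type in NEXT_OPS:
                    child = TreeNode(c, "next")
                    tree.children.append(child)
                    expand(child)  # ONNX graphs are acyclic, so this terminates
                else:
                    raise ValueError(f"op not covered by this sketch: {c.op_type}")

    tree = TreeNode(root, "root")
    expand(tree)
    return tree


def leaves(tree: TreeNode) -> List[str]:
    """Names of the nodes that end each branch of the tree."""
    if not tree.children:
        return [tree.node.name]
    return [name for ch in tree.children for name in leaves(ch)]


if __name__ == "__main__":
    g = [
        Node("conv1", "Conv", ["x", "w1"], ["t1"]),
        Node("bn1", "BatchNormalization", ["t1"], ["t2"]),
        Node("relu1", "Relu", ["t2"], ["t3"]),
        Node("conv2", "Conv", ["t3", "w2"], ["y2"]),
        Node("conv3", "Conv", ["t3", "w3"], ["y3"]),
    ]
    print(leaves(build_association_tree(g[0], g)))  # ['conv2', 'conv3']
```

Here pruning the output channels of `conv1` reaches `bn1` and `relu1` as 'next' nodes and ends at `conv2` and `conv3`, whose input channels must be pruned with the same indices.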

Paper Structure

This paper contains 16 sections, 5 equations, 6 figures, 7 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Illustration of pruning algorithms in development and deployment. Existing pruning algorithms (red boxes) are tailored to specific development frameworks and require manual adaptation to each model structure. Our work introduces ONNXPruner (blue box), a versatile model pruning adapter for ONNX format models, which automatically adapts pruning algorithms to models with diverse structures, independent of the development framework used.
  • Figure 2: Comparison between the existing pruning framework (A) and the proposed pruning framework (B). (a) The current method [fang2023depgraph] recursively traverses each node to identify the associated nodes. (b) Filters are evaluated based solely on the pruned node to make pruning decisions. (c) Associated nodes are pruned based on the evaluation in (b). (d) The proposed ONNXPruner constructs node association trees that represent the relationships between a pruned node and its associated nodes hierarchically. (e) A tree-level pruning strategy uses the node association tree to jointly evaluate and prune the filters of both the pruned node and its associated nodes. (f) Associated nodes are pruned following the evaluation in (e). CO and CI denote the output and input channels of the Conv kernel, respectively.
  • Figure 3: An example of constructing node association trees. We use the node graph of the ONNX model to build a node association tree for each pruned node. Attributes within these trees are assigned based on the operator type: the pruned node serves as the root, child nodes are tagged 'next' to indicate further exploration, and leaf nodes are tagged 'stop' to indicate the end of a branch.
  • Figure 4: Four basic configurations of pruned and associated nodes, using convolution as a representative example for simplicity. Associated nodes such as ReLU and pooling, which need no processing in this context, are omitted. (a) One-to-one is the standard structure in DNNs, where the output channels of one layer feed directly into the input channels of each filter in the next layer. (b) One-to-many is prevalent in models that use feature reuse, such as SqueezeNet and Inception. (c) Many-to-one typifies the conventional residual structure. (d) Many-to-many combines the characteristics of (b) and (c), a hybrid structure that integrates feature reuse with residual connections.
  • Figure 5: Differences in the filter indices selected by ONNXPruner (tree-level $\ell_n$-norm) versus the single-node $\ell_n$-norm criterion.
  • ...and 1 more figure
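The four configurations in Figure 4 can be distinguished by two counts: how many pruned nodes share one channel dimension, and how many associated Conv nodes consume it. The sketch below names the pattern from those counts. For the many-to-one and many-to-many cases it also picks a single set of channel indices for all coupled pruned nodes, because an element-wise Add in a residual block requires both branches to keep the same channels. Summing the coupled nodes' per-channel scores is an assumption made for illustration, and `pruning_pattern` and `shared_prune_indices` are hypothetical names.

```python
from typing import List


def pruning_pattern(num_pruned: int, num_associated: int) -> str:
    """Name the Figure 4 configuration from the number of pruned nodes sharing one
    channel dimension and the number of associated Conv nodes that consume it."""
    if num_pruned < 1 or num_associated < 1:
        raise ValueError("need at least one pruned and one associated node")
    left = "one" if num_pruned == 1 else "many"
    right = "one" if num_associated == 1 else "many"
    return f"{left}-to-{right}"


def shared_prune_indices(per_node_scores: List[List[float]], k: int) -> List[int]:
    """Coupled pruned nodes (e.g. both inputs of a residual Add) must drop the same
    channels; here their per-channel scores are summed (an assumption) and the k
    lowest-scoring channel indices are returned in ascending order."""
    num_channels = len(per_node_scores[0])
    if any(len(s) != num_channels for s in per_node_scores):
        raise ValueError("coupled nodes must have the same number of channels")
    totals = [sum(s[c] for s in per_node_scores) for c in range(num_channels)]
    return sorted(sorted(range(num_channels), key=lambda c: totals[c])[:k])


if __name__ == "__main__":
    print(pruning_pattern(1, 1))  # one-to-one
    print(pruning_pattern(2, 1))  # many-to-one
    # Two residual branches, 3 channels each; channel 2 has the lowest combined score.
    print(shared_prune_indices([[0.1, 0.9, 0.5], [0.8, 0.2, 0.3]], 1))  # [2]
```

Pruning each branch independently here would remove channel 0 from the first branch and channel 1 from the second, leaving the Add with misaligned channels; scoring the coupled nodes jointly avoids that.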