ONNXPruner: ONNX-Based General Model Pruning Adapter

Dongdong Ren; Wenbin Li; Tianyu Ding; Lei Wang; Qi Fan; Jing Huo; Hongbing Pan; Yang Gao

TL;DR

ONNXPruner is a general pruning adapter for ONNX models. It decouples pruning from the development framework by building a node association tree for each pruned node and applying a tree-level evaluation that scores the pruned channels jointly with the channels of the associated nodes. Because models are standardized in ONNX and executed with ONNX Runtime, pruned models can be deployed across frameworks without extra architectural changes or retraining. The approach defines a four-type node attribute library and four pruning patterns (one-to-one, one-to-many, many-to-one, many-to-many), each with an explicit scoring formula based on channel-wise norms. Experiments on CIFAR, ImageNet, and PASCAL VOC 2012 show that ONNXPruner can outperform several baselines while improving interoperability and reducing integration overhead, which makes it practical for scalable pruning workflows.
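The idea of tree-level evaluation can be illustrated with a minimal sketch. It assumes one plausible reading of the summary above: the score of output channel c is the L2 norm of the pruned node's filter c plus the L2 norm of the input-channel slice c in each associated node. The function names (`tree_level_scores`, `lowest_k`), the nested-list weight layout, and the choice of an unweighted sum are assumptions made for illustration; the paper's actual scoring formulas may differ.

```python
import math


def flatten(x):
    """Flatten arbitrarily nested lists of numbers into a flat list of floats."""
    if isinstance(x, (int, float)):
        return [float(x)]
    out = []
    for item in x:
        out.extend(flatten(item))
    return out


def l2(values):
    """L2 norm of a flat list of numbers."""
    return math.sqrt(sum(v * v for v in values))


def tree_level_scores(pruned_weight, associated_weights):
    """Score each output channel of the pruned node (hypothetical formula).

    pruned_weight:      nested list indexed [C_out][C_in][...]
    associated_weights: list of nested lists indexed [C_out2][C_out][...],
                        i.e. their input channels match the pruned outputs.
    """
    scores = []
    for c, filt in enumerate(pruned_weight):
        # Single-node part: norm of the filter that produces channel c.
        score = l2(flatten(filt))
        # Tree-level part: norm of what each associated node reads from channel c.
        for weight in associated_weights:
            channel_slice = [out_filter[c] for out_filter in weight]
            score += l2(flatten(channel_slice))
        scores.append(score)
    return scores


def lowest_k(scores, k):
    """Indices of the k lowest-scoring channels, in ascending index order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[:k])


# Toy weights: 3 output channels in the pruned node, one associated node.
pruned = [[[3.0, 4.0]], [[1.0, 0.0]], [[0.0, 2.0]]]
associated = [[[[0.0], [6.0], [0.0]], [[0.0], [8.0], [0.0]]]]

single_node = tree_level_scores(pruned, [])        # [5.0, 1.0, 2.0]
tree_level = tree_level_scores(pruned, associated)  # [5.0, 11.0, 2.0]
print(lowest_k(single_node, 1), lowest_k(tree_level, 1))  # [1] [2]
```

In this toy case a single-node criterion would remove channel 1, while the joint score keeps it because the associated node relies on it heavily and removes channel 2 instead.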

Abstract

Recent advancements in model pruning have focused on developing new algorithms and improving upon benchmarks. However, the practical application of these algorithms across various models and platforms remains a significant challenge. To address this challenge, we propose ONNXPruner, a versatile pruning adapter designed for ONNX format models. ONNXPruner streamlines the adaptation process across diverse deep learning frameworks and hardware platforms. A novel aspect of ONNXPruner is its use of node association trees, which automatically adapt to various model architectures. These trees clarify the structural relationships between nodes and guide the pruning process, in particular by making explicit how pruning one node affects the nodes connected to it. Furthermore, we introduce a tree-level evaluation method. By leveraging node association trees, this method allows for a comprehensive analysis beyond traditional single-node evaluations, enhancing pruning performance without the need for extra operations. Experiments across multiple models and datasets confirm ONNXPruner's strong adaptability and increased efficacy. Our work aims to advance the practical application of model pruning.
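A node association tree can be sketched on a toy graph as follows. This is an illustration under stated assumptions, not the authors' implementation: the graph is a plain dictionary rather than a real ONNX graph, the `OP_ATTRIBUTE` table is a hypothetical stand-in for the paper's node attribute library, and only the 'next' and 'stop' tags mentioned in the figure captions are modeled. Treating unknown operators as 'stop' is a conservative choice made here, not something the source states.

```python
from collections import deque

# Hypothetical attribute table: operators that pass channels through unchanged
# are tagged 'next' (keep exploring); operators that consume the channels are
# tagged 'stop' (end of the branch).
OP_ATTRIBUTE = {
    "Relu": "next",
    "BatchNormalization": "next",
    "MaxPool": "next",
    "Conv": "stop",
    "Gemm": "stop",
}


def build_association_tree(op_types, consumers, pruned_node):
    """Build a tree rooted at the pruned node by walking its consumers.

    op_types:  {node name: operator type}
    consumers: {node name: [names of nodes that read its output]}
    """
    root = {"name": pruned_node, "op": op_types[pruned_node],
            "attr": "root", "children": []}
    queue = deque([root])
    seen = {pruned_node}
    while queue:
        parent = queue.popleft()
        for name in consumers.get(parent["name"], []):
            if name in seen:
                continue
            seen.add(name)
            # Unknown operators default to 'stop' (assumption of this sketch).
            attr = OP_ATTRIBUTE.get(op_types[name], "stop")
            child = {"name": name, "op": op_types[name],
                     "attr": attr, "children": []}
            parent["children"].append(child)
            if attr == "next":
                queue.append(child)  # only 'next' nodes are explored further
    return root


def names_with_attr(tree, attr):
    """Collect node names carrying the given tag, in depth-first order."""
    found = [tree["name"]] if tree["attr"] == attr else []
    for child in tree["children"]:
        found.extend(names_with_attr(child, attr))
    return found


# Toy graph: conv1 -> bn1 -> relu1 -> {conv2, conv3}, conv2 -> relu2
op_types = {"conv1": "Conv", "bn1": "BatchNormalization", "relu1": "Relu",
            "conv2": "Conv", "conv3": "Conv", "relu2": "Relu"}
consumers = {"conv1": ["bn1"], "bn1": ["relu1"],
             "relu1": ["conv2", "conv3"], "conv2": ["relu2"]}

tree = build_association_tree(op_types, consumers, "conv1")
print(names_with_attr(tree, "next"))  # ['bn1', 'relu1']
print(names_with_attr(tree, "stop"))  # ['conv2', 'conv3']
```

The 'stop' leaves are the nodes whose input channels must be pruned together with the root's output channels; `relu2` is never visited because the branch ends at `conv2`.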

Paper Structure (16 sections, 5 equations, 6 figures, 7 tables, 1 algorithm)

Introduction
Related Work
Model Conversion for Interoperability
Model Pruning Method
General Model Pruning Method
Method
ONNX Model Converter and Runtime
Construct Node Association Tree
Tree-level Pruning
Experiments
Settings
Effectiveness of Tree-level Pruning
Results on CIFAR
Results on ImageNet
Results on PASCAL VOC 2012
...and 1 more sections

Figures (6)

Figure 1: Illustration of pruning algorithms in development and deployment. Existing pruning algorithms (red boxes) are tailored for specific development frameworks and necessitate manual adaptation for various model structures. Our work introduces ONNXPruner (blue box), a versatile model pruning adapter for ONNX format models, which automatically adapts pruning algorithms to models with diverse structures, independent of the development framework used.
Figure 2: Comparison between the existing pruning framework (A) and the proposed pruning framework (B). (a) The existing method [fang2023depgraph] recursively traverses each node to identify the associated nodes. (b) It evaluates filters based solely on the pruned node for pruning decisions. (c) It prunes associated nodes based on the evaluation in (b). (d) The proposed ONNXPruner constructs node association trees to represent the relationships between a pruned node and its associated nodes, presenting these relationships hierarchically. (e) It employs a tree-level pruning strategy, leveraging the node association tree to jointly evaluate and prune the filters of both the pruned node and associated nodes. (f) It prunes associated nodes following the evaluation in (e). CO and CI represent the output and input channels of the Conv kernel, respectively.
Figure 3: An example for constructing the node association trees. We use the node graph of the ONNX model to build a node association tree for each pruned node. The attributes within these trees are assigned based on the operator type: the pruned node serves as the root, child nodes are tagged as 'next' indicating further exploration, and leaf nodes are marked as 'stop' indicating the end of the branch.
Figure 4: We illustrate four basic configurations of pruned and associated nodes, using convolution as a representative example for simplicity. We omit associated nodes like ReLU and pooling, which do not require processing in this context. (a) One-to-one represents the standard structure in DNNs, where the output channel of one layer feeds directly into the input channel of each filter in the next layer. (b) One-to-many is prevalent in models that incorporate feature reuse, such as SqueezeNet and Inception. (c) Many-to-one typifies the conventional residual structure found in networks. (d) Many-to-many combines the characteristics of (b) and (c), representing a hybrid structure that integrates feature reuse with residual connections.
Figure 5: Differences in filter index of ONNXPruner ($\ell_n$-norm) vs. $\ell_n$-norm.
...and 1 more figures
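
The four configurations in Figure 4 can be told apart by two counts: how many pruned nodes share a coupled channel dimension, and how many associated nodes consume those channels. The helper below is a hypothetical sketch of that naming rule; the function name `pruning_pattern` and the idea of classifying purely by these two counts are assumptions for illustration and are not taken from the paper.

```python
def pruning_pattern(num_pruned, num_associated):
    """Name the configuration from the number of coupled pruned nodes and
    the number of associated nodes that consume their channels."""
    if num_pruned < 1 or num_associated < 1:
        raise ValueError("need at least one pruned and one associated node")
    left = "one" if num_pruned == 1 else "many"
    right = "one" if num_associated == 1 else "many"
    return f"{left}-to-{right}"


print(pruning_pattern(1, 1))  # one-to-one: plain sequential layers
print(pruning_pattern(1, 3))  # one-to-many: feature reuse
print(pruning_pattern(2, 1))  # many-to-one: residual addition
print(pruning_pattern(2, 2))  # many-to-many: reuse plus residual
```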
