
Physics Inspired Criterion for Pruning-Quantization Joint Learning

Weiying Xie, Xiaoyi Fan, Xin Zhang, Yunsong Li, Jie Lei, Leyuan Fang

TL;DR

This work tackles the challenge of deploying CNNs on resource-constrained devices by proposing PIC-PQ, a physics-inspired criterion for pruning-quantization joint learning. By drawing an analogy to elasticity dynamics, it defines a global filter importance through the relation $I_{i}^{l(i)} = a_{l(i)} \cdot FP(\boldsymbol{\Theta}_{i}^{l(i)}) + b_{l(i)}$ with $FP(\boldsymbol{\Theta}_{i}^{l(i)}) = R(\mathbf{o}_{i}^{l(i)}(D))$, where the filter property $FP$ is the rank $R$ of the feature map produced by filter $i$ of layer $l(i)$, $a_{l(i)}$ is a learned per-layer deformation scale, and $b_{l(i)}$ is a learned shift that makes importance scores comparable across layers. The method derives a theoretically grounded objective and assigns per-layer bitwidths automatically from layer sparsity and a hardware-aware penalty, with the scale and shift parameters searched by a Regularized Evolutionary Algorithm. Empirically, PIC-PQ reaches a 54.96× BOPs compression ratio for ResNet56 on CIFAR10 with a 0.10% accuracy drop and 53.24× for ResNet18 on ImageNet with a 0.61% accuracy drop, while maintaining competitive or superior accuracy versus state-of-the-art methods. The approach thus offers a hardware-conscious way to jointly prune and quantize networks with an interpretable global ranking.
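The linear criterion above can be illustrated with a small sketch. It assumes the per-filter $FP$ values (feature-map ranks) have already been computed, and that the filters with the lowest global importance are the ones pruned; the function names, the dictionary layout, and the example numbers are hypothetical and are not taken from the authors' code.

```python
from typing import Dict, List, Tuple

def global_importance(
    fp: Dict[int, List[float]],   # layer index -> FP value of each filter (assumed precomputed)
    a: Dict[int, float],          # layer index -> deformation scale a_l
    b: Dict[int, float],          # layer index -> relative shift b_l
) -> List[Tuple[float, int, int]]:
    """Score every filter with I = a_l * FP + b_l and sort ascending across all layers."""
    scored = []
    for layer, values in fp.items():
        for idx, value in enumerate(values):
            scored.append((a[layer] * value + b[layer], layer, idx))
    scored.sort(key=lambda item: item[0])
    return scored

def select_pruned(scored: List[Tuple[float, int, int]], prune_ratio: float) -> List[Tuple[int, int]]:
    """Return (layer, filter) pairs for the lowest-scoring fraction of filters (assumed policy)."""
    n_prune = int(len(scored) * prune_ratio)
    return [(layer, idx) for _, layer, idx in scored[:n_prune]]

# Hypothetical FP values and a-b pairs for a two-layer toy network.
fp_values = {0: [3.0, 8.0], 1: [10.0, 12.0, 2.0]}
scales = {0: 1.0, 1: 0.5}
shifts = {0: 0.0, 1: 1.0}

ranking = global_importance(fp_values, scales, shifts)
pruned = select_pruned(ranking, prune_ratio=0.4)
```

Without the shift and scale, the layer-1 filters with raw values 10 and 12 would outrank every layer-0 filter; with them, the layer-0 filter with value 8 ends up with the highest global score, which is the kind of cross-layer comparison the criterion is meant to enable.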

Abstract

Pruning-quantization joint learning facilitates the deployment of deep neural networks (DNNs) on resource-constrained edge devices. However, most existing methods do not jointly learn a global criterion for pruning and quantization in an interpretable way. In this paper, we propose a novel physics inspired criterion for pruning-quantization joint learning (PIC-PQ), built on an analogy that we are the first to draw between elasticity dynamics (ED) and model compression (MC). Specifically, starting from Hooke's law in ED, we establish a linear relationship between the filters' importance distribution and the filter property (FP) through a learnable deformation scale in the physics inspired criterion (PIC). Furthermore, we extend PIC with a relative shift variable for a global view. To ensure feasibility and flexibility, an available maximum bitwidth and a penalty factor are introduced in the quantization bitwidth assignment. Experiments on image classification benchmarks demonstrate that PIC-PQ yields a good trade-off between accuracy and bit-operations (BOPs) compression ratio (e.g., a 54.96X BOPs compression ratio for ResNet56 on CIFAR10 with a 0.10% accuracy drop and 53.24X for ResNet18 on ImageNet with a 0.61% accuracy drop). The code will be available at https://github.com/fanxxxxyi/PIC-PQ.
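To make the BOPs compression ratio concrete, here is a rough worked example. It assumes the commonly used simplification that a layer's BOPs equal its multiply-accumulate count times the weight bitwidth times the activation bitwidth; the abstract does not state the exact definition used in the paper, and the layer shape and bitwidths below are made up for illustration.

```python
def conv_macs(c_in: int, c_out: int, kernel: int, out_h: int, out_w: int) -> int:
    """Multiply-accumulate count of a standard convolution layer."""
    return c_in * c_out * kernel * kernel * out_h * out_w

def layer_bops(macs: int, w_bits: int, a_bits: int) -> int:
    """Assumed simplification: BOPs = MACs * weight bits * activation bits."""
    return macs * w_bits * a_bits

def bops_compression_ratio(baseline: list, compressed: list) -> float:
    """Ratio of total baseline BOPs to total compressed BOPs over all layers."""
    return sum(baseline) / sum(compressed)

# Hypothetical 3x3 layer with a 32x32 output map.
# Baseline: 16 -> 16 channels, 32-bit weights and activations.
baseline_bops = [layer_bops(conv_macs(16, 16, 3, 32, 32), 32, 32)]
# Compressed: pruned to 8 -> 8 channels, quantized to 4-bit weights and activations.
compressed_bops = [layer_bops(conv_macs(8, 8, 3, 32, 32), 4, 4)]

ratio = bops_compression_ratio(baseline_bops, compressed_bops)
```

Under this assumption, pruning contributes a factor of 4 (half the input and half the output channels) and quantization a factor of 64 (1024 versus 16 bit-products per MAC), which is why joint pruning and quantization can reach ratios far beyond what either achieves alone.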

Paper Structure (23 sections, 21 equations, 6 figures, 8 tables, 1 algorithm)


Figures (6)

  • Figure 1: The analogy between the optimization in ED and MC.
  • Figure 2: Overview of the PIC-PQ framework. (a) Step 1 shows a visual analogy between elastomers' deformation in ED and filters' importance distribution in MC, and establishes that the filters' importance distribution is linearly related to FP. In Step 2, a relative shift variable is introduced to rank filters globally across layers. (b) The rank of the feature map from each filter is then computed to obtain the global importance ranking, using the optimal $\boldsymbol{a}$-$\boldsymbol{b}$ pairs found by the preceding search. (c) Finally, the compression policy is assigned automatically.
  • Figure 3: Lipschitz bound on the function. $k$ denotes the slope.
  • Figure 4: Two-dimensional feature representation of the inner layers of the ResNet56 network trained on CIFAR10. (a) Conv1 (first layer before Block 1) (b) Conv19 (last layer of Block 1) (c) Conv37 (last layer of Block 2) (d) Conv55 (last layer of Block 3).
  • Figure 5: Influence of the $\boldsymbol{\tau}$ when compressing ResNet56 on CIFAR100.
  • ...and 1 more figure