PSE-Net: Channel Pruning for Convolutional Neural Networks with Parallel-subnets Estimator

Shiguang Wang, Tao Xie, Haijun Liu, Xingcheng Zhang, Jian Cheng

TL;DR

PSE-Net tackles the high computational cost of channel pruning with a parallel-subnets estimator that trains multiple subnets simultaneously in a single forward-backward pass, enabling rapid supernet updates and accurate interpolation of subnet accuracy. It couples this with a prior-distributed-based sampling strategy, derived from supernet training losses, to efficiently explore FLOPs-constrained architectures. The method achieves state-of-the-art or competitive accuracy on ImageNet across MobileNetV2, ResNet50, and VGG, while significantly reducing supernet training time and improving search efficiency. The approach also transfers well to downstream tasks such as object detection and semantic segmentation, and yields favorable hardware latency results compared to prior pruning methods.

Abstract

Channel pruning is one of the most widespread techniques for compressing deep neural networks while maintaining their performance. Currently, a typical pruning algorithm leverages neural architecture search to directly find networks with a configurable width, the key step of which is to identify representative subnets for various pruning ratios by training a supernet. However, current methods mainly follow a serial training strategy to optimize the supernet, which is very time-consuming. In this work, we introduce PSE-Net, a novel parallel-subnets estimator for efficient channel pruning. Specifically, we propose a parallel-subnets training algorithm that simulates the forward-backward pass of multiple subnets by dropping extraneous features on the batch dimension, so that various subnets can be trained in one round. The proposed algorithm improves the efficiency of supernet training and equips the network with the ability to interpolate the accuracy of unsampled subnets, enabling PSE-Net to effectively evaluate and rank subnets. Over the trained supernet, we develop a prior-distributed-based sampling algorithm to boost the performance of classical evolutionary search. This algorithm uses prior information from the supernet training phase to assist the search for optimal subnets, while tackling the difficulty of finding samples that satisfy resource constraints, which arises from the long-tail distribution of network configurations. Extensive experiments demonstrate that PSE-Net outperforms previous state-of-the-art channel pruning methods on the ImageNet dataset while retaining superior supernet training efficiency. For example, under a 300M FLOPs constraint, our pruned MobileNetV2 achieves 75.2% Top-1 accuracy on ImageNet, exceeding the original MobileNetV2 by 2.6 percentage points while costing only 30%/16% of what BCNet/AutoSlim require.
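
The idea of simulating several subnets in one pass by dropping features on the batch dimension can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration (the two-layer toy network, the layer sizes, the names `supernet_forward` and `subnet_forward`, and the use of a per-sample 0/1 mask); it is not the authors' implementation, which also has to deal with details such as normalization layers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer supernet: 8 inputs -> 6 hidden channels -> 3 outputs.
C_IN, C_HID, C_OUT = 8, 6, 3
W1 = rng.standard_normal((C_IN, C_HID))
W2 = rng.standard_normal((C_HID, C_OUT))


def supernet_forward(x, widths):
    """Run the full supernet once; sample i keeps only its first widths[i] hidden channels."""
    hidden = np.maximum(x @ W1, 0.0)                     # [batch, C_HID]
    keep = np.arange(C_HID)[None, :] < widths[:, None]   # [batch, C_HID] per-sample mask
    return (hidden * keep) @ W2                          # dropped channels contribute nothing


def subnet_forward(x, width):
    """Reference: a standalone subnet obtained by slicing the shared weights."""
    hidden = np.maximum(x @ W1[:, :width], 0.0)
    return hidden @ W2[:width, :]


x = rng.standard_normal((6, C_IN))
widths = np.array([2, 2, 4, 4, 6, 6])   # three subnets, two samples each

# One pass through the supernet versus one pass per subnet.
parallel_out = supernet_forward(x, widths)
serial_out = np.stack([subnet_forward(x[i:i + 1], w)[0] for i, w in enumerate(widths)])
max_abs_diff = float(np.abs(parallel_out - serial_out).max())
print(max_abs_diff)
```

In this toy setting the masked supernet pass reproduces the outputs of the three sliced subnets up to floating-point rounding. Because a masked activation is multiplied by zero, a sample also sends no gradient to the channels it dropped, which is what lets a single backward pass stand in for several subnet updates.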

Paper Structure (17 sections, 20 equations, 4 figures, 10 tables)

Figures (4)

  • Figure 1: Different mechanisms for modeling the estimator. (a) Thousands of distinct models are trained separately. (b) and (c) are two types of weight-sharing supernet mechanisms adopted by most one-shot NAS methods. Suppose we have three blocks with $d-k$, $d$, and $d+k$ channels, respectively: the classical weight-sharing supernet (b) builds separate weights for each of these blocks, while the weight-entanglement supernet (c) entangles these blocks within shared super weights. Chosen parts are drawn in solid lines and unchosen parts in dashed lines. One-shot NAS methods such as SPOS [guo2020single], FairNAS [chu2021fairnas], and Cream [peng2020cream] adopt mechanism (b), while AutoSlim [yu2019autoslim], BigNAS [yu2020bignas], BCNet [su2021bcnet], and ours adopt mechanism (c).
  • Figure 2: The overall framework of PSE-Net. We propose a parallel-subnets training algorithm and a prior-distributed-based sampling algorithm to boost supernet training and subnet searching, respectively. Panels (a) and (b) compare the serial training strategy with the parallel-subnets training algorithm during the supernet training phase. (a) The serial training strategy is approximately equivalent to training a plain model $n$ times, which slows down supernet training. (b) The parallel-subnets training algorithm always holds the full supernet and drops extraneous features on the batch dimension to simulate the forward-backward pass of multiple subnets in one round. Chosen parts are drawn in solid lines and unchosen parts in dashed lines. (c) Besides using the computation cost to improve sampling efficiency, we also use the training loss, which reflects the quality of subnets, to improve subnet sampling performance.
  • Figure 3: Visualization of the searched networks under different FLOPs budgets. The horizontal axis shows the index of the convolutional layer. For each layer, the vertical axis shows the ratio of retained channels to those of the original network.
  • Figure 4: (a) The average number of trials to sample architectures under constraints. (b) Accuracy performance of the searched PSE-Net with different sampling methods.
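
To give a feel for what "number of trials to sample architectures under constraints" measures in Figure 4(a), here is a minimal rejection-sampling sketch. It is built entirely on assumptions: the toy cost model in `flops_ratio`, the 20-layer depth, the 10-15% FLOPs window, and the hand-made `biased` weights are all hypothetical. In particular, `biased` is only a stand-in for a prior and is not the loss-derived prior distribution used by PSE-Net.

```python
import math
import random

RATIOS = [0.1 * i for i in range(1, 11)]   # candidate width ratios per layer
NUM_LAYERS = 20                            # hypothetical network depth


def flops_ratio(widths):
    """Toy cost model: a layer's FLOPs scale with (input ratio) * (output ratio)."""
    prev, total = 1.0, 0.0
    for r in widths:
        total += prev * r
        prev = r
    return total / len(widths)


def trials_until_hit(rng, weights, low, high, max_trials=100_000):
    """Count rejection-sampling draws until a config lands inside the FLOPs window."""
    for trial in range(1, max_trials + 1):
        widths = rng.choices(RATIOS, weights=weights, k=NUM_LAYERS)
        if low <= flops_ratio(widths) <= high:
            return trial
    return max_trials


def average_trials(weights, low, high, repeats=100, seed=0):
    """Average the trial count over several independent searches."""
    rng = random.Random(seed)
    return sum(trials_until_hit(rng, weights, low, high) for _ in range(repeats)) / repeats


low, high = 0.10, 0.15                     # keep 10-15% of the full-network FLOPs
uniform = [1.0] * len(RATIOS)              # every width ratio equally likely
center = math.sqrt((low + high) / 2)       # per-layer ratio that roughly meets the budget
biased = [math.exp(-abs(r - center) / 0.1) for r in RATIOS]  # hand-made stand-in prior

uniform_avg = average_trials(uniform, low, high)
biased_avg = average_trials(biased, low, high)
print(f"uniform: {uniform_avg:.1f} trials, biased: {biased_avg:.1f} trials")
```

With uniform per-layer sampling, the total cost concentrates around its mean, so a budget far from that mean is rarely hit and many draws are rejected. Shifting the sampling distribution toward configurations that match the budget reduces the number of trials, which is the kind of effect Figure 4(a) reports for different sampling methods.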