Training-Free Restoration of Pruned Neural Networks

Keonho Lee, Minsoo Kim, Dong-Wan Choi

TL;DR

This work tackles restoring pruned CNNs without access to training data or additional fine-tuning. It introduces Leave Before You Leave (LBYL), a data-free recovery strategy that distributes the information of each pruned filter across multiple preserved filters via a delivery matrix, leading to a data-free reconstruction loss with a convex, closed-form solution. The reconstruction error decomposes into Residual Error, Batch Normalization Error, and Activation Error, and is minimized with a regularized least-squares solution that yields a unique optimum for the recovery coefficients. Empirically, LBYL consistently surpasses one-to-one compensation (NM) and other data-free baselines across CIFAR-10/100, ImageNet, and COCO, including transfer scenarios, demonstrating improved reconstruction quality and practical utility when data or fine-tuning are unavailable.
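
The closed-form, regularized least-squares idea can be illustrated with a small sketch. It is not the paper's full loss: it minimizes only a residual term plus an L2 penalty for a single pruned filter, and leaves out the batch normalization and activation terms. The function names, the penalty weight `lam`, the random filters, and the "best scaled single filter" baseline (a stand-in for one-to-one compensation, not the NM procedure itself) are all assumptions made for illustration.

```python
import numpy as np

def ridge_recovery_coefficients(preserved, pruned, lam=1e-2):
    """Return s minimizing ||pruned - preserved @ s||^2 + lam * ||s||^2.

    preserved: (d, n) array, one flattened preserved filter per column.
    pruned:    (d,) array, the flattened pruned filter.
    """
    n = preserved.shape[1]
    # For lam > 0 this matrix is positive definite, so the minimizer is unique.
    gram = preserved.T @ preserved + lam * np.eye(n)
    return np.linalg.solve(gram, preserved.T @ pruned)

def ridge_objective(preserved, pruned, s, lam):
    residual = pruned - preserved @ s
    return float(residual @ residual + lam * (s @ s))

rng = np.random.default_rng(0)
preserved = rng.normal(size=(27, 4))  # hypothetical: four 3x3x3 filters, flattened
pruned = rng.normal(size=27)
lam = 1e-2

# Many-to-one: every preserved filter receives a coefficient.
s_multi = ridge_recovery_coefficients(preserved, pruned, lam)

# One-to-one stand-in: the best scaled copy of a single preserved filter.
scales = (preserved.T @ pruned) / np.sum(preserved**2, axis=0)
single_errors = np.sum((pruned[:, None] - preserved * scales) ** 2, axis=0)
k = int(np.argmin(single_errors))
s_single = np.zeros(preserved.shape[1])
s_single[k] = scales[k]

print("multi-filter objective:", ridge_objective(preserved, pruned, s_multi, lam))
print("single-filter objective:", ridge_objective(preserved, pruned, s_single, lam))
```

Because the single-filter coefficients are just one feasible point of the same convex problem, the multi-filter solution can never have a larger objective value. That is the intuition behind spreading a pruned filter's information over several preserved filters.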

Abstract

Although network pruning is widely used to compress deep neural networks, the resulting accuracy depends heavily on a fine-tuning process that is often computationally expensive and requires the original data. Neither may be available in real-world scenarios, and hence a few recent works attempt to restore pruned networks without any expensive retraining. Their strong assumption is that every pruned neuron can be replaced with another, quite similar one, but this does not hold in many neural networks, where the similarity between neurons is extremely low in some layers. In this article, we propose a more rigorous and robust method for restoring pruned networks in a fine-tuning-free and data-free manner, called LBYL (Leave Before You Leave). LBYL significantly relaxes the aforementioned assumption: each pruned neuron leaves pieces of its information to as many preserved neurons as possible, so that multiple neurons together obtain a more robust approximation of the original output of the neuron that was removed. Our method is based on a theoretical analysis of how to formulate the reconstruction error between the original network and its approximation, which leads to a closed-form solution for our derived loss function. Through extensive experiments, LBYL is confirmed to approximate the original network more effectively and consequently to achieve higher accuracy for restored networks than recent approaches that exploit the similarity between two neurons. The very first version of this work, which contains major technical and theoretical components, was submitted to NeurIPS 2021 and ICML 2022.

Paper Structure

This paper contains 26 sections, 4 theorems, 33 equations, 5 figures, 12 tables, 1 algorithm.

Key Result

Lemma 1

If there is only batch normalization between a feature map and its activation map, the reconstruction error can be formulated as follows: where $\boldsymbol{\mathcal{B}} = \frac{\gamma_{j}}{\sigma_{j}} \{\sum\limits_{k = 1, k \neq j }^{m} s_{k}\frac{\sigma_{j}}{\gamma_{j}}\frac{\gamma_{k}}{\sigma_{k}} (\mu_{k} - \frac{\sigma_{k}}{\gamma_{k}}\beta_k) - \mu_j + \frac{\sigma_{j}}{\gamma_{j}}\beta_j \}$
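
One way to read the term $\boldsymbol{\mathcal{B}}$ is as the constant offset left over when the batch-normalized output of the pruned channel $j$ is compared with the $s_k$-weighted sum of the batch-normalized outputs of the other channels. The sketch below checks that reading numerically. It assumes inference-mode batch normalization written as a per-channel affine map $\gamma_k (z - \mu_k)/\sigma_k + \beta_k$, with $\sigma_k$ standing for the full denominator (any epsilon folded in). All numeric values and variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m, j = 5, 2                            # hypothetical: five channels, channel j is pruned
gamma = rng.uniform(0.5, 1.5, size=m)  # BN scale
beta = rng.normal(size=m)              # BN shift
mu = rng.normal(size=m)                # BN running mean
sigma = rng.uniform(0.5, 2.0, size=m)  # BN running std (epsilon assumed folded in)
s = rng.normal(size=m)                 # recovery coefficients; s[j] is never used

def batch_norm(z, k):
    """Inference-mode BN for channel k as a plain affine map (an assumption)."""
    return gamma[k] * (z - mu[k]) / sigma[k] + beta[k]

keep = [k for k in range(m) if k != j]

# B as written in Lemma 1, with the sum running over k != j.
ratio = (sigma[j] / gamma[j]) * (gamma[keep] / sigma[keep])
B = (gamma[j] / sigma[j]) * (
    np.sum(s[keep] * ratio * (mu[keep] - sigma[keep] / gamma[keep] * beta[keep]))
    - mu[j]
    + sigma[j] / gamma[j] * beta[j]
)

z = rng.normal(size=(m, 8))            # pre-BN feature values, 8 positions per channel

# Difference between the pruned channel's BN output and the weighted sum.
difference = batch_norm(z[j], j) - sum(s[k] * batch_norm(z[k], k) for k in keep)

# The same difference split into its input-dependent part plus the constant B.
linear_part = gamma[j] / sigma[j] * z[j] - sum(
    s[k] * gamma[k] / sigma[k] * z[k] for k in keep
)
print(np.max(np.abs(difference - (linear_part + B))))
```

Under these assumptions the two sides agree up to floating-point error, so $\boldsymbol{\mathcal{B}}$ depends only on the batch normalization statistics and the coefficients $s_k$, not on the input data.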

Figures (5)

  • Figure 1: Conceptual overview of our LBYL method, showing how the original output resulting from a pruned filter at the $\ell$-th layer, that is, the output of the $(\ell+1)$-th convolutional layer, can be recovered by all the other preserved filters at the same layer (i.e., the $\ell$-th layer) through convolution, batch normalization, and the activation function (i.e., ReLU), where $s, s',$ and $s^*$ are the coefficients that quantify how much of the pruned filter's information each preserved filter should carry.
  • Figure 2: Comparison between pruning matrix and delivery matrix, where the $4$-th and $6$-th filters are being pruned among $6$ original filters
  • Figure 3: A neuron pruning scenario in fully-connected layers
  • Figure 4: Comparison of the three error components with NM, where each $m\_n\_k$ on the x-axis represents the $k$-th conv module in the $n$-th block at the $m$-th layer in ResNet-50, when pruning the first and second convolution layers of each block by 30%
  • Figure 5: Comparison of the learning curves of fine-tuning restored networks for 20 epochs with those of training the same-sized small architecture from scratch for 80 epochs, at different pruning ratios
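
The contrast in Figure 2 between a pruning matrix and a delivery matrix can be sketched for the same setting (6 original filters, the 4th and 6th pruned). The layout used here is an assumption: rows index original filters, columns index preserved filters, and the matrix is applied to the input-channel axis of the next layer's weights, shown as a plain 2-D matrix for brevity. The coefficient values and the `next_weights` array are invented for illustration.

```python
import numpy as np

num_filters = 6
pruned = [3, 5]  # the 4th and 6th filters, zero-based
preserved = [i for i in range(num_filters) if i not in pruned]

# Pruning matrix: one-hot column per preserved filter; pruned rows stay all zero,
# so whatever the pruned filters contributed is simply discarded.
pruning_matrix = np.zeros((num_filters, len(preserved)))
for col, row in enumerate(preserved):
    pruning_matrix[row, col] = 1.0

# Delivery matrix: same layout, but each pruned row holds coefficients that say
# how much of that pruned filter every preserved filter should carry.
delivery_matrix = pruning_matrix.copy()
delivery_matrix[3] = [0.2, -0.1, 0.4, 0.3]  # hypothetical coefficients
delivery_matrix[5] = [0.0, 0.5, -0.2, 0.1]  # hypothetical coefficients

# Hypothetical next-layer weights: 2 outputs, 6 input channels.
next_weights = np.arange(12, dtype=float).reshape(2, 6)

dropped = next_weights @ pruning_matrix     # columns of pruned channels removed
delivered = next_weights @ delivery_matrix  # pruned columns folded into the rest

print(dropped)
print(delivered)
```

With the pruning matrix, the next layer only loses two input columns. With the delivery matrix, each remaining column additionally absorbs a weighted share of the two removed columns, which is how several preserved filters can jointly compensate for one pruned filter.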

Theorems & Definitions (14)

  • Definition 1
  • Lemma 1
  • proof
  • Definition 2
  • Theorem 1
  • proof
  • Lemma 2
  • proof
  • Theorem 2
  • proof
  • ...and 4 more