Optimal Brain Apoptosis

Mingyuan Sun, Zheng Fang, Jiaxu Wang, Junjie Jiang, Delei Kong, Chenming Hu, Yuetong Fang, Renjing Xu

TL;DR

The paper addresses the challenge of making CNNs and Transformers more efficient by pruning redundant parameters. It introduces Optimal Brain Apoptosis (OBA), which directly computes the Hessian-vector product for each parameter to obtain the exact second-order Taylor expansion term $\frac{1}{2} \delta\theta^T H \delta\theta$, exploiting the layer-wise structure of Hessian submatrices that arises from series and parallel connectivity between layers. A Jacobian-Vector Product Forward Propagation (JVPF) technique makes these exact Hessian-vector product computations scalable and supports both structured and unstructured pruning. Experiments on CNNs (e.g., VGG19, ResNet32, ResNet50) and ViT-B/16 across CIFAR10, CIFAR100, and ImageNet show accuracy competitive with or superior to state-of-the-art pruning methods, along with speedups and robust importance scoring. The authors report that the approach is efficient enough to be practical on large models and provide code for replication.
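
For orientation, the quadratic term mentioned above comes from the standard second-order Taylor expansion of the loss around the current parameters. The symbols $\mathcal{L}$ (loss) and $g$ (gradient) are notation assumed here for illustration and are not taken from the summary:

$$
\delta\mathcal{L} = g^T \delta\theta + \frac{1}{2}\,\delta\theta^T H\,\delta\theta + O(\|\delta\theta\|^3),
\qquad
\frac{1}{2}\,\delta\theta^T H\,\delta\theta = \frac{1}{2}\sum_i \delta\theta_i\,(H\,\delta\theta)_i = \frac{1}{2}\sum_i\sum_j H_{ij}\,\delta\theta_i\,\delta\theta_j .
$$

A diagonal approximation of the kind used by Optimal Brain Damage keeps only the $i = j$ terms, $\frac{1}{2}\sum_i H_{ii}\,\delta\theta_i^2$. Evaluating the Hessian-vector product $H\,\delta\theta$ retains the cross terms $H_{ij}$ with $i \neq j$ as well.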

Abstract

The increasing complexity and parameter count of Convolutional Neural Networks (CNNs) and Transformers pose challenges in terms of computational efficiency and resource demands. Pruning has been identified as an effective strategy to address these challenges by removing redundant elements such as neurons, channels, or connections, thereby enhancing computational efficiency without heavily compromising performance. This paper builds on the foundational work of Optimal Brain Damage (OBD) by advancing the methodology of parameter importance estimation using the Hessian matrix. Unlike previous approaches that rely on approximations, we introduce Optimal Brain Apoptosis (OBA), a novel pruning method that calculates the Hessian-vector product value directly for each parameter. By decomposing the Hessian matrix across network layers and identifying conditions under which inter-layer Hessian submatrices are non-zero, we propose a highly efficient technique for computing the second-order Taylor expansion of parameters. This approach allows for a more precise pruning process, particularly in the context of CNNs and Transformers, as validated in our experiments including VGG19, ResNet32, ResNet50, and ViT-B/16 on CIFAR10, CIFAR100 and ImageNet datasets. Our code is available at https://github.com/NEU-REAL/OBA.
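
As a rough illustration of why the full Hessian-vector product matters, the toy sketch below compares a diagonal-only, OBD-style score with a score built from the full product $H\theta$. It is not the authors' implementation: the quadratic loss, the matrix `A`, the function names, and the choice of attributing $\frac{1}{2}\theta_i (H\theta)_i$ to parameter $i$ are all assumptions made for this example.

```python
# Toy sketch (hypothetical names, not the OBA code base).
# Loss: L(theta) = 0.5 * theta^T A theta, so the Hessian H equals A exactly.

def hessian_vector_product(A, v):
    """Return H v for the quadratic loss above, where H = A."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]

def full_second_order_scores(A, theta):
    """Assumed score_i = 0.5 * theta_i * (H theta)_i, including cross terms."""
    Hv = hessian_vector_product(A, theta)
    return [0.5 * t * h for t, h in zip(theta, Hv)]

def diagonal_scores(A, theta):
    """OBD-style score using only the Hessian diagonal: 0.5 * H_ii * theta_i^2."""
    return [0.5 * A[i][i] * t * t for i, t in enumerate(theta)]

# Small symmetric Hessian with non-zero off-diagonal entries.
A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, -1.0],
     [0.0, -1.0, 1.0]]
theta = [1.0, -2.0, 0.5]

full = full_second_order_scores(A, theta)
diag = diagonal_scores(A, theta)

# The full quadratic term 0.5 * theta^T H theta, computed directly.
quadratic_term = 0.5 * sum(
    t * h for t, h in zip(theta, hessian_vector_product(A, theta))
)

# Index of the parameter each criterion would consider least important.
least_important_full = min(range(len(theta)), key=lambda i: full[i])
least_important_diag = min(range(len(theta)), key=lambda i: diag[i])

print("full scores:    ", full)
print("diagonal scores:", diag)
print("least important (full, diagonal):", least_important_full, least_important_diag)
```

In this toy case the full scores add up to the quadratic term, and the two criteria disagree on which parameter is least important, because the diagonal score ignores the off-diagonal entries of the Hessian.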

Paper Structure

This paper contains 36 sections, 5 theorems, 39 equations, 4 figures, 6 tables, 1 algorithm.

Key Result

Theorem 4.2

For layer $l$ in a neural network where layers $l_{\text{up}}\in\mathbf{L}_{\text{up}}$ and layers $l_{\text{low}}\in\mathbf{L}_{\text{low}}$ are in upper and lower series connectivity to layer $l$, respectively, then for weight parameter $w^{(l)}$ and bias parameter $b^{(l)}$ of layer $l$, we have in which $\hat{X}^{(l)}$ is given by and where $\mathbf{J}^{(l)}_{\delta W^{(l)}}\in\mathbb{R}^{(

Figures (4)

  • Figure 1: (a) An illustration of conditions where the Hessian matrix between parameters of two layers are nonzero. (b) An illustration of Jacobian-Vector Product Forward Propagation. Two forward propagation processes are needed for parameter layers and one forward propagation process is needed for nonparameter layers. For nonparameter layers we leverage Jacobian-vector product to conduct the forward process and do not need to calculate the Jacobian matrix explicitly.
  • Figure 2: Importance score of each neuron in a group is gathered from parameters of lower layers and upper layers.
  • Figure 3: Iterative pruning results on CIFAR10 and CIFAR100 with ResNet32.
  • Figure A1: The layer-wise FLOPs for all parameter layers from models pruned by different criteria.
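
The caption of Figure 1(b) notes that nonparameter layers can be handled with a Jacobian-vector product, without forming the Jacobian. A minimal sketch of that idea follows. The elementwise `tanh` and the pairwise sum pooling are stand-in layers chosen for illustration, and all function names are hypothetical rather than taken from the paper or its code.

```python
import math

def tanh_jvp(x, v):
    """Elementwise tanh with its JVP.

    The Jacobian is diag(1 - tanh(x)^2), so J v is an elementwise product
    and the matrix itself is never materialized.
    """
    y = [math.tanh(xi) for xi in x]
    jv = [(1.0 - yi * yi) * vi for yi, vi in zip(y, v)]
    return y, jv

def sum_pool_jvp(x, v, width=2):
    """Non-overlapping sum pooling; it is linear, so the tangent is pooled the same way."""
    y = [sum(x[i:i + width]) for i in range(0, len(x), width)]
    jv = [sum(v[i:i + width]) for i in range(0, len(v), width)]
    return y, jv

def forward_with_jvp(x, v):
    """Propagate the value and the tangent through both layers in one pass."""
    y, jv = tanh_jvp(x, v)
    return sum_pool_jvp(y, jv)

def forward(x):
    """Plain forward pass, used only for the finite-difference check below."""
    return forward_with_jvp(x, [0.0] * len(x))[0]

x = [0.3, -1.2, 0.8, 2.0]   # layer input
v = [1.0, 0.5, -2.0, 0.25]  # tangent direction

y, jv = forward_with_jvp(x, v)

# Central finite differences as an independent check of J v.
eps = 1e-5
y_plus = forward([xi + eps * vi for xi, vi in zip(x, v)])
y_minus = forward([xi - eps * vi for xi, vi in zip(x, v)])
fd = [(p - m) / (2 * eps) for p, m in zip(y_plus, y_minus)]

print("JVP:               ", jv)
print("finite differences:", fd)
```

Each layer returns its output together with the propagated tangent, so the cost stays proportional to a forward pass. An explicit Jacobian would instead need one entry per input-output pair.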

Theorems & Definitions (12)

  • Definition 4.1: Series Connectivity
  • Theorem 4.2
  • Definition 4.3
  • Theorem 4.4
  • Lemma E.1
  • proof
  • Lemma E.2
  • proof
  • proof: Proof of Theorem 4.2
  • Lemma E.3
  • ...and 2 more