Optimal Brain Apoptosis
Mingyuan Sun, Zheng Fang, Jiaxu Wang, Junjie Jiang, Delei Kong, Chenming Hu, Yuetong Fang, Renjing Xu
TL;DR
The paper targets more efficient CNNs and Transformers by pruning redundant parameters. It introduces Optimal Brain Apoptosis (OBA), which directly computes the Hessian-vector product for each parameter to obtain the exact second-order Taylor expansion term $\frac{1}{2} \delta\theta^T H \delta\theta$, exploiting the layer-wise structure of Hessian submatrices that arises from series and parallel connectivity between layers. A Jacobian-Vector Product Forward Propagation (JVPF) technique makes these exact Hessian-vector product computations scalable and supports both structured and unstructured pruning. Experiments on CNNs (VGG19, ResNet32, ResNet50) and ViT-B/16 across CIFAR-10, CIFAR-100, and ImageNet show accuracy competitive with or superior to state-of-the-art pruning methods, with meaningful speedups and robust importance scoring. The approach is computationally efficient enough to be practical on large models, and the authors provide code for replication.
Abstract
The increasing complexity and parameter count of Convolutional Neural Networks (CNNs) and Transformers pose challenges in terms of computational efficiency and resource demands. Pruning has been identified as an effective strategy to address these challenges by removing redundant elements such as neurons, channels, or connections, thereby enhancing computational efficiency without heavily compromising performance. This paper builds on the foundational work of Optimal Brain Damage (OBD) by advancing the methodology of parameter importance estimation using the Hessian matrix. Unlike previous approaches that rely on approximations, we introduce Optimal Brain Apoptosis (OBA), a novel pruning method that calculates the Hessian-vector product value directly for each parameter. By decomposing the Hessian matrix across network layers and identifying conditions under which inter-layer Hessian submatrices are non-zero, we propose a highly efficient technique for computing the second-order Taylor expansion of parameters. This approach allows for a more precise pruning process, particularly in the context of CNNs and Transformers, as validated in our experiments including VGG19, ResNet32, ResNet50, and ViT-B/16 on the CIFAR10, CIFAR100, and ImageNet datasets. Our code is available at https://github.com/NEU-REAL/OBA.
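To make the role of the Hessian-vector product concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy quadratic loss whose Hessian is a known matrix `A`, and it approximates $Hv$ with a central finite difference of the gradient rather than with the exact JVPF procedure described in the paper. The names `grad`, `hvp`, and `second_order_scores` are hypothetical, and splitting $\frac{1}{2} \delta\theta^T H \delta\theta$ into per-parameter shares $\frac{1}{2} \delta\theta_i (H\delta\theta)_i$ is an illustrative assumption about how a per-parameter score could be read off. The diagonal-only variant is shown for contrast, since it ignores the cross-parameter terms.

```python
# Toy loss: L(theta) = 0.5 * theta^T A theta + b^T theta.
# Its gradient is A @ theta + b and its Hessian is exactly A (assumed setup).
A = [[2.0, 0.5, 0.0],
     [0.5, 1.0, 0.3],
     [0.0, 0.3, 3.0]]
b = [0.1, -0.2, 0.05]


def grad(theta):
    """Analytic gradient of the toy loss."""
    n = len(theta)
    return [sum(A[i][j] * theta[j] for j in range(n)) + b[i] for i in range(n)]


def hvp(theta, v, eps=1e-4):
    """Approximate H @ v by a central difference of the gradient.

    The Hessian is never formed explicitly. This is a finite-difference
    stand-in, not the exact JVPF computation from the paper.
    """
    plus = grad([t + eps * x for t, x in zip(theta, v)])
    minus = grad([t - eps * x for t, x in zip(theta, v)])
    return [(p - m) / (2 * eps) for p, m in zip(plus, minus)]


def second_order_scores(theta):
    """Per-parameter share of 0.5 * dtheta^T H dtheta for dtheta = -theta."""
    delta = [-t for t in theta]          # perturbation that zeroes every parameter
    h_delta = hvp(theta, delta)          # one Hessian-vector product
    return [0.5 * d * hd for d, hd in zip(delta, h_delta)]


theta = [1.0, -2.0, 0.5]
scores = second_order_scores(theta)

# Diagonal-only approximation, which drops the off-diagonal Hessian entries.
diag_only = [0.5 * t * t * A[i][i] for i, t in enumerate(theta)]

print("with cross terms:", [round(s, 6) for s in scores])
print("diagonal only:   ", diag_only)
```

The per-parameter shares sum to the full quadratic term, so a single Hessian-vector product is enough to score every parameter at once. In this toy case the two score vectors differ because `A` has non-zero off-diagonal entries.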
