Efficient Model Compression Techniques with FishLeg

Jamie McGowan; Wei Sheng Lai; Weibin Chen; Henry Aldridge; Jools Clarke; Jezabel Garcia; Rui Xia; Yilei Liang; Guillaume Hennequin; Alberto Bernacchia

TL;DR

The paper tackles the problem of pruning large neural networks using second-order information without prohibitive memory costs. It introduces the FishLeg Surgeon (FLS), which uses the FishLeg optimizer to meta-learn a parametric approximation Q(lambda) of the inverse Fisher information matrix, F_gamma^{-1} ≈ Q(lambda), and refines this curvature estimate online during pruning instead of recomputing it from scratch. Key contributions include a memory-efficient block-diagonal parameterization of Q, initialization and preconditioning strategies that speed up convergence, and experiments showing improved accuracy at high sparsity for ResNet18 on CIFAR-10 and TinyIM. The approach combines second-order optimization with model compression to make accurate pruning scalable enough for resource-constrained deployments.

Abstract

In many domains, the most successful AI models tend to be the largest, indeed often too large to be handled by AI players with limited computational resources. To mitigate this, a number of compression methods have been developed, including methods that prune the network down to high sparsity whilst retaining performance. The best-performing pruning techniques are often those that use second-order curvature information (such as an estimate of the Fisher information matrix, FIM) to score the importance of each weight and to predict the optimal compensation for weight deletion. However, these methods are difficult to scale to high-dimensional parameter spaces without making heavy approximations. Here, we propose the FishLeg Surgeon (FLS), a new second-order pruning method based on the Fisher-Legendre (FishLeg) optimizer. At the heart of FishLeg is a meta-learning approach to amortising the action of the inverse FIM, which brings a number of advantages. Firstly, the parameterisation enables the use of flexible tensor factorisation techniques to improve computational and memory efficiency without sacrificing much accuracy, alleviating the scalability challenges of most second-order pruning methods. Secondly, directly estimating the inverse FIM makes the estimate less sensitive to the amplification of stochasticity during inversion, thereby resulting in more precise estimates. Thirdly, our approach allows for progressive assimilation of the curvature into the parameterisation. In the gradual pruning regime, this means the estimate can be refined efficiently rather than re-estimated. We find that FishLeg achieves higher or comparable performance against two common baselines in the area, most notably in the high sparsity regime when considering a ResNet18 model on CIFAR-10 (84% accuracy at 95% sparsity vs 60% for OBS) and TinyIM (53% accuracy at 80% sparsity vs 48% for OBS).
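
To make concrete what "score the importance of each weight" and "predict the optimal compensation for weight deletion" mean, the following is a minimal NumPy sketch of a single Optimal Brain Surgeon (OBS) style pruning step, the baseline named in the abstract. It is not the authors' implementation: the toy weights, the damped empirical Fisher matrix, and the helper name `obs_prune_one` are all assumptions made for illustration, and the inverse is computed explicitly here, whereas FLS would supply a learned approximation Q(lambda) in its place.

```python
import numpy as np

def obs_prune_one(w, F_inv):
    """Remove the single least-important weight, OBS style (hypothetical helper).

    w     : (d,) weight vector
    F_inv : (d, d) symmetric inverse curvature estimate (here: inverse Fisher)
    """
    diag = np.diag(F_inv)
    # Importance score: predicted loss increase if weight i is deleted
    # and the remaining weights are optimally adjusted.
    saliency = w**2 / (2.0 * diag)
    i = int(np.argmin(saliency))
    # Compensation applied to all weights so that weight i lands exactly at zero.
    delta = -(w[i] / diag[i]) * F_inv[:, i]
    w_new = w + delta
    w_new[i] = 0.0  # remove floating-point residue on the pruned weight
    return w_new, i, saliency

# Toy setup (assumed data): an empirical Fisher built from random "gradients".
rng = np.random.default_rng(0)
d = 6
grads = rng.normal(size=(64, d))
F = grads.T @ grads / grads.shape[0] + 1e-3 * np.eye(d)  # damped for invertibility
F_inv = np.linalg.inv(F)  # explicit inverse; FLS would use a learned Q(lambda) instead
w = rng.normal(size=d)

w_new, i, saliency = obs_prune_one(w, F_inv)
print("pruned index:", i)
print("weights after pruning:", w_new)
```

Under the local quadratic model of the loss, the increase caused by this update equals the saliency of the pruned weight, and it is never larger than the increase from simply zeroing that weight without compensation. The cost of forming and inverting the full matrix is what makes this step hard to scale, which is the bottleneck the abstract describes.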
