BMRS: Bayesian Model Reduction for Structured Pruning

Dustin Wright; Christian Igel; Raghavendra Selvan

TL;DR

BMRS is a threshold-free, end-to-end Bayesian framework for structured pruning that combines multiplicative-noise pruning with Bayesian model reduction. It comes in two realizations, BMRS_N and BMRS_U, derived from distinct reduced priors (truncated log-normal and truncated log-uniform, respectively), and achieves high compression while preserving accuracy without tuning pruning thresholds. Pruning decisions rest on the change in log-evidence $\Delta F$, which is available in closed form and is cheap to compute, and the method supports both post-training and continuous pruning. Empirical results across multiple datasets and architectures show competitive compression–accuracy trade-offs: BMRS_N prunes robustly with no threshold to tune, while BMRS_U allows more aggressive compression through a hyperparameter. These findings position BMRS as a principled tool for neural-network compression, with potential for extension to hierarchical priors and to other structured elements.
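For context, the $\Delta F$ above follows from the standard Bayesian model reduction identity, sketched here in generic notation. The symbols $p$, $\tilde{p}$, $q$, and $\mathcal{D}$ are assumptions made for illustration and may differ from the paper's notation; the BMRS-specific closed forms for the truncated log-normal and log-uniform priors are not reproduced.

```latex
% Two models share the likelihood p(D | theta) and differ only in the prior:
% p(theta) is the original prior, \tilde{p}(theta) the reduced prior.
\begin{align}
\tilde{p}(\mathcal{D})
  &= \int p(\mathcal{D}\mid\theta)\,\tilde{p}(\theta)\,d\theta
   = \int p(\mathcal{D}\mid\theta)\,p(\theta)\,
     \frac{\tilde{p}(\theta)}{p(\theta)}\,d\theta
   = p(\mathcal{D})\;
     \mathbb{E}_{p(\theta\mid\mathcal{D})}\!\left[
       \frac{\tilde{p}(\theta)}{p(\theta)}\right], \\
\Delta F
  &= \log\tilde{p}(\mathcal{D}) - \log p(\mathcal{D})
   \approx \log \mathbb{E}_{q(\theta)}\!\left[
       \frac{\tilde{p}(\theta)}{p(\theta)}\right].
\end{align}
```

The approximation in the second line replaces the exact posterior with a variational posterior $q(\theta)$. Read this way, $\Delta F \ge 0$ means the reduced prior explains the data at least as well as the original one, and no retraining is needed to evaluate it.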

Abstract

Modern neural networks are often massively overparameterized, leading to high compute costs during training and at inference. One effective method to improve both the compute and energy efficiency of neural networks while maintaining good performance is structured pruning, where full network structures (e.g., neurons or convolutional filters) that have limited impact on the model output are removed. In this work, we propose Bayesian Model Reduction for Structured Pruning (BMRS), a fully end-to-end Bayesian method of structured pruning. BMRS is based on two recent methods: Bayesian structured pruning with multiplicative noise, and Bayesian model reduction (BMR), a method that allows efficient comparison of Bayesian models under a change in prior. We present two realizations of BMRS derived from different priors, which yield different structured pruning characteristics: 1) BMRS_N with the truncated log-normal prior, which offers reliable compression rates and accuracy without the need for tuning any thresholds, and 2) BMRS_U with the truncated log-uniform prior, which can achieve more aggressive compression based on the boundaries of truncation. Overall, we find that BMRS offers a theoretically grounded approach to structured pruning of neural networks, yielding both high compression rates and accuracy. Experiments on multiple datasets and neural networks of varying complexity showed that the two BMRS methods offer a competitive performance-efficiency trade-off compared to other pruning methods.
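To make "comparison of Bayesian models under a change in prior" concrete, here is a minimal Python sketch of the generic BMR quantity $\Delta F \approx \log \mathbb{E}_{q}[\tilde{p}(x)/p(x)]$ for a single scalar variable. It is a toy under stated assumptions, not the paper's derivation: all three distributions are plain Gaussians rather than the truncated log-normal and log-uniform distributions BMRS uses, and the function names (`delta_f_mc`, `delta_f_closed_form`, `should_prune`) are hypothetical. The rule "prune when $\Delta F \ge 0$" mirrors the criterion mentioned in the Figure 2 caption.

```python
import numpy as np


def log_gauss(x, mu, sigma):
    """Log-density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * np.log(2.0 * np.pi * sigma**2) - 0.5 * ((x - mu) / sigma) ** 2


def delta_f_mc(q, p, r, n=200_000, seed=0):
    """Monte Carlo estimate of log E_q[r(x) / p(x)].

    q, p, r are (mean, std) pairs for the approximate posterior, the
    original prior, and the reduced prior. All Gaussian: a toy assumption.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(q[0], q[1], size=n)
    log_ratio = log_gauss(x, *r) - log_gauss(x, *p)
    shift = log_ratio.max()  # log-mean-exp shift for numerical stability
    return shift + np.log(np.mean(np.exp(log_ratio - shift)))


def delta_f_closed_form(q, p, r):
    """Closed form of log E_q[r(x) / p(x)] when q, p, r are all Gaussian."""
    (mq, sq), (mp, sp), (mr, sr) = q, p, r
    lam = 1.0 / sq**2 + 1.0 / sr**2 - 1.0 / sp**2  # combined precision
    if lam <= 0.0:
        raise ValueError("expectation diverges: combined precision must be > 0")
    eta = mq / sq**2 + mr / sr**2 - mp / sp**2
    return (
        np.log(sp / sr)
        - 0.5 * np.log(sq**2 * lam)
        + 0.5 * (eta**2 / lam - mq**2 / sq**2 - mr**2 / sr**2 + mp**2 / sp**2)
    )


def should_prune(q, p, r):
    """Prune when switching to the reduced prior does not lower the evidence."""
    return delta_f_closed_form(q, p, r) >= 0.0


# Hypothetical values: a broad original prior and a reduced prior concentrated at 0.
original_prior = (0.0, 3.0)
reduced_prior = (0.0, 0.5)
posterior_near_zero = (0.05, 0.3)  # variable carries little information
posterior_far = (2.0, 0.3)         # variable is clearly used by the model

for name, q in [("near zero", posterior_near_zero), ("far", posterior_far)]:
    print(name, should_prune(q, original_prior, reduced_prior))
```

In this toy, a posterior concentrated near zero yields a positive $\Delta F$ (prune), while a posterior far from zero yields a negative one (keep). The closed form is cheap enough to evaluate per structure, which is the property that makes a threshold-free decision practical.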

Paper Structure (32 sections, 25 equations, 7 figures, 4 tables, 1 algorithm)

Introduction
Related work
Bayesian pruning.
Bayesian model reduction.
Problem formulation
Structured pruning with multiplicative noise and variational inference
Bayesian model reduction
Bayesian model reduction for structured pruning (BMRS)
Multiplicative noise layer
Deriving BMRS
BMRS with log-normal reduced prior (BMRS$_\mathcal{N}$)
BMRS with log-uniform reduced prior (BMRS$_\mathcal{U}$)
Training and pruning
Experiments
Post-training pruning.
...and 17 more sections

Figures (7)

Figure 1: BMRS uses BMR to perform structured pruning under multiplicative noise by calculating the change in log-evidence of noise variables $\theta$ under a prior which would shrink them to 0.
Figure 2: Accuracy vs. compression for post-training pruning on CIFAR10, Fashion-MNIST, and MNIST. The left plot in each subfigure shows the average accuracy across 10 seeds; shading shows the standard deviation. For BMRS, we mark the maximum compression rate based on when $\Delta F \ge 0$. The right plot in each subfigure shows a scatter plot and kernel density estimate of accuracy vs. compression of BMRS compared to SNR accuracy. BMRS$_\mathcal{N}$ and BMRS$_\mathcal{U}$-8 consistently stop pruning near the knee point, a preferred trade-off solution.
Figure 3: Average Spearman's rank correlation between the ranks of neurons for pruning when using different methods on CIFAR10 (plots for additional datasets are given in the additional plots section).
Figure 4: Accuracy and compression rate vs. $p_{1}$ for BMRS$_\mathcal{U}$ on CIFAR10 with LeNet5. Results are averaged across 10 seeds, with the standard deviation indicated by the error bars.
Figure 5: Additional accuracy vs. compression results for post-training pruning including gradient-based pruning.
...and 2 more figures
