Principled Pruning of Bayesian Neural Networks through Variational Free Energy Minimization

Jim Beckers; Bart van Erp; Ziyue Zhao; Kirill Kondrashov; Bert de Vries

TL;DR

A novel iterative pruning algorithm is presented that alleviates the problems arising with naive Bayesian model reduction and addresses the shortcomings of current state-of-the-art pruning methods used by the signal processing community.

Abstract

Bayesian model reduction provides an efficient approach for comparing the performance of all nested sub-models of a model, without re-evaluating any of these sub-models. Until now, Bayesian model reduction has been applied mainly to simple models in the computational neuroscience community. In this paper, we formulate and apply Bayesian model reduction to perform principled pruning of Bayesian neural networks, based on variational free energy minimization. Direct application of Bayesian model reduction, however, gives rise to approximation errors. We therefore present a novel iterative pruning algorithm that alleviates the problems arising with naive Bayesian model reduction, as supported experimentally on the publicly available UCI datasets for different inference algorithms. This parameter pruning scheme addresses the shortcomings of current state-of-the-art pruning methods used by the signal processing community: it has a clear stopping criterion and minimizes the same objective that is used during training. In addition to these benefits, our experiments indicate better model performance in comparison to state-of-the-art pruning schemes.
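
As a short worked derivation of why nested sub-models can be compared without re-evaluating them, the standard Bayesian model reduction identity can be written as follows. The notation is an assumption made here for illustration: $D$ denotes the data, $p(\theta)$ the original prior, $\tilde{p}(\theta)$ the reduced prior, $q(\theta)$ the variational posterior, and the likelihood $p(D {\,|\,} \theta)$ is assumed to be shared by both models. The sign convention ($F$ as an upper bound on $-\ln p(D)$) is also an assumption, chosen to match free energy minimization.

$$
\begin{aligned}
\tilde{p}(D) &= \int p(D {\,|\,} \theta)\, \tilde{p}(\theta)\, \mathrm{d}\theta
 = \int p(D {\,|\,} \theta)\, p(\theta)\, \frac{\tilde{p}(\theta)}{p(\theta)}\, \mathrm{d}\theta
 = p(D) \int p(\theta {\,|\,} D)\, \frac{\tilde{p}(\theta)}{p(\theta)}\, \mathrm{d}\theta, \\
\Delta F &= \tilde{F} - F \approx -\ln \int q(\theta)\, \frac{\tilde{p}(\theta)}{p(\theta)}\, \mathrm{d}\theta .
\end{aligned}
$$

The first line is exact. The second line replaces the exact posterior $p(\theta {\,|\,} D)$ with the variational posterior $q(\theta)$ obtained for the full model, which is where approximation errors of the kind mentioned in the abstract can enter.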

Paper Structure (16 sections, 24 equations, 3 figures, 1 table)

This paper contains 16 sections, 24 equations, 3 figures, 1 table.

Introduction
Model specification
Probabilistic inference
Parameter pruning
Experiments
Experimental setup
Divergence assessment
Robustness and performance
Comparison to state-of-the-art
Related Work
Probabilistic inference
Model compression
Discussion and future work
Conclusion
Bayesian model reduction derivations
...and 1 more sections

Figures (3)

Figure 1: An overview of the pruning objectives signal-to-noise ratio, signal-plus-robustness and Bayesian model reduction. The prior distribution was set to $p(\theta) = \mathcal{N}(\theta {\,|\,} 0, 1)$ and the new prior to $\tilde{p}(\theta) = \mathcal{N}(\theta {\,|\,} 0, \varepsilon)$, with $\varepsilon=10^{-16}$. The pruning objectives have been computed with respect to the variational posterior distribution $q(\theta) = \mathcal{N}(\theta {\,|\,} \hat{\mu}_\theta, \hat{\sigma}^2_\theta)$. All three objectives prune from low to high values. Only Bayesian model reduction has a clear stopping criterion, located at zero, as indicated by the red contour in the right plot. The interior of this contour is subject to pruning.
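
The three objectives in this caption can be sketched for a single weight with a Gaussian posterior. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the definitions SNR $= |\hat{\mu}_\theta| / \hat{\sigma}_\theta$ and SPR $= |\hat{\mu}_\theta| + \hat{\sigma}_\theta$ are the commonly used forms and are assumed here, and the Bayesian model reduction objective is taken to be $\Delta F = -\ln \int q(\theta)\, \tilde{p}(\theta) / p(\theta)\, \mathrm{d}\theta$, evaluated in closed form for zero-mean Gaussian priors.

```python
import math

def snr(mu, var):
    """Signal-to-noise ratio |mu| / sigma (assumed definition)."""
    return abs(mu) / math.sqrt(var)

def spr(mu, var):
    """Signal-plus-robustness |mu| + sigma (assumed definition)."""
    return abs(mu) + math.sqrt(var)

def bmr_delta_f(mu, var, prior_var=1.0, new_prior_var=1e-16):
    """Change in free energy when the prior N(0, prior_var) is replaced by
    N(0, new_prior_var), given the posterior q = N(mu, var).

    Computes -ln E_q[new_prior / prior] in closed form using precisions.
    """
    lam_q = 1.0 / var              # posterior precision
    lam_p = 1.0 / prior_var        # original prior precision
    lam_new = 1.0 / new_prior_var  # reduced prior precision
    lam = lam_q + lam_new - lam_p  # precision of the integrand
    if lam <= 0.0:
        raise ValueError("integral diverges: combined precision must be positive")
    # Ratio of Gaussian normalisation constants.
    log_ratio = 0.5 * (math.log(lam_q) + math.log(lam_new)
                       - math.log(lam_p) - math.log(lam))
    # Quadratic terms left after completing the square in theta.
    log_ratio += (lam_q * mu) ** 2 / (2.0 * lam) - 0.5 * lam_q * mu ** 2
    return -log_ratio

# Hypothetical (mean, variance) pairs for three weights.
posteriors = [(0.0, 0.01), (3.0, 0.01), (0.05, 0.5)]
# Prune wherever replacing the prior does not increase the free energy.
prune_mask = [bmr_delta_f(m, v) <= 0.0 for m, v in posteriors]
```

Under these assumptions, a weight whose posterior is concentrated near zero yields a negative $\Delta F$ and is pruned, while a weight whose posterior sits far from zero with small variance yields a large positive $\Delta F$ and is kept. SNR and SPR only provide a ranking, so a threshold would have to be chosen separately.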
Figure 2: Estimated and actual variational free energy for different pruning rates on the Boston dataset. The plots correspond to the different inference algorithms: variance backpropagation [Haussmann et al., 2019] and Bayes-by-backprop [Blundell et al., 2015] with global [Kingma and Welling, 2014] and local [Kingma et al., 2015] reparameterization. The dashed line denotes the stopping criterion for a single pruning iteration. In all plots, the estimated and actual optimal pruning rates differ, establishing the need for Algorithm \ref{alg:bmr}.
Figure 3: Comparison of the signal-to-noise ratio (SNR), signal-plus-robustness (SPR) and Bayesian model reduction (BMR) pruning metrics on the Boston dataset. The plots correspond to the different inference algorithms: variance backpropagation [Haussmann et al., 2019] and Bayes-by-backprop [Blundell et al., 2015] with global [Kingma and Welling, 2014] and local [Kingma et al., 2015] reparameterization. Solid lines correspond to the variational free energy and dashed lines to the negative accuracy term of \ref{eq:VFE-complexity}. Their difference corresponds to the complexity term of \ref{eq:VFE-complexity}. The different colors denote the different pruning methods.
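
For reference, the relation between the solid and dashed lines in Figure 3 follows from the usual complexity-accuracy decomposition of the variational free energy. The form below is the standard one and is assumed to match the paper's equation; the symbol $D$ for the data is an assumption introduced here.

$$
F[q] = \underbrace{\mathrm{KL}\big[\, q(\theta) \,\|\, p(\theta) \,\big]}_{\text{complexity}} \; - \; \underbrace{\mathbb{E}_{q(\theta)}\big[ \ln p(D {\,|\,} \theta) \big]}_{\text{accuracy}}
$$

Because the Kullback-Leibler divergence is non-negative, the free energy curve can never lie below the negative accuracy curve, and the gap between them is the complexity. For a single weight with $q(\theta) = \mathcal{N}(\theta {\,|\,} \hat{\mu}_\theta, \hat{\sigma}^2_\theta)$ and $p(\theta) = \mathcal{N}(\theta {\,|\,} 0, 1)$, the complexity contribution is $\tfrac{1}{2}\big( \hat{\sigma}^2_\theta + \hat{\mu}^2_\theta - 1 - \ln \hat{\sigma}^2_\theta \big)$.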
