Efficient Model Compression for Bayesian Neural Networks

Diptarka Saha, Zihe Liu, Feng Liang

TL;DR

This work demonstrates a novel strategy to emulate principles of Bayesian model selection in a deep learning setup by obtaining, for every node, the posterior inclusion probability, a quantity that typically gets lost in a fully connected Bayesian neural network.

Abstract

Model compression has drawn much attention within the deep learning community recently. Compressing a dense neural network offers many advantages, including lower computation cost, deployability to devices with limited storage and memory, and resistance to adversarial attacks. This may be achieved via weight pruning or by fully discarding certain input features. Here we demonstrate a novel strategy to emulate principles of Bayesian model selection in a deep learning setup. Given a fully connected Bayesian neural network with spike-and-slab priors trained via a variational algorithm, we obtain the posterior inclusion probability for every node, a quantity that typically gets lost. We employ these probabilities for pruning and feature selection on a host of simulated and real-world benchmark data and find evidence of better generalizability of the pruned model in all our experiments.
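
To make the pruning step concrete, the following is a minimal sketch of how per-node posterior inclusion probabilities could be used to drop hidden nodes from one fully connected layer. It is not the paper's algorithm: the function name `prune_hidden_nodes`, the toy weights, the probability values, and the 0.5 cutoff are all assumptions chosen for illustration, and the variational training that would produce the probabilities is not shown.

```python
def prune_hidden_nodes(w_in, w_out, inclusion_probs, threshold=0.5):
    """Drop hidden nodes whose inclusion probability falls below `threshold`.

    w_in:            one row per hidden node (weights from the inputs to that node)
    w_out:           one row per output unit, with one weight per hidden node
    inclusion_probs: one (hypothetical) posterior inclusion probability per hidden node
    threshold:       illustrative cutoff; 0.5 is an assumption, not a value from the paper
    """
    # Indices of the hidden nodes that survive pruning.
    keep = [j for j, p in enumerate(inclusion_probs) if p >= threshold]
    # Remove the pruned nodes' incoming weights (rows of w_in) ...
    pruned_in = [w_in[j] for j in keep]
    # ... and their outgoing weights (columns of w_out).
    pruned_out = [[row[j] for j in keep] for row in w_out]
    return keep, pruned_in, pruned_out


# Toy layer: 2 inputs, 3 hidden nodes, 1 output (all values made up).
w_in = [[0.4, -1.2], [0.01, 0.02], [0.9, 0.3]]
w_out = [[1.5, -0.03, 0.7]]
inclusion_probs = [0.97, 0.04, 0.81]

keep, pruned_in, pruned_out = prune_hidden_nodes(w_in, w_out, inclusion_probs)
print(keep)        # indices of retained hidden nodes
print(pruned_out)  # outgoing weights after pruning
```

The same thresholding applied to the input layer would correspond to feature selection, since discarding an input node removes that feature from the model entirely.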

Paper Structure

This paper contains 30 sections, 2 theorems, 55 equations, 12 figures, 9 tables, 2 algorithms.

Key Result

Proposition 3.1

The objective function eq:masterOF can be alternatively expressed as up to a constant, where

Figures (12)

  • Figure 1: $I$ vs $\psi_1, \psi_2$ (Experiment 1). As the importance of $X_2$ goes from $0$ to $1$, its estimated relevance goes up while the relative importance of $X_1$ goes down simultaneously.
  • Figure 2: Correlation between Actual and Estimated Feature Importance. In all cases, correlations are high -- however, the linear data-generating processes have lower variability.
  • Figure 3: Correlation between Actual and Estimated Feature Importance. In all cases, correlations are high -- however, the linear data-generating processes have lower variability.
  • Figure 4: We vary $D$, the number of features, while keeping the following hyperparameters fixed: $\alpha = 2, f(x) = e^{|x|} -2x + \sin(2\pi x), \pi = 0.2$. We notice (Left) high selection accuracy and (Right) improved test errors in all such datasets.
  • Figure 5: We vary $\alpha$, the signal-to-noise ratio indicator, while keeping the following hyperparameters fixed: $D = 100, f(x) = e^{|x|} -2x + \sin(2\pi x), \pi = 0.2$. We again notice (Left) high selection accuracy and (Right) improved test errors in all such datasets.
  • ...and 7 more figures
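
The function $f$ held fixed in Figures 4 and 5 can be evaluated directly. The snippet below is a minimal sketch of the formula alone; the captions do not say how $f$ enters the data-generating process, so nothing beyond the formula is assumed, and the evaluation points are arbitrary.

```python
import math


def f(x):
    # f(x) = e^{|x|} - 2x + sin(2*pi*x), as written in the captions of Figures 4 and 5
    return math.exp(abs(x)) - 2 * x + math.sin(2 * math.pi * x)


# Evaluate at a few arbitrary points to show the asymmetry introduced by the -2x term.
for x in (-1.0, 0.0, 1.0):
    print(x, f(x))
```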

Theorems & Definitions (5)

  • Proposition 3.1
  • Proposition 3.2
  • Remark 1
  • Remark 2
  • Remark 3