Bayesian Deep Learning Via Expectation Maximization and Turbo Deep Approximate Message Passing

Wei Xu, An Liu, Yiting Zhang, Vincent Lau

TL;DR

The paper addresses the challenges of training deep neural networks with efficient uncertainty-aware learning and structured model compression. It introduces EM-TDAMP, a turbo deep approximate message passing–based EM framework that performs the E-step via TDAMP and the M-step to update hyperparameters, incorporating a group sparse prior to prune neurons. A Bayesian federated learning extension is developed, where local TDAMP posteriors are aggregated at a central server using a weighted geometric average for $\boldsymbol{\theta}$ and a product for $\boldsymbol{z}_{L}$, followed by EM updates to $\boldsymbol{\rho}$ and $v$. Empirical results on Boston housing and MNIST demonstrate faster convergence, better test performance, and substantial sparsity, with reduced communication rounds in the federated setting, highlighting the method’s practicality for scalable Bayesian deep learning and compression.
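To make the aggregation step concrete, the following is a minimal sketch of a weighted geometric average of local posteriors. It assumes each client reports an independent Gaussian posterior per parameter (a mean and a variance) and that the weights are nonnegative and sum to one; the paper's actual posterior family and weighting rule may differ. The function name `geometric_average_gaussians` and its arguments are hypothetical and do not come from the paper.

```python
import numpy as np

def geometric_average_gaussians(means, variances, weights):
    """Renormalized weighted geometric average prod_k N(m_k, v_k)^{w_k}.

    Assumption (not from the paper): every local posterior is an
    elementwise Gaussian. Raising a Gaussian to the power w scales its
    precision by w, and multiplying Gaussians adds precisions, so the
    result is again Gaussian.

    means, variances: arrays of shape (K, D), one row per client.
    weights: array of shape (K,), nonnegative, assumed to sum to 1.
    Returns (mean, variance) of the aggregated Gaussian, each of shape (D,).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    w = np.asarray(weights, dtype=float)[:, None]

    # Weighted sum of precisions gives the aggregated precision.
    precision = np.sum(w / variances, axis=0)
    # Precision-weighted means give the aggregated mean.
    mean = np.sum(w * means / variances, axis=0) / precision
    return mean, 1.0 / precision

# Two hypothetical clients, one parameter, equal weights.
agg_mean, agg_var = geometric_average_gaussians(
    means=[[0.0], [4.0]],
    variances=[[1.0], [3.0]],
    weights=[0.5, 0.5],
)
```

Under the same Gaussian assumption, setting every weight to 1 turns this into a plain product of the local factors, which is the form the summary describes for $\boldsymbol{z}_{L}$.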

Abstract

Efficient learning and model compression algorithms for deep neural networks (DNNs) are a key workhorse behind the rise of deep learning (DL). In this work, we propose a message passing based Bayesian deep learning algorithm called EM-TDAMP to avoid the drawbacks of traditional stochastic gradient descent (SGD) based learning algorithms and regularization-based model compression methods. Specifically, we formulate the problem of DNN learning and compression as a sparse Bayesian inference problem, in which a group sparse prior is employed to achieve structured model compression. Then, we propose an expectation maximization (EM) framework to estimate posterior distributions for parameters (E-step) and update hyperparameters (M-step), where the E-step is realized by a newly proposed turbo deep approximate message passing (TDAMP) algorithm. We further extend EM-TDAMP and propose a novel Bayesian federated learning framework, in which the clients perform TDAMP to efficiently calculate the local posterior distributions based on the local data, and the central server first aggregates the local posterior distributions to update the global posterior distributions and then updates the hyperparameters based on EM to accelerate convergence. We detail the application of EM-TDAMP to Boston housing price prediction and handwriting recognition, and present extensive numerical results to demonstrate the advantages of EM-TDAMP.

Paper Structure

This paper contains 37 sections, 66 equations, 12 figures, 2 tables, and 2 algorithms.

Figures (12)

  • Figure 1: Illustration of group sparsity, showing the elements of the $l$-th layer. The gray elements are preserved, while the white elements are set to zero. In the figure, the 2nd and 6th input neurons are deactivated because the corresponding weight columns are set to zero.
  • Figure 2: The structure of $\mathcal{G}_{r}$ ($r=1,\ldots,R$). The specific expressions of the factor nodes are summarized in Table \ref{tab:Factor-Distri-func}.
  • Figure 3: Turbo framework factor graph related to $\boldsymbol{W}_{l,n}$.
  • Figure 4: Detailed structure of the $l$-th layer related to the $i$-th sample, where we set $N_{l}=2,N_{l-1}=3$. The specific expressions of factor nodes are summarized in Table \ref{tab:Factor-Distri-func-1}.
  • Figure 5: Illustration of the federated learning framework, where $f_{k}\left(\boldsymbol{\theta}\right)$ and $g_{k}\left(\boldsymbol{\theta}\right)$ represent $p\left(\boldsymbol{z}_{L}^{k}|\boldsymbol{D}_{x}^{k},\boldsymbol{\theta}\right)$ and $p\left(\boldsymbol{D}_{y}^{k}|\boldsymbol{z}_{L}^{k}\right)$, respectively, for $k=1,\ldots,K$.
  • ...and 7 more figures
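
The group sparsity pattern described in the Figure 1 caption can be illustrated with a small sketch. It assumes the layer weights are stored as a matrix whose columns correspond to input neurons, so that an all-zero column deactivates that input and can be dropped without changing the layer output. The helper `prune_inactive_inputs`, the 4-by-7 layer size, and the random values are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def prune_inactive_inputs(W, tol=0.0):
    """Drop input neurons whose whole weight column is (numerically) zero.

    W: array of shape (N_out, N_in); column j holds every weight that
       leaves input neuron j (an assumed layout).
    Returns the compressed matrix and the indices of the kept inputs.
    """
    col_norms = np.linalg.norm(W, axis=0)   # one norm per input neuron
    active = np.flatnonzero(col_norms > tol)
    return W[:, active], active

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 7))                 # hypothetical layer: 7 inputs, 4 outputs
W[:, [1, 5]] = 0.0                          # zero the 2nd and 6th columns (0-based 1 and 5)

W_small, active = prune_inactive_inputs(W)

# The compressed layer gives the same pre-activations on the kept inputs.
x = rng.normal(size=7)
same_output = np.allclose(W @ x, W_small @ x[active])
```

Because entire columns are removed rather than scattered individual weights, the compressed layer is a smaller dense matrix, which is what makes this kind of sparsity structured.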