Bayesian Federated Model Compression for Communication and Computation Efficiency

Chengyu Xia, Danny H. K. Tsang, Vincent K. N. Lau

TL;DR

This work addresses communication and computation efficiency in federated learning jointly, using a Bayesian approach with a hierarchical clustered-sparsity prior. It develops a decentralized Turbo-VBI framework (D-Turbo-VBI) that combines sum-product message passing (SPMP) on an HMM prior with mean-field variational inference to jointly infer sparse weights across clients while promoting a common sparse structure. The authors prove convergence to a stationary point under standard assumptions and report significant reductions in communication and local inference cost on CIFAR-10/100 benchmarks. Because the surviving weights form clusters, they can be transmitted cluster by cluster and computed with tiled, cluster-based operations, which supports scalable deployment in distributed settings.

Abstract

In this paper, we investigate Bayesian model compression in federated learning (FL) to construct sparse models that achieve both communication and computation efficiency. We propose a decentralized Turbo variational Bayesian inference (D-Turbo-VBI) FL framework in which we first propose a hierarchical sparse prior to promote a clustered sparse structure in the weight matrix. Then, by carefully integrating message passing and VBI with a decentralized turbo framework, we propose the D-Turbo-VBI algorithm, which can (i) reduce both upstream and downstream communication overhead during federated training, and (ii) reduce the computational complexity during local inference. Additionally, we establish the convergence property of the proposed D-Turbo-VBI algorithm. Simulation results show a significant gain of our proposed algorithm over the baselines in reducing the communication overhead during federated training and the computational complexity of the final model.
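To make the communication claim concrete, the following is a minimal sketch of why a clustered sparse structure is cheap to transmit. It is not the paper's encoding scheme, which this summary does not describe. The function name `blockwise_bits`, the square tile size, the 32-bit value width, and the per-tile index cost are all assumptions chosen for illustration.

```python
import math


def blockwise_bits(weights, block, value_bits=32):
    """Compare dense vs. tile-wise transmission cost for a 2-D weight matrix.

    Assumed scheme (illustrative only): split the matrix into block x block
    tiles, send only tiles containing a non-zero entry, and prefix each sent
    tile with its index. Edge tiles are counted as full (padded) tiles.
    Returns (dense_bits, blockwise_bits).
    """
    rows, cols = len(weights), len(weights[0])
    n_block_rows = math.ceil(rows / block)
    n_block_cols = math.ceil(cols / block)
    n_blocks = n_block_rows * n_block_cols
    # Bits needed to address one tile among all tiles.
    index_bits = max(1, math.ceil(math.log2(n_blocks)))

    nonzero_blocks = 0
    for br in range(n_block_rows):
        for bc in range(n_block_cols):
            tile = [
                weights[r][c]
                for r in range(br * block, min((br + 1) * block, rows))
                for c in range(bc * block, min((bc + 1) * block, cols))
            ]
            if any(v != 0.0 for v in tile):
                nonzero_blocks += 1

    dense = rows * cols * value_bits
    sparse = nonzero_blocks * (index_bits + block * block * value_bits)
    return dense, sparse


# A 6x6 matrix whose non-zero weights sit in two 2x2 clusters.
W = [[0.0] * 6 for _ in range(6)]
for r, c in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    W[r][c] = 0.5
for r, c in [(4, 4), (4, 5), (5, 4), (5, 5)]:
    W[r][c] = -0.3

dense_bits, cluster_bits = blockwise_bits(W, block=2)
print(dense_bits, cluster_bits)  # 1152 264
```

Under these assumptions, the same eight non-zero weights scattered over eight different tiles would cost four times as much to send, which is the intuition behind promoting clusters rather than unstructured sparsity.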

Paper Structure

This paper contains 13 sections, 1 theorem, 27 equations, 4 figures, 1 table, and 1 algorithm.

Key Result

Theorem 1

Figures (4)

  • Figure 1: (a) Proposed clustered sparse structure on a $6\times6$ weight matrix. Non-zero elements are marked in blue. (b) HMM prior for $p\left(\mathbf{s}\right)$ on a $6\times6$ weight matrix. Non-zero supports will gather in clusters due to the correlation between neighbors.
  • Figure 2: Illustration of the decentralized Turbo-VBI FL framework.
  • Figure 3: Accuracy versus the total communication bits in the first 100 communication rounds.
  • Figure 4: Message $v_{h\rightarrow s_{n}}$ of Dense_2 layer in different iterations.
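
The clustering effect described in the Figure 1 caption can be illustrated with a small simulation. This is a hedged sketch, not the paper's prior: it uses a one-dimensional two-state Markov chain over the support variables, whereas the figure shows a prior on a $6\times6$ matrix. The transition probabilities `p01` and `p10`, the chain length, and the helper names are assumptions made for this example.

```python
import random


def sample_markov_support(n, p01, p10, rng):
    """Sample a binary support vector from a two-state Markov chain.

    p01 is the probability of switching 0 -> 1 and p10 of switching 1 -> 0.
    Small switching probabilities make neighbours correlated, so non-zero
    supports appear in runs (clusters).
    """
    stationary = p01 / (p01 + p10)  # long-run fraction of non-zero supports
    s = [1 if rng.random() < stationary else 0]
    for _ in range(n - 1):
        if s[-1] == 0:
            s.append(1 if rng.random() < p01 else 0)
        else:
            s.append(0 if rng.random() < p10 else 1)
    return s


def count_switches(s):
    """Number of positions where the support differs from its neighbour."""
    return sum(1 for a, b in zip(s, s[1:]) if a != b)


rng = random.Random(0)
clustered = sample_markov_support(2000, p01=0.05, p10=0.05, rng=rng)
# Baseline with the same expected density (0.5) but independent entries.
independent = [1 if rng.random() < 0.5 else 0 for _ in range(2000)]

print("switches, Markov prior:", count_switches(clustered))
print("switches, i.i.d. prior:", count_switches(independent))
```

Both vectors have roughly half of their entries active, but the Markov sample switches state far less often, so its non-zero entries form long contiguous runs rather than isolated points.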

Theorems & Definitions (1)

  • Theorem 1