FedSI: Federated Subnetwork Inference for Efficient Uncertainty Quantification

Hui Chen, Hengyu Liu, Zhangkai Wu, Xuhui Fan, Longbing Cao

TL;DR

FedSI addresses uncertainty quantification in federated learning under non-IID data by performing posterior inference on a client-specific subnetwork within the representation layers, while keeping the rest of the network deterministic. It leverages Linearized Laplace Approximation to obtain a full-covariance Gaussian posterior over a small subnetwork, identified by a Wasserstein-distance criterion that prioritizes high-variance parameters. Local updates compute a MAP estimate for the full representation, then infer a subnetwork posterior via GGN-Laplace, followed by a global aggregation that combines stochastic and deterministic components to learn a shared representation. Experiments on MNIST, FMNIST, and CIFAR-10 show that FedSI outperforms both Bayesian and non-Bayesian FL baselines in heterogeneous settings and can generalize to novel clients with low overhead.
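The selection and inference steps summarized above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names `select_subnetwork` and `subnetwork_posterior_cov`, the `ratio` and `prior_precision` parameters, and the toy Jacobian are all hypothetical. It assumes that per-parameter marginal variances are already available, that the Wasserstein criterion reduces to keeping the highest-variance parameters (as the summary suggests), and that the loss is a unit-noise squared error, so the GGN is simply the product of the transposed Jacobian with itself. For classification, a per-example output Hessian would sit between the two Jacobian factors.

```python
import numpy as np

def select_subnetwork(marginal_var, ratio):
    """Return sorted indices of the ceil(ratio * P) parameters with the largest variance."""
    marginal_var = np.asarray(marginal_var, dtype=float)
    k = max(1, int(np.ceil(ratio * marginal_var.size)))
    # argsort is ascending, so the last k entries have the largest variance
    return np.sort(np.argsort(marginal_var)[-k:])

def subnetwork_posterior_cov(jacobian, idx, prior_precision=1.0):
    """Full-covariance Gaussian over the selected parameters only.

    Assumption: unit-noise squared-error loss, so the GGN restricted to the
    subnetwork is J_S^T J_S; an isotropic Gaussian prior adds prior_precision * I.
    """
    J_S = jacobian[:, idx]                      # keep only subnetwork columns
    precision = J_S.T @ J_S + prior_precision * np.eye(len(idx))
    return np.linalg.inv(precision)             # covariance = inverse precision

# Toy usage: 4 parameters, 3 data points, keep half of the parameters.
variances = [0.1, 2.0, 0.5, 3.0]
J = np.array([[1.0, 0.0, 2.0, 1.0],
              [0.0, 1.0, 1.0, 3.0],
              [1.0, 1.0, 0.0, 2.0]])
idx = select_subnetwork(variances, ratio=0.5)
cov = subnetwork_posterior_cov(J, idx)
print(idx, cov.shape)
```

Because the covariance is only `k × k` for the `k` selected parameters, the inversion cost depends on the subnetwork size rather than on the full parameter count, which is the efficiency argument the summary makes.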

Abstract

While personalized federated learning (PFL) based on deep neural networks (DNNs) is in demand for addressing data heterogeneity and shows promising performance, existing federated learning (FL) methods lack efficient, systematic uncertainty quantification. Bayesian DNN-based PFL is often criticized for either over-simplified model structures or high computational and memory costs. In this paper, we introduce FedSI, a novel Bayesian DNN-based subnetwork inference PFL framework. FedSI is simple and scalable, leveraging Bayesian methods to incorporate systematic uncertainties effectively. It implements a client-specific subnetwork inference mechanism that selects network parameters with large variance to be inferred through posterior distributions and fixes the rest as deterministic ones. FedSI achieves fast and scalable inference while preserving the systematic uncertainties to the fullest extent possible. Extensive experiments on three benchmark datasets demonstrate that FedSI outperforms existing Bayesian and non-Bayesian FL baselines in heterogeneous FL scenarios.

Paper Structure

This paper contains 18 sections, 26 equations, 6 figures, 2 tables, 1 algorithm.

Figures (6)

  • Figure 1: Personalized Federated Learning with Subnetwork Inference. After obtaining the MAP values, each client identifies its own subnetwork $\boldsymbol{\theta}_{i,S}$ and obtains the corresponding full-covariance Gaussian posterior $q(\boldsymbol{\theta}_{i,S})$ through subnetwork inference (SI). Note that the decision parameters $\boldsymbol{\phi}_i$ are fixed at their initial random values during this phase. Then, clients send the distribution parameters of the representation parameters, $\boldsymbol{\mu}^{t+1}_{\theta_i}$ and $\boldsymbol{\sigma}^{t+1}_{\theta_i}$, to the server, which averages them to compute the distribution parameters of the common representation parameters, $\boldsymbol{\mu}^{t+1}_{\theta}$ and $\boldsymbol{\sigma}^{t+1}_{\theta}$, for the next communication round.
  • Figure 2: Global aggregation. Before averaging: Each participating client sends the server the updated distribution parameters of its stochastic parameters, $\boldsymbol{\mu}^{t+1}_{\boldsymbol{\theta}_{i,S}}$ and $\boldsymbol{\sigma}^{t+1}_{\boldsymbol{\theta}_{i,S}}$, and of its deterministic parameters, $\boldsymbol{\mu}^{t+1}_{\boldsymbol{\theta}_{i,D}}$ and $\boldsymbol{\sigma}^{t+1}_{\boldsymbol{\theta}_{i,D}}$. Each element of $\boldsymbol{\theta}^{t+1}_{i,D}$ can be regarded as a degenerate Gaussian distribution for model averaging. After averaging: The server obtains the distribution parameters of the stochastic common representation parameters, $\boldsymbol{\mu}^{t+1}_{\boldsymbol{\theta}_S}$ and $\boldsymbol{\sigma}^{t+1}_{\boldsymbol{\theta}_S}$, and of the deterministic common representation parameters, $\boldsymbol{\mu}^{t+1}_{\boldsymbol{\theta}_D}$ and $\boldsymbol{\sigma}^{t+1}_{\boldsymbol{\theta}_D}$. Note that $\boldsymbol{\theta}^{t+1}_D$ is then transformed into stochastic parameters following a Gaussian distribution with covariance matrix $\alpha \mathbf{I}$.
  • Figure 3: Test accuracy comparison with varying ratios of the subnetwork for MLP and CNN.
  • Figure 4: Reliability diagram and confidence histogram of FedSI on MNIST (left), FMNIST (middle) and CIFAR-10 (right). The closer the accuracy line and the average confidence line are, the better the model is calibrated.
  • Figure 5: Performance comparison of different DNNs-based PFL algorithms on MNIST, FMNIST and CIFAR-10.
  • ...and 1 more figure
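
The aggregation step described in the Figure 2 caption can be sketched roughly as follows. This is an illustrative reading of the caption, not the paper's algorithm: the function name `aggregate`, the default value of `alpha`, and the toy numbers are hypothetical. The sketch assumes that the transmitted $\boldsymbol{\sigma}$ values are per-parameter variances, that deterministic entries are sent with variance zero (the degenerate Gaussian), that the server uses a plain unweighted mean, and that an entry counts as deterministic at the server only if every client sent it as deterministic.

```python
import numpy as np

def aggregate(client_mus, client_vars, alpha=1e-4):
    """Average client distribution parameters into common representation parameters.

    client_mus, client_vars: lists of 1-D arrays, one per client, same length.
    A variance of 0 marks a deterministic (degenerate Gaussian) entry.
    """
    mus = np.stack(client_mus)        # shape: (num_clients, num_params)
    vars_ = np.stack(client_vars)
    mu = mus.mean(axis=0)             # unweighted mean (assumption)
    var = vars_.mean(axis=0)
    # Assumption: deterministic at the server iff deterministic for all clients.
    deterministic = np.all(vars_ == 0.0, axis=0)
    # Covariance alpha * I means each such entry receives variance alpha.
    var = np.where(deterministic, alpha, var)
    return mu, var

# Toy usage: two clients with different subnetworks over three parameters.
mu, var = aggregate(
    client_mus=[np.array([1.0, 2.0, 3.0]), np.array([3.0, 4.0, 5.0])],
    client_vars=[np.array([0.2, 0.0, 0.0]), np.array([0.0, 0.4, 0.0])],
    alpha=1e-4,
)
print(mu, var)
```

Treating deterministic entries as zero-variance Gaussians lets the server handle both parameter types with one averaging rule, and assigning variance `alpha` afterwards means every common parameter carries a proper Gaussian into the next round.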