Personalized Federated Learning of Probabilistic Models: A PAC-Bayesian Approach

Mahrokh Ghoddousi Boroujeni, Andreas Krause, Giancarlo Ferrari Trecate

TL;DR

PAC-PFL effectively mitigates overfitting even in data-poor scenarios and provides generalization bounds for new clients joining later by establishing and minimizing a PAC-Bayesian generalization bound on the average true loss of clients.

Abstract

Federated Learning (FL) aims to infer a shared model from private and decentralized data stored by multiple clients. Personalized FL (PFL) enhances the model's fit for each client by adapting the global model to the clients. A significant level of personalization is required for highly heterogeneous clients but can be challenging to achieve, especially when clients' datasets are small. To address this issue, we introduce the PAC-PFL framework for PFL of probabilistic models. PAC-PFL infers a shared hyper-posterior and treats each client's posterior inference as the personalization step. Unlike previous PFL algorithms, PAC-PFL does not regularize all personalized models towards a single shared model, thereby greatly enhancing its personalization flexibility. By establishing and minimizing a PAC-Bayesian generalization bound on the average true loss of clients, PAC-PFL effectively mitigates overfitting even in data-poor scenarios. Additionally, PAC-PFL provides generalization bounds for new clients joining later. PAC-PFL achieves accurate and well-calibrated predictions, as supported by our experiments.

Paper Structure (70 sections, 12 theorems, 64 equations, 6 figures, 6 tables, 3 algorithms)

Key Result

Theorem 3.2

Fix a data-dependent prior $P$ obtained by an $\epsilon_i$-DP algorithm, a data distribution $\mathcal{D}_i$, and a bounded loss function $\ell(\cdot, \cdot)\in[a,b]$. For every $\beta>0$, confidence level $\delta \in (0,1]$, and posterior $Q_i=\mathds{Q}(P, \mathcal{S}_i\cup\Tilde{\mathcal{S}}_i)$, holds with probability at least $1-\delta$ over $\mathcal{S}_i\sim\mathcal{D}_i^{m_i}$ and $\Tilde{

Figures (6)

  • Figure 1: Illustration of the proposed PAC-PFL framework. For a given hyper-prior distribution $\mathcal{P}$, the server computes the optimal hyper-posterior $\mathcal{Q}^*$ as per Corollary \ref{corol:opt_hypQ} through communication with clients owning datasets $\mathcal{S}_1, \dots, \mathcal{S}_n$. During personalization, each client $i$ draws a prior distribution $P$ from $\mathcal{Q}^*$ and combines it with its local dataset $\mathcal{S}_i$ and potentially new data $\Tilde{\mathcal{S}}_i$ to derive the optimal posterior distribution $Q^*_i$ according to Corollary \ref{corol:qstar}. The client then samples a model from $Q^*_i$ for making predictions. Note that the prior $P$ depends on each client's data, $\mathcal{S}_i$, which conflicts with the Bayesian framework and necessitates proper consideration, as discussed in Section \ref{sec:background}.
  • Figure 2: Box plots of test RSMSE and CE for existing and new clients in the PV-EW ($150$) dataset. The line within each box is the median. PAC-PFL outperforms the baselines in CE median, CE spread, and RSMSE median. Its RSMSE spread is comparable to that of MTL and pFedGP. Pooled GP results are not plotted due to poor performance but are reported in Appendix \ref{app:further_experiments}.
  • Figure 3: Ablation study on the impact of the number of SVGD particles, $k$, on RSMSE of the existing clients in PV-EW ($150$) and PV-S ($150$) datasets. Each experiment is repeated over $5$ random seeds. The error bars correspond to the mean $\pm$ standard deviation. Computational cost scales linearly with $k$ (see Appendix \ref{app:computational}).
  • Figure 4: Performance evaluation of differentially private PAC-PFL (Algorithm \ref{alg:DP_PACPFL}) on the Polynomial dataset introduced in Appendix \ref{app:datasets}. The RSMSE metric is plotted against varying values for the differential privacy parameter, $\epsilon$. Higher $\epsilon$ values correspond to lower privacy levels. The RSMSE of non-private PAC-PFL (Algorithm \ref{alg:PACPFL}) is shown in red for reference. As expected, performance improves as the privacy level decreases due to lower noise injection.
  • Figure 5: Power output profile of $24$ houses in the PV-EW experiment over five days in June $2018$, where each line represents the PV generation of one house. Green and blue curves correspond to houses facing the east and the west respectively. Although the curves have noticeable differences, there are consistent trends present in the data.
  • ...and 1 more figure
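
The privacy/accuracy trade-off described for Figure 4 can be illustrated with a minimal sketch. It assumes the textbook Laplace mechanism, which may differ from the mechanism used in the paper's differentially private algorithm; the function names, the unit sensitivity, and the chosen $\epsilon$ values are all hypothetical. In an $\epsilon$-DP Laplace mechanism the noise scale is sensitivity / $\epsilon$, so a smaller $\epsilon$ (stronger privacy) injects more noise and a larger $\epsilon$ (weaker privacy) injects less.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(value, sensitivity, epsilon, rng):
    # Textbook epsilon-DP release of a scalar query: noise scale = sensitivity / epsilon.
    # (Assumption: illustrative only, not the paper's algorithm.)
    return value + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(epsilon, n_trials=20000, seed=0):
    # Average absolute distortion of a query whose true answer is 0.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        total += abs(laplace_mechanism(0.0, 1.0, epsilon, rng))
    return total / n_trials


# Hypothetical privacy budgets, from strong privacy (0.1) to weak privacy (10.0).
errors = {eps: mean_abs_error(eps) for eps in (0.1, 1.0, 10.0)}

if __name__ == "__main__":
    for eps, err in errors.items():
        print(f"epsilon={eps:>5}: mean |noise| = {err:.3f}")
```

The mean absolute noise shrinks as $\epsilon$ grows, which is the same direction as the trend the caption reports: less noise, and hence better accuracy, at weaker privacy levels.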

Theorems & Definitions (19)

  • Definition 3.1: DP
  • Theorem 3.2
  • Remark 3.3
  • Corollary 3.4: Catoni
  • Definition 4.1: Hyper-distributions
  • Theorem 4.2
  • Lemma 4.3
  • Corollary 4.4
  • Lemma 4.5
  • Lemma 4.6
  • ...and 9 more