Adaptive Personalized Federated Learning via Multi-task Averaging of Kernel Mean Embeddings

Jean-Baptiste Fermanian, Batiste Le Bars, Aurélien Bellet

TL;DR

A new PFL approach in which each agent optimizes a weighted combination of all agents' empirical risks, with the weights learned from data rather than specified a priori. The resulting procedure is fully adaptive: it requires no prior knowledge of data heterogeneity and can automatically transition between global and local learning regimes.

Abstract

Personalized Federated Learning (PFL) enables a collection of agents to collaboratively learn individual models without sharing raw data. We propose a new PFL approach in which each agent optimizes a weighted combination of all agents' empirical risks, with the weights learned from data rather than specified a priori. The novelty of our method lies in formulating the estimation of these collaborative weights as a kernel mean embedding estimation problem with multiple data sources, leveraging tools from multi-task averaging to capture statistical relationships between agents. This perspective yields a fully adaptive procedure that requires no prior knowledge of data heterogeneity and can automatically transition between global and local learning regimes. By recasting the objective as a high-dimensional mean estimation problem, we derive finite-sample guarantees on local excess risks for a broad class of distributions, explicitly quantifying the statistical gains of collaboration. To address communication constraints inherent to federated settings, we also propose a practical implementation based on random Fourier features, which allows one to trade communication cost for statistical efficiency. Numerical experiments validate our theoretical results.
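
To make the random Fourier feature idea concrete, here is a minimal sketch, not the authors' algorithm. It assumes a Gaussian kernel and uses hypothetical helper names (`rff_map`, `mean_embeddings`, `pairwise_sq_dists`) that do not come from the paper. Each agent summarizes its sample by a single vector of length `n_features`, an approximate kernel mean embedding, so the number of features controls how much is communicated. Squared distances between these vectors approximate the squared MMD between agents' distributions. How such quantities are turned into collaboration weights is the paper's contribution and is not reproduced here.

```python
import numpy as np


def rff_map(X, W, b):
    """Random Fourier feature map z(x) = sqrt(2/D) * cos(W x + b).

    For W drawn from N(0, bandwidth^-2 I) and b uniform on [0, 2*pi],
    z(x) . z(y) approximates the Gaussian kernel
    exp(-||x - y||^2 / (2 * bandwidth^2)).
    """
    n_features = W.shape[0]
    return np.sqrt(2.0 / n_features) * np.cos(X @ W.T + b)


def mean_embeddings(datasets, n_features=256, bandwidth=1.0, seed=0):
    """Approximate kernel mean embedding of each agent's sample.

    All agents must share the same random draw (W, b); here a common
    seed stands in for that shared randomness (an assumption of this sketch).
    Returns an array of shape (n_agents, n_features).
    """
    rng = np.random.default_rng(seed)
    dim = datasets[0].shape[1]
    W = rng.normal(scale=1.0 / bandwidth, size=(n_features, dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    # Each agent only needs to send this averaged feature vector.
    return np.stack([rff_map(X, W, b).mean(axis=0) for X in datasets])


def pairwise_sq_dists(E):
    """Squared Euclidean distances between embeddings (approximate MMD^2)."""
    diff = E[:, None, :] - E[None, :, :]
    return (diff ** 2).sum(axis=-1)


# Toy data: agents 0 and 1 share a distribution, agent 2 is shifted.
rng = np.random.default_rng(1)
agents = [
    rng.normal(0.0, 1.0, size=(500, 2)),
    rng.normal(0.0, 1.0, size=(500, 2)),
    rng.normal(2.0, 1.0, size=(500, 2)),
]
embeddings = mean_embeddings(agents, n_features=256)
distances = pairwise_sq_dists(embeddings)
print(np.round(distances, 3))
```

In this toy setup the distance between agents 0 and 1 should come out much smaller than their distance to agent 2, which is the kind of signal a data-driven weighting scheme can exploit. Increasing `n_features` tightens the kernel approximation at the price of a longer vector per agent.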

Paper Structure (37 sections, 14 theorems, 98 equations, 4 figures, 2 tables, 3 algorithms)


Key Result

Lemma 4.3

Under Assumption ass:lossinrkhs, for any learned weights $\widehat{\omega}$, we have: [...] where $R_\Theta = \sup_{\theta} \lVert h_\theta \rVert_{\mathcal{H}}$. Moreover, if for some $r>0$, [...]

Figures (4)

  • Figure 1: Mean squared error and its standard deviation for the different approaches, as a function of the intra-group noise $\sigma^2_c$.
  • Figure 2: Synthetic concept shift. Left: test MSE as a function of the architecture (lower is better). Right: learned weights.
  • Figure 3: FEMNIST. Accuracy of each agent for each method, sorted by the Q-aggregation accuracies, and a boxplot of these accuracies over the agents (higher is better).
  • Figure 4: Number of train and test points for each agent.

Theorems & Definitions (28)

  • Remark 2.1: Optimization error
  • Example 4.2: Linear regression
  • Lemma 4.3
  • Theorem 4.4
  • Example 4.5: Identical agents
  • Corollary 4.6
  • Example 5.1: Linear regression, continued
  • Theorem 5.2
  • Proposition C.2
  • Lemma D.1
  • ...and 18 more