Adaptive Personalized Federated Learning

Yuyang Deng, Mohammad Mahdi Kamani, Mehrdad Mahdavi

TL;DR

The paper tackles the challenge of statistical heterogeneity in federated learning by shifting from sole global optimization to adaptive personalization. It introduces APFL, where each client learns a personalized model that blends a local and a global predictor with a per-client mixing parameter that is updated adaptively. The authors provide a generalization bound for the mixed model, establish an optimal mixing parameter, and develop a communication-efficient optimization method with convergence guarantees in both strongly convex and nonconvex settings. Empirical results on MNIST, CIFAR-10, EMNIST, and synthetic data demonstrate that APFL consistently improves local generalization and can outperform existing personalization baselines, especially under non-IID data distributions. This work offers a principled framework for balancing cross-client collaboration with client-specific performance in real-world federated settings.

Abstract

Investigation of the degree of personalization in federated learning algorithms has shown that only maximizing the performance of the global model will confine the capacity of the local models to personalize. In this paper, we advocate an adaptive personalized federated learning (APFL) algorithm, where each client trains its local model while contributing to the global model. We derive a generalization bound for the mixture of local and global models, and find the optimal mixing parameter. We also propose a communication-efficient optimization method to collaboratively learn the personalized models and analyze its convergence in both smooth strongly convex and nonconvex settings. Extensive experiments demonstrate the effectiveness of our personalization scheme, as well as the correctness of the established generalization theory.
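The mixing idea described above can be illustrated with a small sketch. It assumes the personalized model is a convex combination of local and global parameter vectors, and that the mixing parameter is adapted by a gradient step on the local loss, clipped to $[0, 1]$. The function names (`mix`, `alpha_step`), the quadratic toy loss, and the step size are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def mix(alpha, local_w, global_w):
    """Personalized parameters as a convex combination (assumed form)."""
    return alpha * local_w + (1.0 - alpha) * global_w

def alpha_step(alpha, local_w, global_w, grad_at_mixed, lr=0.1):
    """One gradient step on alpha, clipped to [0, 1].

    Chain rule: d loss / d alpha = <local_w - global_w, grad of loss at the mixed point>.
    """
    g = float(np.dot(local_w - global_w, grad_at_mixed))
    return float(np.clip(alpha - lr * g, 0.0, 1.0))

# Toy setup (hypothetical): local loss is 0.5 * ||w - target||^2, so grad = w - target.
local_w = np.array([1.0, 1.0])    # stand-in for a client's local model
global_w = np.array([0.0, 0.0])   # stand-in for the shared global model
target = np.array([0.8, 0.8])     # minimizer of the client's loss

alpha = 0.5
for _ in range(200):
    mixed = mix(alpha, local_w, global_w)
    grad = mixed - target          # gradient of the toy loss at the mixed model
    alpha = alpha_step(alpha, local_w, global_w, grad)

print(round(alpha, 4))
```

In this toy problem the client's optimum lies between the two models, so the adapted mixing weight settles strictly inside $(0, 1)$ rather than at either endpoint. Setting `alpha` to 0 recovers the global parameters, and setting it to 1 recovers the purely local ones.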

Paper Structure

This paper contains 57 sections, 21 theorems, 125 equations, 7 figures, 2 tables.

Key Result

Theorem 1

Let $\mathcal{H}$ be a hypothesis class with finite VC dimension $d$. Assume the loss function $\ell$ is Lipschitz continuous with constant $G$ and bounded in $[0, B]$. Then with probability at least $1-\delta$, there exists a constant $C$ such that the risk of the mixed model $h_{\alpha_i} = \alpha_i \ldots$ where $m_i$, $i = 1, 2, \ldots, n$, is the number of training data at the $i$th user, $m = m_1 + \ldots + m_n$ ...

Figures (7)

  • Figure 1: Generalization and training losses of the proposed personalized model compared with the global models of FedAvg (McMahan et al., 2017) and SCAFFOLD (Karimireddy et al., 2019) as the diversity among the clients' data increases, on the MNIST dataset with a logistic regression model. Increasing the diversity among local data leads to poor generalization of the FedAvg and SCAFFOLD global models on local data, while this degradation is diminished for the proposed personalized model.
  • Figure 2: Performance of the proposed APFL algorithm compared with FedAvg (McMahan et al., 2017) (APFL with $\alpha=0$) and SCAFFOLD (Karimireddy et al., 2019) on the MNIST dataset, with different levels of non-IID data distribution among clients, using a logistic regression model. The first row shows the training loss of the global models, as well as the local and personalized models, averaged over all clients. The second row shows the generalization of the same models on their validation data. In the second row of panel (a), the SCAFFOLD lines and the global FedAvg line are omitted because their low values would degrade the readability of the plot.
  • Figure 3: Effect of client sampling on APFL and FedAvg, using a non-IID MNIST dataset with only $2$ classes per client and the logistic regression loss. The first row shows the training performance of the local model of FedAvg and the personalized model of APFL for sampling rates in $\{0.3,0.5,0.7\}$. The second row shows the generalization performance of the models on local validation data, aggregated over all clients. Regardless of the sampling ratio, APFL outperforms FedAvg.
  • Figure 4: Comparing the personalized model of APFL with adaptive $\alpha$ and the local model of FedAvg. The first panel shows the training performance, where APFL outperforms FedAvg on the same dataset. The second panel shows the generalization of these methods on local validation data. APFL outperforms FedAvg in generalization, and adaptively updating $\alpha$ yields the same performance across datasets with different levels of diversity.
  • Figure 5: Results of applying FedAvg and APFL (with adaptive $\alpha$) to an MLP model on the EMNIST dataset, which is naturally heterogeneous. APFL achieves the same training loss as localized FedAvg while outperforming it in validation accuracy.
  • ...and 2 more figures

Theorems & Definitions (55)

  • Definition 1
  • Remark 1
  • Theorem 1
  • proof
  • Remark 2
  • Remark 3
  • Definition 2: Gradient Diversity
  • Definition 3: Local-Global Optimality Gap
  • Theorem 2: Global model convergence of Local Descent APFL
  • proof
  • ...and 45 more