Personalized Federated Learning: A Meta-Learning Approach
Alireza Fallah, Aryan Mokhtari, Asuman Ozdaglar
TL;DR
The paper tackles data heterogeneity in Federated Learning by formulating a personalized FL framework grounded in Model-Agnostic Meta-Learning (MAML). It introduces Per-FedAvg, a FedAvg-inspired algorithm that learns a shared initialization enabling rapid user-specific adaptation via one or a few local gradient steps, and provides convergence guarantees for nonconvex objectives. It further analyzes how task similarity, quantified via distribution distances such as Total Variation (TV) and 1-Wasserstein, influences convergence, and offers first-order and Hessian-assisted variants with empirical validation on MNIST and CIFAR-10. Overall, the work advances personalized FL by coupling meta-learning principles with FedAvg, delivering both theoretical insight and practical algorithms for heterogeneous client populations.
Abstract
In Federated Learning, we aim to train models across multiple computing units (users), while users can only communicate with a common central server, without exchanging their data samples. This mechanism exploits the computational power of all users and allows users to obtain a richer model as their models are trained over a larger set of data points. However, this scheme only develops a common output for all the users, and, therefore, it does not adapt the model to each user. This is an important missing feature, especially given the heterogeneity of the underlying data distribution for various users. In this paper, we study a personalized variant of federated learning in which our goal is to find an initial shared model that current or new users can easily adapt to their local dataset by performing one or a few steps of gradient descent with respect to their own data. This approach keeps all the benefits of the federated learning architecture, and, by structure, leads to a more personalized model for each user. We show this problem can be studied within the Model-Agnostic Meta-Learning (MAML) framework. Inspired by this connection, we study a personalized variant of the well-known Federated Averaging algorithm and evaluate its performance in terms of gradient norm for non-convex loss functions. Further, we characterize how this performance is affected by the closeness of underlying distributions of user data, measured in terms of distribution distances such as Total Variation and 1-Wasserstein metric.
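To make the idea of a shared initialization concrete, the following is a minimal sketch of a first-order, Per-FedAvg-style loop, not the authors' implementation. It assumes a toy setting that does not appear in the paper summary above: each user i has a quadratic loss f_i(w) = 0.5 * ||w - c_i||^2 with a user-specific center c_i, all users participate in every round, and gradients are exact rather than stochastic. The function names (`grad`, `local_update`, `per_fedavg`, `personalize`) and the step sizes `alpha`, `beta` and local step count `tau` are illustrative choices. Each local step first adapts the model with one inner gradient step, then updates using the gradient at the adapted point, ignoring the Hessian factor as in a first-order approximation.

```python
def grad(w, center):
    """Gradient of the assumed toy loss f_i(w) = 0.5 * ||w - center||^2."""
    return [wj - cj for wj, cj in zip(w, center)]


def loss(w, center):
    """Value of the assumed toy loss for one user."""
    return 0.5 * sum((wj - cj) ** 2 for wj, cj in zip(w, center))


def local_update(w, center, alpha, beta, tau):
    """Run tau first-order meta-gradient steps on one user's loss."""
    w = list(w)
    for _ in range(tau):
        # Inner step: adapt the current model to this user's data.
        g_inner = grad(w, center)
        w_adapted = [wj - alpha * gj for wj, gj in zip(w, g_inner)]
        # First-order approximation: use the gradient at the adapted
        # point and drop the (I - alpha * Hessian) factor.
        g_meta = grad(w_adapted, center)
        w = [wj - beta * gj for wj, gj in zip(w, g_meta)]
    return w


def per_fedavg(centers, dim, alpha=0.1, beta=0.5, tau=2, rounds=50):
    """Server loop: broadcast w, collect local models, average them."""
    w = [0.0] * dim
    for _ in range(rounds):
        local_models = [local_update(w, c, alpha, beta, tau) for c in centers]
        w = [sum(m[j] for m in local_models) / len(local_models)
             for j in range(dim)]
    return w


def personalize(w, center, alpha=0.1, steps=1):
    """Adapt the shared initialization to one user with a few gradient steps."""
    w = list(w)
    for _ in range(steps):
        w = [wj - alpha * gj for wj, gj in zip(w, grad(w, center))]
    return w


if __name__ == "__main__":
    # Hypothetical user data: three users with different loss centers.
    centers = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
    w_shared = per_fedavg(centers, dim=2)
    w_user0 = personalize(w_shared, centers[0])
    print("shared initialization:", w_shared)
    print("user 0 loss before/after one step:",
          loss(w_shared, centers[0]), loss(w_user0, centers[0]))
```

In this symmetric toy problem the shared model simply settles at the average of the user centers, so the sketch mainly illustrates the control flow: broadcast, local meta-steps, averaging, and a final per-user adaptation step. With non-quadratic losses the meta-learned initialization generally differs from the plain FedAvg solution.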
