Improving Federated Learning Personalization via Model Agnostic Meta Learning
Yihan Jiang, Jakub Konečný, Keith Rush, Sreeram Kannan
TL;DR
The paper presents federated learning (FL) as a natural application area for meta-learning. It shows that Federated Averaging (FedAvg) can be interpreted as a meta-learning algorithm and that personalization improves with a two-stage training procedure. It introduces Personalized FedAvg, which follows FedAvg training with a fine-tuning stage (Reptile(K) and Adam) so that both the accuracy of the initial model and its personalized performance are optimized. Experiments on EMNIST-62 and Shakespeare show higher and more stable personalized accuracy, and also show that increasing the number of local updates improves personalization only up to a point. The work questions the conventional FL objective of optimizing global accuracy alone and connects FL with MAML, suggesting new directions for optimization and evaluation in both fields.
Abstract
Federated Learning (FL) refers to learning a high-quality global model based on decentralized data storage, without ever copying the raw data. A natural scenario arises with data created on mobile phones by the activity of their users. Given the typical data heterogeneity in such situations, it is natural to ask how the global model can be personalized for each such device individually. In this work, we point out that the setting of Model Agnostic Meta Learning (MAML), where one optimizes for fast, gradient-based, few-shot adaptation to a heterogeneous distribution of tasks, has a number of similarities with the objective of personalization for FL. We present FL as a natural source of practical applications for MAML algorithms, and make the following observations. 1) The popular FL algorithm, Federated Averaging, can be interpreted as a meta-learning algorithm. 2) Careful fine-tuning can yield a global model with higher accuracy that is at the same time easier to personalize. However, optimizing solely for global model accuracy yields a weaker personalization result. 3) A model trained using a standard datacenter optimization method is much harder to personalize than one trained using Federated Averaging, supporting the first claim. These results raise new questions for FL, MAML, and broader ML research.
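
The first observation can be made concrete with a small numerical sketch. It is an illustration under stated assumptions, not the paper's implementation: clients hold equal amounts of data, each client runs a few full-batch gradient steps on a toy least-squares loss, and the Reptile meta step size is set to 1. The function names (`local_sgd`, `fedavg_round`, `reptile_round`) and the synthetic data are hypothetical. Under these assumptions, one FedAvg round and one batched Reptile update produce the same model.

```python
import numpy as np

def local_sgd(theta, X, y, steps, lr):
    """Run `steps` full-batch gradient steps on one client's least-squares loss."""
    theta = theta.copy()
    for _ in range(steps):
        grad = X.T @ (X @ theta - y) / len(y)
        theta -= lr * grad
    return theta

def fedavg_round(theta, clients, steps, lr):
    """FedAvg with equally sized clients: average the locally trained models."""
    return np.mean([local_sgd(theta, X, y, steps, lr) for X, y in clients], axis=0)

def reptile_round(theta, clients, steps, lr, meta_lr):
    """Batched Reptile: move theta toward the average of the adapted models."""
    deltas = [local_sgd(theta, X, y, steps, lr) - theta for X, y in clients]
    return theta + meta_lr * np.mean(deltas, axis=0)

# Synthetic non-IID clients (an assumption for illustration): each client
# has its own ground-truth weights, so local optima differ across clients.
rng = np.random.default_rng(0)
clients = []
for _ in range(4):
    w_true = rng.normal(size=3)
    X = rng.normal(size=(20, 3))
    clients.append((X, X @ w_true))

theta0 = np.zeros(3)
fed = fedavg_round(theta0, clients, steps=5, lr=0.1)
rep = reptile_round(theta0, clients, steps=5, lr=0.1, meta_lr=1.0)
same_update = np.allclose(fed, rep)  # identical when meta_lr == 1
```

With a meta step size other than 1, the Reptile update only moves part of the way toward the averaged client models, so the two rounds no longer coincide.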
