Federated Evaluation of On-device Personalization
Kangkang Wang, Rajiv Mathews, Chloé Kiddon, Hubert Eichner, Françoise Beaufays, Daniel Ramage
TL;DR
Federated learning (FL) trains global models without sending raw data to central servers, but personalizing those models risks degrading the experience of some users. This work extends FL with Federated Personalization Evaluation (FPE), a privacy-preserving framework that splits each device's local data into training and test sets, computes baseline and personalized metrics, and aggregates the metric deltas to guide deployment decisions. In experiments on a next-word prediction model, personalization yielded meaningful gains across a large user base, with a mean relative improvement of around 14.5% and substantial per-user gains under certain hyperparameters (e.g., about 47% of users achieving an improvement of at least 0.02). The approach provides scalable guidance for hyperparameter tuning and for gating personalized models in live inference, enabling privacy-preserving, large-scale evaluation of personalization strategies.
Abstract
Federated learning is a distributed, on-device computation framework that enables training global models without exporting sensitive user data to servers. In this work, we describe methods to extend the federation framework to evaluate strategies for personalization of global models. We present tools to analyze the effects of personalization and evaluate conditions under which personalization yields desirable models. We report on our experiments personalizing a language model for a virtual keyboard for smartphones with a population of tens of millions of users. We show that a significant fraction of users benefit from personalization.
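The evaluation loop summarized in the TL;DR (split local data, score the baseline and the personalized model, aggregate the deltas) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the toy bigram-count model, the function names (`train`, `accuracy`, `evaluate_on_device`, `aggregate`), the 80/20 split, the personalization `weight`, and the use of top-1 accuracy as the metric. In a real deployment the aggregation would be performed by the federated infrastructure; here it is a plain function over a list of per-device deltas.

```python
from collections import Counter, defaultdict
from statistics import mean


def bigrams(words):
    """Turn a word sequence into (previous_word, next_word) pairs."""
    return list(zip(words, words[1:]))


def train(pairs, base=None, weight=1):
    """Return a new bigram-count model: a copy of `base` plus weighted counts from `pairs`."""
    model = defaultdict(Counter)
    if base:
        for prev, next_counts in base.items():
            model[prev].update(next_counts)
    for prev, nxt in pairs:
        model[prev][nxt] += weight
    return model


def accuracy(model, pairs):
    """Top-1 next-word accuracy of `model` on `pairs` (0.0 if there are no pairs)."""
    if not pairs:
        return 0.0
    hits = 0
    for prev, nxt in pairs:
        counts = model.get(prev)  # .get avoids inserting empty entries
        if counts and counts.most_common(1)[0][0] == nxt:
            hits += 1
    return hits / len(pairs)


def evaluate_on_device(global_model, local_words, train_fraction=0.8, weight=5):
    """Hypothetical on-device step: only the metric delta would leave the device."""
    pairs = bigrams(local_words)
    cut = int(len(pairs) * train_fraction)
    train_pairs, test_pairs = pairs[:cut], pairs[cut:]

    baseline = accuracy(global_model, test_pairs)
    personalized_model = train(train_pairs, base=global_model, weight=weight)
    personalized = accuracy(personalized_model, test_pairs)
    return personalized - baseline


def aggregate(deltas, threshold=0.02):
    """Hypothetical server-side summary of the per-device deltas."""
    return {
        "mean_delta": mean(deltas),
        "fraction_improved": sum(d >= threshold for d in deltas) / len(deltas),
    }


if __name__ == "__main__":
    global_model = train([("good", "morning")] * 3)
    devices = [["good", "night"] * 6, ["good", "morning"] * 6]
    deltas = [evaluate_on_device(global_model, words) for words in devices]
    print(aggregate(deltas))
```

The `threshold` default of 0.02 mirrors the per-user improvement cutoff quoted in the TL;DR. Summaries of this kind are what would inform the choice of personalization hyperparameters or a rule for gating personalized models.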
