Salvaging Federated Learning by Local Adaptation
Tao Yu, Eugene Bagdasaryan, Vitaly Shmatikov
TL;DR
Federated learning (FL) aims to train models on sensitive, non-iid data while preserving privacy, but the privacy and robustness mechanisms it relies on can reduce per-user accuracy and weaken participants' incentive to join. The authors show that many participants gain less from the federated model than from training locally, especially when differential privacy or Byzantine-robust aggregation is used. They propose purely local adaptation methods (fine-tuning, multi-task learning, and knowledge distillation) that tailor the federated model to each participant's data without altering global aggregation. Experiments on next-word prediction and CIFAR-10 image classification show that local adaptation recovers much of the lost accuracy and restores the incentive to participate, with tail users benefiting substantially. Overall, the work offers a practical way to salvage FL: each participant personalizes the federated model locally, with no changes to the FL protocol.
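The knowledge-distillation variant of local adaptation can be sketched as a per-example loss in which the federated model acts as a frozen teacher and the model being adapted on local data acts as the student. This is a minimal sketch assuming the standard Hinton-style objective (a temperature-softened KL term scaled by T^2, plus cross-entropy on the local label); the function names and the default values of `alpha` and `temperature` are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    alpha: float = 0.5,        # hypothetical weight on the soft-target term
    temperature: float = 2.0,  # hypothetical softening temperature T
) -> float:
    """Hinton-style distillation loss for a single local example.

    Assumption: the teacher is the (frozen) federated model and the
    student is the participant's locally adapted model.
    """
    # Soft targets: KL(teacher || student) at temperature T, scaled by T^2
    # so its gradient magnitude stays comparable to the hard-label term.
    log_pt = log_softmax(teacher_logits, temperature)
    log_ps = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_pt, log_ps))

    # Hard targets: ordinary cross-entropy on the local label at T = 1.
    ce = -log_softmax(student_logits)[label]

    return alpha * temperature ** 2 * kl + (1 - alpha) * ce
```

Setting `alpha = 0` reduces this to plain cross-entropy training on local labels, while `alpha = 1` trains the student only to mimic the federated model's softened predictions.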
Abstract
Federated learning (FL) is a heavily promoted approach for training ML models on sensitive data, e.g., text typed by users on their smartphones. FL is expressly designed for training on data that are unbalanced and non-iid across the participants. To ensure the privacy and integrity of the federated model, the latest FL approaches use differential privacy or robust aggregation. We look at FL from the local viewpoint of an individual participant and ask: (1) do participants have an incentive to participate in FL? (2) how can participants individually improve the quality of their local models, without re-designing the FL framework and/or involving other participants? First, we show that on standard tasks such as next-word prediction, many participants gain no benefit from FL because the federated model is less accurate on their data than the models they can train on their own. Second, we show that differential privacy and robust aggregation make this problem worse by further destroying the accuracy of the federated model for many participants. Then, we evaluate three techniques for local adaptation of federated models: fine-tuning, multi-task learning, and knowledge distillation. We analyze where each is applicable and demonstrate that all participants benefit from local adaptation. Participants whose local models are poor obtain large accuracy improvements over conventional FL. Participants whose local models are better than the federated model, and who therefore have no incentive to participate in FL today, improve less, but enough that the adapted federated model becomes better than their local models.
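Multi-task adaptation must fit the participant's local data without forgetting what the federated model already learned. A minimal sketch, assuming an elastic weight consolidation (EWC) style penalty with a diagonal Fisher information estimate: the local loss is augmented with (lam/2) * sum_i F_i * (theta_i - theta_fed_i)^2, where theta_fed are the federated weights. The helper names, the flat-list parameter representation, and the assumption that the caller supplies Fisher values and local-loss gradients are all hypothetical.

```python
from typing import Sequence


def ewc_penalty(
    theta: Sequence[float],
    theta_fed: Sequence[float],
    fisher: Sequence[float],
    lam: float,
) -> float:
    """(lam / 2) * sum_i F_i * (theta_i - theta_fed_i)^2.

    Parameters with large (assumed, caller-supplied) Fisher values are
    important to the federated task and are anchored more strongly.
    """
    return 0.5 * lam * sum(
        f * (t - t0) ** 2 for t, t0, f in zip(theta, theta_fed, fisher)
    )


def ewc_gradient(
    theta: Sequence[float],
    theta_fed: Sequence[float],
    fisher: Sequence[float],
    lam: float,
) -> list[float]:
    """Gradient of ewc_penalty w.r.t. theta: lam * F_i * (theta_i - theta_fed_i)."""
    return [lam * f * (t - t0) for t, t0, f in zip(theta, theta_fed, fisher)]


def adapt_step(
    theta: Sequence[float],
    local_grad: Sequence[float],
    theta_fed: Sequence[float],
    fisher: Sequence[float],
    lam: float,
    lr: float,
) -> list[float]:
    """One gradient-descent step on (local loss + EWC penalty).

    local_grad is assumed to be the gradient of the participant's own
    training loss at theta, computed elsewhere.
    """
    reg = ewc_gradient(theta, theta_fed, fisher, lam)
    return [t - lr * (g + r) for t, g, r in zip(theta, local_grad, reg)]
```

A larger `lam` keeps the adapted weights closer to the federated model; `lam = 0` reduces each update to plain fine-tuning on local data.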
