Salvaging Federated Learning by Local Adaptation

Tao Yu, Eugene Bagdasaryan, Vitaly Shmatikov

TL;DR

Federated learning aims to train on sensitive, non-iid data while preserving privacy, but privacy and robustness mechanisms can reduce per-user accuracy, weakening incentives to participate. The authors demonstrate that many participants gain less from the federated model than from local training, especially under differential privacy and Byzantine-robust aggregation. They propose purely local adaptation methods—fine-tuning, multi-task learning, and knowledge distillation—to tailor the federated model to individual data without altering global aggregation. Empirical results on next-word prediction and CIFAR-10 show that local adaptation recovers much of the lost accuracy and creates participation incentives, with tail users benefiting substantially. Overall, the work provides a practical path to salvage FL by enabling personalized, locally adaptable training.

Abstract

Federated learning (FL) is a heavily promoted approach for training ML models on sensitive data, e.g., text typed by users on their smartphones. FL is expressly designed for training on data that are unbalanced and non-iid across the participants. To ensure privacy and integrity of the federated model, the latest FL approaches use differential privacy or robust aggregation. We look at FL from the local viewpoint of an individual participant and ask: (1) do participants have an incentive to participate in FL? (2) how can participants individually improve the quality of their local models, without re-designing the FL framework and/or involving other participants? First, we show that on standard tasks such as next-word prediction, many participants gain no benefit from FL because the federated model is less accurate on their data than the models they can train locally on their own. Second, we show that differential privacy and robust aggregation make this problem worse by further destroying the accuracy of the federated model for many participants. Then, we evaluate three techniques for local adaptation of federated models: fine-tuning, multi-task learning, and knowledge distillation. We analyze where each is applicable and demonstrate that all participants benefit from local adaptation. Participants whose local models are poor obtain big accuracy improvements over conventional FL. Participants whose local models are better than the federated model, and who therefore have no incentive to participate in FL today, improve less, but sufficiently to make the adapted federated model better than their local models.
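
To make the idea of local adaptation by knowledge distillation concrete, here is a minimal sketch of a generic distillation objective, in which a participant's local (student) model is trained on its own labels while being pulled toward the predictions of the frozen federated (teacher) model. This is a textbook-style formulation, not necessarily the exact loss used in the paper; the function names and the `alpha` and `temperature` parameters are hypothetical choices for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Generic distillation objective for one example (illustrative only).

    student_logits: outputs of the participant's locally adapted model
    teacher_logits: outputs of the frozen federated model
    label:          index of the true class in the participant's data
    alpha:          assumed weight on the hard-label term
    temperature:    assumed softening temperature for the soft-label term
    """
    # Hard-label term: cross-entropy on the participant's own data.
    p_student = softmax(student_logits)
    cross_entropy = -math.log(p_student[label])

    # Soft-label term: KL(teacher || student) on temperature-softened outputs.
    p_teacher_soft = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher_soft, p_student_soft))

    # The temperature**2 factor is the usual rescaling of the soft-label term.
    return alpha * cross_entropy + (1 - alpha) * temperature ** 2 * kl
```

When the student and teacher agree exactly, the soft-label term vanishes and only the weighted cross-entropy on local data remains; the more the student drifts from the federated model, the larger the penalty. Fine-tuning can be viewed as the special case `alpha=1.0`, where the federated model serves only as the initialization.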

Paper Structure

This paper contains 16 sections, 6 equations, 7 figures.

Figures (7)

  • Figure 1: Accuracy improvements of federated models over local, trained-from-scratch models for word prediction (top row) and image classification (bottom row) tasks.
  • Figure 2: Accuracy improvements of adapted federated models over local, trained-from-scratch models for word prediction (top row) and image classification (bottom row) tasks.
  • Figure 3: Accuracy improvements of adapted over unadapted federated models vs. vocabulary size (top row) and total words (bottom row).
  • Figure 4: Accuracy improvements of adapted federated models over local (top row) and unadapted federated models (bottom row).
  • Figure 5: Cumulative accuracy improvements of different adaptations on BASIC-FED (left), DP-FED (middle), and ROBUST-FED (right).
  • ...and 2 more figures