Leveraging Function Space Aggregation for Federated Learning at Scale

Nikita Dhawan, Nicole Mitchell, Zachary Charles, Zachary Garrett, Gintare Karolina Dziugaite

TL;DR

This work addresses federated learning with heterogeneous client data and prolonged local training by reframing aggregation in function space. It introduces FedFish, a function-space aggregation method that weights client updates by a diagonal estimate of each client's Fisher information. The global update has a closed form, and the Fisher estimates are computed locally from gradients, so the server captures client importance without accessing client data. Empirically, FedFish outperforms FedAvg as the number of local epochs grows and improves post-personalization and transfer performance across image and language benchmarks. The paper also introduces the Client-Server Barrier as a diagnostic for aggregation quality. It discusses trade-offs in communication cost and possible extensions to higher-order Fisher estimates and privacy-preserving adaptations.

Abstract

The federated learning paradigm has motivated the development of methods for aggregating multiple client updates into a global server model, without sharing client data. Many federated learning algorithms, including the canonical Federated Averaging (FedAvg), take a direct (possibly weighted) average of the client parameter updates, motivated by results in distributed optimization. In this work, we adopt a function space perspective and propose a new algorithm, FedFish, that aggregates local approximations to the functions learned by clients, using an estimate based on their Fisher information. We evaluate FedFish on realistic, large-scale cross-device benchmarks. While the performance of FedAvg can suffer as client models drift further apart, we demonstrate that FedFish is more robust to longer local training. Our evaluation across several settings in image and language benchmarks shows that FedFish outperforms FedAvg as local training epochs increase. Further, FedFish results in global networks that are more amenable to efficient personalization via local fine-tuning on the same or shifted data distributions. For instance, federated pretraining on the C4 dataset, followed by few-shot personalization on Stack Overflow, results in a 7% improvement in next-token prediction by FedFish over FedAvg.
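To make the contrast with direct parameter averaging concrete, the following is a minimal sketch of Fisher-weighted aggregation. It assumes the aggregate is an elementwise average of client updates weighted by each client's diagonal Fisher estimate, and that the estimate is the mean of squared per-example gradients. The function names, the `eps` smoothing term, and the toy numbers are all hypothetical illustrations, and the paper's exact update rule may differ.

```python
from typing import List, Sequence


def diagonal_fisher(per_example_grads: Sequence[Sequence[float]]) -> List[float]:
    """Empirical diagonal Fisher estimate: mean of squared per-example gradients.

    Assumption: this is one common estimator, not necessarily the paper's.
    """
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    return [sum(g[j] ** 2 for g in per_example_grads) / n for j in range(dim)]


def uniform_average(deltas: Sequence[Sequence[float]]) -> List[float]:
    """FedAvg-style direct average of client parameter updates (equal weights)."""
    n = len(deltas)
    dim = len(deltas[0])
    return [sum(d[j] for d in deltas) / n for j in range(dim)]


def fisher_weighted_aggregate(
    deltas: Sequence[Sequence[float]],
    fishers: Sequence[Sequence[float]],
    eps: float = 1e-8,  # hypothetical smoothing term to avoid division by zero
) -> List[float]:
    """Elementwise average of client updates, weighted by diagonal Fisher values."""
    dim = len(deltas[0])
    aggregated = []
    for j in range(dim):
        numerator = sum(f[j] * d[j] for d, f in zip(deltas, fishers))
        denominator = sum(f[j] for f in fishers)
        aggregated.append(numerator / (denominator + eps))
    return aggregated


# Toy example: each client only "cares" about one coordinate.
client_deltas = [[1.0, 0.0], [0.0, 1.0]]
client_fishers = [[1.0, 0.0], [0.0, 1.0]]

avg = uniform_average(client_deltas)
fish = fisher_weighted_aggregate(client_deltas, client_fishers)
```

In this toy case the uniform average halves each client's update, while the Fisher-weighted aggregate keeps each coordinate close to the value chosen by the client for which that coordinate matters. Only the per-client updates and Fisher vectors are needed by the aggregation step, which matches the setting where client data is never shared.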

Paper Structure

This paper contains 41 sections, 10 equations, 10 figures, 6 tables, 2 algorithms.

Figures (10)

  • Figure 1: Given two functions modeled over disjoint supports (left), a direct parameter average fails to represent either function well (center), while function space aggregation aims to preserve both functional relationships (right).
  • Figure 2: As heterogeneity across clients increases (top left $\rightarrow$ top right $\rightarrow$ bottom left), FedAvg deteriorates, while FedFish matches the predictions of both client models. For each setting shown and each client within it, the FedFish global model has a lower barrier to the clients (bottom right).
  • Figure 3: Training on EMNIST with FedFish converges faster and to a higher global accuracy (left) and post-personalization accuracy (right) than training with FedAvg, across varying numbers of local epochs. Results are shown with fixed compute across configurations: each training iteration corresponds to a local epoch, and each marker indicates 100 federated communication rounds.
  • Figure 4: Global and post-personalization performance in terms of classification accuracy on CIFAR100 (left) and next-token prediction accuracy on Stack Overflow (right). Varying the number of local training epochs can significantly affect FedAvg performance, while FedFish remains relatively robust to it.
  • Figure 5: Transfer (global and post-personalization) performance in terms of next-token prediction after federated pretraining on C4 and evaluating on C4 (left), Stack Overflow (center) and CC-News (right). Personalizing with 0%, 25% or 50% of held-out client data results in FedFish outperforming FedAvg, especially with longer local training.
  • ...and 5 more figures