Leveraging Function Space Aggregation for Federated Learning at Scale
Nikita Dhawan, Nicole Mitchell, Zachary Charles, Zachary Garrett, Gintare Karolina Dziugaite
TL;DR
This work addresses federated learning with heterogeneous client data and prolonged local training by reframing aggregation in function space. It introduces FedFish, a function-space aggregation method that weights client updates by a diagonal estimate of their Fisher information. The global update has a closed form, and the Fisher estimates are computed locally from gradients, so client importance is captured without the server accessing client data. Empirically, FedFish outperforms FedAvg as the number of local epochs grows and improves post-personalization and transfer performance across image and language benchmarks. The work also introduces the Client-Server Barrier as a diagnostic for aggregation quality. The results suggest practical benefits for scalable FL, with trade-offs in communication cost and possible extensions to higher-order Fisher estimates and privacy-preserving adaptations.
Abstract
The federated learning paradigm has motivated the development of methods for aggregating multiple client updates into a global server model, without sharing client data. Many federated learning algorithms, including the canonical Federated Averaging (FedAvg), take a direct (possibly weighted) average of the client parameter updates, motivated by results in distributed optimization. In this work, we adopt a function space perspective and propose a new algorithm, FedFish, that aggregates local approximations to the functions learned by clients, using an estimate based on their Fisher information. We evaluate FedFish on realistic, large-scale cross-device benchmarks. While the performance of FedAvg can suffer as client models drift further apart, we demonstrate that FedFish is more robust to longer local training. Our evaluation across several settings in image and language benchmarks shows that FedFish outperforms FedAvg as local training epochs increase. Further, FedFish results in global networks that are more amenable to efficient personalization via local fine-tuning on the same or shifted data distributions. For instance, federated pretraining on the C4 dataset, followed by few-shot personalization on Stack Overflow, results in a 7% improvement in next-token prediction by FedFish over FedAvg.
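To make the contrast between the two aggregation styles concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the function names (`fedavg`, `diagonal_fisher`, `fisher_weighted`), the `eps` stabilizer, and the use of mean squared per-example gradients as the diagonal Fisher estimate are all hypothetical choices made for this example. The exact FedFish update may differ in its details.

```python
from typing import List

Vector = List[float]


def fedavg(deltas: List[Vector], weights: List[float]) -> Vector:
    """Weighted average of client parameter updates (FedAvg-style)."""
    total = sum(weights)
    dim = len(deltas[0])
    return [
        sum(w * d[j] for w, d in zip(weights, deltas)) / total
        for j in range(dim)
    ]


def diagonal_fisher(per_example_grads: List[Vector]) -> Vector:
    """Assumed estimator: mean of squared per-example gradients per coordinate."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    return [sum(g[j] ** 2 for g in per_example_grads) / n for j in range(dim)]


def fisher_weighted(
    deltas: List[Vector], fishers: List[Vector], eps: float = 1e-8
) -> Vector:
    """Per-coordinate average of client updates, weighted by diagonal Fisher.

    eps is a hypothetical stabilizer that avoids division by zero when
    every client has a (near) zero Fisher value for a coordinate.
    """
    dim = len(deltas[0])
    out = []
    for j in range(dim):
        num = sum(f[j] * d[j] for f, d in zip(fishers, deltas))
        den = sum(f[j] for f in fishers) + eps
        out.append(num / den)
    return out


# Two clients, two parameters. Each client moved a different coordinate.
deltas = [[1.0, 0.0], [0.0, 1.0]]
# Each client is more "certain" about the coordinate it moved.
fishers = [[3.0, 1.0], [1.0, 3.0]]

avg_update = fedavg(deltas, weights=[1.0, 1.0])
fish_update = fisher_weighted(deltas, fishers)
```

In this toy case the uniform average splits each coordinate evenly between the clients, while the Fisher-weighted version leans toward the client whose Fisher estimate is larger for that coordinate. Only the per-client update and its diagonal Fisher vector are needed by the server in this sketch, which roughly doubles the upload size relative to sending the update alone.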
