Federated Learning with Profile Mapping under Distribution Shifts and Drifts
Mohan Li, Dario Fenoglio, Martin Gjoreski, Marc Langheinrich
TL;DR
Feroma tackles federated learning under real-world non-IID conditions with distribution-profile-based aggregation, which handles both cross-client distribution shifts and intra-client drifts without relying on client identities or pre-specified clusters. It constructs compact, privacy-preserving distribution profiles via a Distribution–Profile Extractor, then maps current profiles to past ones to generate adaptive aggregation weights. This enables adaptation during training and one-shot model assignment for unseen clients at test time. In each round the framework automatically selects an aggregation strategy (clustered, personalized, or global), and at test time it stays robust by matching each new client to the final-round model with the nearest profile, all with minimal communication and computation overhead. Empirical results across six datasets show accuracy gains of up to roughly 12–14 percentage points over state-of-the-art baselines, with overhead comparable to FedAvg, indicating robustness to both distribution shift and drift in practical FL deployments.
Abstract
Federated Learning (FL) enables decentralized model training across clients without sharing raw data, but its performance degrades under real-world data heterogeneity. Existing methods often fail to address distribution shift across clients and distribution drift over time, or they rely on unrealistic assumptions, such as a known number of client clusters or known types of data heterogeneity, which limit their generalizability. We introduce Feroma, a novel FL framework that explicitly handles both distribution shift and drift without relying on client or cluster identity. Feroma builds on client distribution profiles (compact, privacy-preserving representations of local data) that guide model aggregation and test-time model assignment through adaptive similarity-based weighting. This design allows Feroma to dynamically select aggregation strategies during training, ranging from clustered to personalized, and to deploy suitable models to unseen, unlabeled test clients without retraining, online adaptation, or prior knowledge of the clients' data. Extensive experiments show that, compared to 10 state-of-the-art methods, Feroma improves performance and stability under dynamic data heterogeneity, with an average accuracy gain of up to 12 percentage points over the best baselines across 6 benchmarks, while maintaining computational and communication overhead comparable to FedAvg. These results highlight that distribution-profile-based aggregation offers a practical path toward robust FL under both data distribution shifts and drifts.
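To make the idea of similarity-based weighting and nearest-profile model assignment concrete, the following is a minimal sketch and not Feroma's actual procedure. It assumes that profiles are fixed-length vectors, that similarity is a softmax over negative Euclidean distances, and that models are flat parameter vectors. The function names (`similarity_weights`, `aggregate`, `assign_nearest`) and the `temperature` parameter are hypothetical and do not come from the paper.

```python
import numpy as np


def similarity_weights(current, past, temperature=1.0):
    """Map current profiles to past profiles and return row-stochastic weights.

    Assumption (not from the paper): weights are a softmax over negative
    Euclidean distances between profile vectors.
    """
    current = np.asarray(current, dtype=float)
    past = np.asarray(past, dtype=float)
    # Pairwise distances, shape (n_current, n_past).
    dists = np.linalg.norm(current[:, None, :] - past[None, :, :], axis=-1)
    logits = -dists / temperature
    # Subtract the row maximum for numerical stability before exponentiating.
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def aggregate(weights, past_models):
    """Build one model per current client as a weighted average of past models.

    past_models has shape (n_past, n_params): one flat parameter vector per row.
    """
    return weights @ np.asarray(past_models, dtype=float)


def assign_nearest(test_profile, final_profiles):
    """Return the index of the final-round profile closest to an unseen client."""
    final_profiles = np.asarray(final_profiles, dtype=float)
    test_profile = np.asarray(test_profile, dtype=float)
    dists = np.linalg.norm(final_profiles - test_profile, axis=1)
    return int(np.argmin(dists))


if __name__ == "__main__":
    # Toy data: clients 0 and 1 have similar profiles, client 2 is far away.
    profiles = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
    models = np.array([[1.0], [1.0], [3.0]])

    w = similarity_weights(profiles, profiles, temperature=0.5)
    print(w.round(3))
    print(aggregate(w, models).round(3))
    print(assign_nearest([4.8, 5.1], profiles))
```

In this sketch, a small `temperature` concentrates each row of weights on the closest profiles, which resembles personalized or clustered aggregation, while a large `temperature` pushes the rows toward uniform weights, which resembles global averaging. How Feroma itself chooses among these regimes is not specified in the abstract.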
