Federated Learning with Profile Mapping under Distribution Shifts and Drifts

Mohan Li; Dario Fenoglio; Martin Gjoreski; Marc Langheinrich

TL;DR

Feroma tackles federated learning under real-world non-IID conditions with distribution-profile-based aggregation that handles both cross-client distribution shifts and intra-client drifts over time, without relying on client identities or pre-specified clusters. A Distribution–Profile Extractor builds compact, privacy-preserving distribution profiles; mapping each round's profiles to those of the previous round yields adaptive aggregation weights. This enables training-time adaptation and one-shot, test-time model assignment for unseen clients. In each round the framework automatically selects an aggregation strategy (clustered, personalized, or global), and at test time it assigns each client the final-round model with the nearest profile, all with communication and computation overhead comparable to FedAvg. Across six datasets, Feroma achieves average accuracy gains of up to 12 percentage points over the best state-of-the-art baselines, demonstrating robustness to both distribution shift and drift in practical FL deployments.
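The test-time step described above (nearest-profile matching to final-round models) can be illustrated with a minimal sketch. It assumes that profiles are plain numeric vectors and that "nearest" means smallest Euclidean distance; neither detail is specified in this summary, and the names `assign_model` and `final_profiles` are hypothetical.

```python
import math

def assign_model(test_profile, final_profiles):
    """Return the index of the final-round profile closest to test_profile.

    final_profiles holds one profile vector per final-round model
    (an assumed layout, chosen only for illustration).
    """
    # Euclidean distance is an assumption; the paper may use another metric.
    distances = [math.dist(test_profile, p) for p in final_profiles]
    return min(range(len(distances)), key=distances.__getitem__)

# Three hypothetical final-round profiles and one unseen, unlabeled client.
final_profiles = [[0.0, 1.0], [5.0, 1.0], [0.0, 4.0]]
chosen = assign_model([4.6, 1.2], final_profiles)
print(chosen)
```

Because the assignment needs only the client's profile, it requires no labels, retraining, or online adaptation, which matches the one-shot behavior described in the summary.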

Abstract

Federated Learning (FL) enables decentralized model training across clients without sharing raw data, but its performance degrades under real-world data heterogeneity. Existing methods often fail to address distribution shift across clients and distribution drift over time, or they rely on unrealistic assumptions, such as a known number of client clusters and known data heterogeneity types, which limit their generalizability. We introduce Feroma, a novel FL framework that explicitly handles both distribution shift and drift without relying on client or cluster identity. Feroma builds on client distribution profiles (compact, privacy-preserving representations of local data) that guide model aggregation and test-time model assignment through adaptive similarity-based weighting. This design allows Feroma to dynamically select aggregation strategies during training, ranging from clustered to personalized, and to deploy suitable models to unseen and unlabeled test clients without retraining, online adaptation, or prior knowledge of clients' data. Extensive experiments show that, compared to 10 state-of-the-art methods, Feroma improves performance and stability under dynamic data heterogeneity conditions, with an average accuracy gain of up to 12 percentage points over the best baselines across 6 benchmarks, while maintaining computational and communication overhead comparable to FedAvg. These results highlight that distribution-profile-based aggregation offers a practical path toward robust FL under both data distribution shifts and drifts.
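To make "adaptive similarity-based weighting" concrete, here is a minimal sketch under stated assumptions: profiles are numeric vectors, similarity is a softmax over negative Euclidean distances, and a hypothetical `temperature` parameter controls how sharply weights concentrate. This weighting rule is an illustrative stand-in, not the rule used by Feroma.

```python
import math

def similarity_weights(current, previous, temperature=1.0):
    """Softmax over negative distances to previous-round profiles.

    The softmax form and the temperature are assumptions for illustration.
    """
    dists = [math.dist(current, p) for p in previous]
    d_min = min(dists)  # shift by the minimum for numerical stability
    scores = [math.exp(-(d - d_min) / temperature) for d in dists]
    total = sum(scores)
    return [s / total for s in scores]

def weighted_average(models, weights):
    """Coordinate-wise weighted average of flat parameter vectors."""
    dim = len(models[0])
    return [sum(w * m[i] for w, m in zip(weights, models)) for i in range(dim)]

# Hypothetical previous-round profiles: two similar clients and one outlier.
previous = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
models = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]

w_sharp = similarity_weights([0.0, 0.0], previous, temperature=0.1)
w_flat = similarity_weights([0.0, 0.0], previous, temperature=1e6)
merged = weighted_average(models, w_sharp)
```

In this sketch a small temperature concentrates weight on clients with similar profiles (cluster-like or personalized aggregation), while a very large temperature approaches uniform weights (global, FedAvg-like aggregation). It shows how a single similarity rule can span the range of strategies the abstract describes.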

Paper Structure (73 sections, 1 theorem, 21 equations, 24 figures, 50 tables, 2 algorithms)

Introduction
Background
FL under IID assumption.
FL under distribution shifts.
FL under distribution drifts.
Feroma
Problem definition.
Distribution profile extraction
Distribution profile mapping
Training distribution mapping
Automatic aggregation strategy selection.
Testing distribution mapping
Experiments
Experiment settings
Results
...and 58 more sections

Key Result

Proposition 1

Define the profile distance $\Delta$ between two clients. If (A1)–(A3) hold, then there exist constants for which a two-sided bound relates $\Delta$ and $W_2$. Consequently, $W_2$ and $\Delta$ are bi-Lipschitz equivalent. We use bi-Lipschitz equivalence in the standard metric-geometry sense: two distances $d_1,d_2$ are bi-Lipschitz equivalent on a set $\mathcal{B}$ if there exist constants $c_-,c_+>0$ such that for all $a,b\in\mathcal{B}$, $c_-\,d_1(a,b)\le d_2(a,b)\le c_+\,d_1(a,b)$.
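A toy case may help build intuition for why a distance between profiles can track $W_2$. The sketch below assumes one-dimensional Gaussian marginals and a hypothetical profile $\phi(P)=(\mu,\sigma)$ made of the mean and standard deviation; this profile is chosen for illustration only and is not the paper's Distribution–Profile Extractor, nor does it use assumptions (A1)–(A3).

```latex
% Closed-form 2-Wasserstein distance between 1-D Gaussians
% P = N(\mu_1, \sigma_1^2) and Q = N(\mu_2, \sigma_2^2):
W_2^2(P, Q) = (\mu_1 - \mu_2)^2 + (\sigma_1 - \sigma_2)^2 .

% Hypothetical profile \phi(P) = (\mu, \sigma) with Euclidean profile distance:
\Delta(P, Q) = \lVert \phi(P) - \phi(Q) \rVert_2
             = \sqrt{(\mu_1 - \mu_2)^2 + (\sigma_1 - \sigma_2)^2}
             = W_2(P, Q).

% Hence the two-sided bound holds with c_- = c_+ = 1:
c_- \, W_2(P, Q) \le \Delta(P, Q) \le c_+ \, W_2(P, Q).
```

In this special case the equivalence is exact; a bi-Lipschitz statement is the weaker guarantee that the two distances agree up to constant factors, which is enough for nearest-profile comparisons to reflect closeness in $W_2$.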

Figures (24)

Figure 1: Comparison between FedAvg and Feroma under (Left) distribution shift across clients and (Right) distribution drift every 2 rounds.
Figure 2: Distribution shifts and drifts in FL. Colors indicate distinct local data distributions. Changes across clients reflect distribution shifts; changes over rounds reflect distribution drifts.
Figure 3: Feroma pipeline. In each round $t$, clients extract distribution profiles (DPE), map them to previous‐round profiles (DPM), and compute weighted aggregation (WA) for local training (LT).
Figure 4: Performance comparison across varying numbers of clients. Left: Mean accuracy and standard deviation. Right: Training time per 20 rounds.
Figure 5: Seen-once distribution in training stage.
...and 19 more figures

Theorems & Definitions (3)

Definition 3.1: Distribution–Profile Extractor
Proposition 1: Bi-Lipschitz equivalence to $W_2$ for marginals
proof
