Client-Conditional Federated Learning via Local Training Data Statistics

Rickard Brännvall

Abstract

Federated learning (FL) under data heterogeneity remains challenging: existing methods either ignore client differences (FedAvg), require costly cluster discovery (IFCA), or maintain per-client models (Ditto). All degrade when data is sparse or heterogeneity is multi-dimensional. We propose conditioning a single global model on locally computed PCA statistics of each client's training data, requiring zero additional communication. Evaluating across 97 configurations spanning four heterogeneity types (label shift, covariate shift, concept shift, and combined heterogeneity), four datasets (MNIST, Fashion-MNIST, CIFAR-10, CIFAR-100), and seven FL baseline methods, we find that our method matches the Oracle baseline, which knows true cluster assignments, across all settings, surpasses it by 1-6% on combined heterogeneity where continuous statistics are richer than discrete cluster identifiers, and is uniquely sparsity-robust among all tested methods.

Paper Structure (17 sections, 3 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: Client-conditional pipeline for client $i$. Green boxes are client-local; orange boxes involve the federation. Prepare: PCA eigenvalues $\mathbf{s}_i$ are computed once from the training data. Train: model updates are computed locally and aggregated via federated learning to produce the shared model $\boldsymbol{\theta}$. Infer: the client uses the shared model $\boldsymbol{\theta}$ and its own $\mathbf{s}_i$ for predictions.
  • Figure 2: Four FL paradigms under data heterogeneity. (a) FedAvg: one global model, no personalization. (b) Clustered: $K$ separate models with iterative cluster discovery. (c) Personalized: $N$ per-client models fine-tuned from a shared model. (d) Ours: a single shared model conditioned on locally-computed statistics $\mathbf{s}_i$, requiring no cluster discovery and no additional communication.
  • Figure 3: Conditional exceeds Oracle on complex heterogeneity. Left: E3b label permutation on CIFAR-10 ($K$ sweep). Right: E4b combined heterogeneity (concept + covariate shift). Continuous statistics provide richer conditioning than discrete cluster IDs.
  • Figure 4: Sparsity robustness on E1 Label Shift ($K{=}2$). Conditional and Oracle maintain flat accuracy as data decreases 20-fold (from ${\sim}6{,}000$ to ${\sim}200$ samples/client). All other methods degrade, with Gossip collapsing to near-random.
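
The Prepare and Infer steps described in the Figure 1 caption can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names `client_pca_statistics` and `condition_inputs`, the choice of 8 components, and conditioning by concatenating $\mathbf{s}_i$ to each flattened input are all assumptions made for illustration, since the summary above does not specify how the model consumes the statistics.

```python
import numpy as np


def client_pca_statistics(x_train, num_components=8):
    """Return the leading PCA eigenvalues of one client's training data.

    Hypothetical helper: the number of components (8) is an assumed
    value, not one taken from the paper.
    """
    x = np.asarray(x_train, dtype=np.float64).reshape(len(x_train), -1)
    x_centered = x - x.mean(axis=0, keepdims=True)
    # The eigenvalues of the sample covariance matrix equal the squared
    # singular values of the centered data divided by (n - 1).
    singular_values = np.linalg.svd(x_centered, compute_uv=False)
    eigenvalues = singular_values**2 / max(len(x) - 1, 1)
    # Zero-pad so every client produces a vector of the same length.
    stats = np.zeros(num_components)
    k = min(num_components, eigenvalues.size)
    stats[:k] = eigenvalues[:k]
    return stats


def condition_inputs(x_batch, stats):
    """Append the client statistics to every flattened input row.

    Concatenation is an assumed conditioning mechanism chosen for
    simplicity; other mechanisms are possible.
    """
    x = np.asarray(x_batch, dtype=np.float64).reshape(len(x_batch), -1)
    tiled = np.broadcast_to(stats, (len(x), stats.size))
    return np.concatenate([x, tiled], axis=1)


# Usage with synthetic data standing in for one client's training set.
rng = np.random.default_rng(0)
client_data = rng.normal(size=(50, 20))
s_i = client_pca_statistics(client_data)         # computed once, kept local
model_input = condition_inputs(client_data[:5], s_i)
```

Because `s_i` is computed from data the client already holds and is only ever fed to the client's own copy of the shared model, this sketch sends nothing extra to the server, which is consistent with the zero-additional-communication claim in the abstract.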