Fed-ADE: Adaptive Learning Rate for Federated Post-adaptation under Distribution Shift

Heewon Park; Mugon Joe; Miru Kim; Kyungjin Im; Minhae Kwon

TL;DR

Fed-ADE (Federated Adaptation with Distribution Shift Estimation) is an unsupervised federated adaptation framework that uses lightweight estimators of distribution dynamics to enable effective and robust federated post-adaptation under real-world non-stationarity.

Abstract

Federated learning (FL) in post-deployment settings must adapt to non-stationary data streams across heterogeneous clients without access to ground-truth labels. A major challenge is learning rate selection under client-specific, time-varying distribution shifts, where fixed learning rates often lead to underfitting or divergence. We propose Fed-ADE (Federated Adaptation with Distribution Shift Estimation), an unsupervised federated adaptation framework that leverages lightweight estimators of distribution dynamics. Specifically, Fed-ADE employs uncertainty dynamics estimation to capture changes in predictive uncertainty and representation dynamics estimation to detect covariate-level feature drift, combining them into a per-client, per-timestep adaptive learning rate. We provide theoretical analyses showing that our dynamics estimation approximates the underlying distribution shift and yields dynamic regret and convergence guarantees. Experiments on image and text benchmarks under diverse distribution shifts (label and covariate) demonstrate consistent improvements over strong baselines. These results highlight that distribution shift-aware adaptation enables effective and robust federated post-adaptation under real-world non-stationarity.
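To make the idea of a per-client, per-timestep adaptive learning rate concrete, here is a minimal sketch in Python. It is not the paper's exact formulation. The function names, the use of batch-mean softmax outputs and batch-mean features, the `max` combination rule, and the `lr_min`/`lr_max` defaults are all assumptions for illustration. The only elements taken from the text are the two signals (uncertainty dynamics and representation dynamics), the use of cosine similarity, and the existence of learning rate bounds.

```python
import numpy as np


def cosine_dissimilarity(a, b, eps=1e-12):
    """Return 1 - cos(a, b), clipped to [0, 1] (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cos = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))
    return float(np.clip(1.0 - cos, 0.0, 1.0))


def adaptive_learning_rate(prev_probs, curr_probs, prev_feats, curr_feats,
                           lr_min=1e-4, lr_max=1e-2):
    """Map two shift signals to a learning rate in [lr_min, lr_max].

    prev_probs / curr_probs: (batch, classes) softmax outputs at t-1 and t.
    prev_feats / curr_feats: (batch, dim) feature embeddings at t-1 and t.
    The combination rule below (max of the two signals) is an assumption.
    """
    # Uncertainty dynamics: change in the batch-mean predictive distribution.
    uncertainty_signal = cosine_dissimilarity(
        np.mean(prev_probs, axis=0), np.mean(curr_probs, axis=0))
    # Representation dynamics: change in the batch-mean feature vector.
    representation_signal = cosine_dissimilarity(
        np.mean(prev_feats, axis=0), np.mean(curr_feats, axis=0))
    shift = max(uncertainty_signal, representation_signal)
    # A larger estimated shift yields a larger step, within the bounds.
    return lr_min + (lr_max - lr_min) * shift
```

Under these assumptions, a client whose predictions and features are unchanged between timesteps stays near `lr_min`, while a client whose mean prediction or mean feature vector rotates sharply moves toward `lr_max`.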

Paper Structure (64 sections, 7 theorems, 67 equations, 5 figures, 10 tables, 1 algorithm)

Introduction
Related Works
Federated Learning for Decentralized Data Environments
Distribution Shift Adaptation
Preliminaries
Pre-training ($t=0$)
Online Distribution Shift ($0 < t \le T$)
Learning Objective with Unsupervised Risk Estimation
Personalized Federated Learning with Layer Decoupling
Distribution Shift Driven Federated Post-adaptation
Distribution Shift Adaptive Learning Rate
Distribution Shift Dynamics Signal
Uncertainty Dynamics Estimation
Representation Dynamics Estimation
Theoretical Analyses
...and 49 more sections

Key Result

Theorem 1

Assume $\epsilon$-calibration with respect to $\mathbf Q_{c,\mathbf y}^{t}$, i.e., $\|\mathcal{H}(\{\psi_c,\phi_c\};\mathbf x_c^{t})-\mathbf y_c^{t}\|_2\le\epsilon_t$ in expectation. Then the cumulative estimation error admits an upper bound (the bound itself is missing from this extract) that depends on $K_{\cos}$, the Lipschitz constant of the cosine similarity function.

Figures (5)

Figure 1: Overview of Fed-ADE. Each client receives a pre-trained model, splits it into shared ($\psi_c$) and personalized ($\phi_c$) layers, and adapts to unlabeled, distribution-shifting data using an adaptive learning rate.
Figure 2: Impact of learning rate bounds on Fed-ADE under label shift scenarios on Tiny-ImageNet and CIFAR-10 datasets.
Figure 3: Visualization of how the parameter $\omega(t)$ changes over time for each shift type. Lin. increases or decreases steadily and continuously, reflecting a gradual, predictable change. Squ. and Sin. are both periodic: Squ. alternates sharply between two values at regular intervals, while Sin. oscillates smoothly in a wave-like manner. Ber. introduces randomness at each timestep, giving a stochastic, less predictable trajectory for $\omega(t)$.
Figure 4: Visualization of client heterogeneity as a function of the Dirichlet distribution factor $\alpha$ on the CIFAR-10 dataset. The $x$-axis shows the client ID and the $y$-axis shows the data class. Circle color identifies the class, and circle size indicates the amount of data. The level of heterogeneity depends on the value of $\alpha$, not on the dataset.
Figure 5: Impact of learning rate bounds on Fed-ADE under label shift scenarios.
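
The four shift schedules described in the Figure 3 caption can be sketched as follows. This is an illustrative reconstruction from the caption only. The function name `omega_schedule`, the range `[low, high]`, the `period`, the switching probability `p`, and the choice to model Ber. as an independent two-valued draw per timestep are all assumptions, not the paper's definitions.

```python
import numpy as np


def omega_schedule(kind, T, period=20, low=0.0, high=1.0, p=0.5, seed=0):
    """Return omega(t) for t = 0..T-1 under a hypothetical shift schedule."""
    t = np.arange(T)
    if kind == "lin":
        # Steady, continuous change from low to high.
        return low + (high - low) * t / max(T - 1, 1)
    if kind == "squ":
        # Sharp alternation between two values every half period.
        half = max(period // 2, 1)
        return np.where((t // half) % 2 == 0, low, high)
    if kind == "sin":
        # Smooth, wave-like oscillation between low and high.
        return low + (high - low) * 0.5 * (1.0 + np.sin(2.0 * np.pi * t / period))
    if kind == "ber":
        # Independent random draw at each timestep (assumed two-valued).
        rng = np.random.default_rng(seed)
        return np.where(rng.random(T) < p, high, low)
    raise ValueError(f"unknown schedule kind: {kind}")
```

For example, `omega_schedule("squ", 40)` holds the low value for 10 steps, switches to the high value for 10 steps, and repeats, whereas `omega_schedule("ber", 40)` jumps unpredictably between the two values.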

Theorems & Definitions (15)

Theorem 1: Error Bound: Cumulative Uncertainty Dynamics
proof
Theorem 2: Error Bound: Cumulative Representation Dynamics
proof
Theorem 3: Dynamic Regret Bound
proof
Remark 4
Lemma 5: Unbiased Risk Estimator
proof
Remark 6
...and 5 more