Principled Federated Domain Adaptation: Gradient Projection and Auto-Weighting

Enyi Jiang; Yibo Jacky Zhang; Sanmi Koyejo

TL;DR

This work tackles Federated Domain Adaptation under two core challenges: domain shift between source and target domains and limited target data. It develops a principled framework based on a Delta error decomposition using $d_\pi(\mathcal{D}_S,\mathcal{D}_T)$ and $\sigma_\pi^2(\widehat{\mathcal{D}}_T)$, guiding the design of aggregation rules. The authors introduce FedGP, a gradient projection-based FDA method, and FedDA, a convex-gradient-merge approach, along with an auto-weighting scheme that computes optimal per-source weights $\beta_i$ from target updates. Empirical results on semi-synthetic and real-world domain shifts show that FedGP and the auto-weighted variants consistently outperform personalized FL, UFDA, and domain-generalization baselines, demonstrating the practical value of principled gradient-based FDA aggregation.

Abstract

Federated Domain Adaptation (FDA) describes the federated learning (FL) setting where source clients and a server work collaboratively to improve the performance of a target client where limited data is available. The domain shift between the source and target domains, coupled with the limited data of the target client, makes FDA a challenging problem: common techniques such as federated averaging and fine-tuning fail under domain shift and data scarcity. To theoretically understand the problem, we introduce new metrics that characterize the FDA setting and a theoretical framework with novel theorems for analyzing the performance of server aggregation rules. Further, we propose a novel lightweight aggregation rule, Federated Gradient Projection ($\texttt{FedGP}$), which significantly improves the target performance with domain shift and data scarcity. Moreover, our theory suggests an $\textit{auto-weighting scheme}$ that finds the optimal combinations of the source and target gradients. This scheme improves both $\texttt{FedGP}$ and a simpler heuristic aggregation rule. Extensive experiments verify the theoretical insights and illustrate the effectiveness of the proposed methods in practice.
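The "simpler heuristic aggregation rule" is FedDA, which the TL;DR describes as a convex merge of gradients. The following is a minimal sketch of that idea rather than the authors' implementation. It assumes gradients are flat lists of floats, that the server averages uniformly over sources, and that one weight $\beta_i$ is supplied per source. The names `fedda_aggregate` and `server_step` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def fedda_aggregate(
    g_target: Sequence[float],
    source_grads: Sequence[Sequence[float]],
    betas: Sequence[float],
) -> Vector:
    """Convexly merge the target gradient with each source gradient.

    Assumption: each source i contributes (1 - beta_i) * g_T + beta_i * g_Si,
    and the server takes a uniform average over sources.
    """
    if len(source_grads) != len(betas):
        raise ValueError("need exactly one beta per source gradient")
    n = len(source_grads)
    merged = [0.0] * len(g_target)
    for g_source, beta in zip(source_grads, betas):
        for k in range(len(merged)):
            merged[k] += ((1.0 - beta) * g_target[k] + beta * g_source[k]) / n
    return merged


def server_step(theta: Sequence[float], g_aggr: Sequence[float], mu: float) -> Vector:
    """One gradient step with step size mu on the aggregated gradient."""
    return [t - mu * g for t, g in zip(theta, g_aggr)]
```

Setting every $\beta_i = 0$ reduces this sketch to using the target gradient alone, and setting every $\beta_i = 1$ reduces it to a plain average of the source gradients, so the weights interpolate between trusting the scarce target data and trusting the shifted sources.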

Paper Structure (58 sections, 9 theorems, 85 equations, 11 figures, 20 tables, 1 algorithm)

Introduction
The Problem of Federated Domain Adaptation
Related Work
A Theoretical Framework for Analyzing Aggregation Rules for FDA
Methods: Gradient Projection and the Auto-weighting Scheme
The Aggregation Rules: FedDA and FedGP
The Auto-weighting FedGP and FedDA
Experiments
Semi-synthetic Dataset Experiments with Varying Shifts
Datasets, models, and methods.
Auto-weighted methods and FedGP keep a better trade-off between bias and variance.
Real Dataset Experiments with Real-world Shifts
Ablation Study and Discussion
Conclusion
Supplementary Theoretical Results
...and 43 more sections

Key Result

Theorem 3.3

For any probability measure $\pi$ over the parameter space and any aggregation rule Aggr$(\cdot)$ with step size $\mu>0$, given a dataset $\widehat{{\mathcal{D}}}_T$ sampled from the target domain, update the parameter for $T$ steps by $\theta^{t+1} := \theta^{t} - \mu \widehat{g}_{\texttt{Aggr}}(\theta^t).$ A where $C_\epsilon=\mathbb{E}_{\widehat{{\mathcal{D}}}_T} [ {1}/{\pi(B_\epsilon(\widehat{\theta}_{\t

Figures (11)

Figure 1: FedGP filters out the negative source gradients (colored in red) and convexly combines $g_T$ and its projections onto the directions of the remaining source gradients (colored in green).
Figure 2: The impact of changing domain shifts with noisy features or label shifts.
Figure 3: The effect of $\beta$ on FedDA and FedGP.
Figure 4: Ablation study on projection and filtering.
Figure 5: Given a specific source-target domain distance and target domain variance: (a) shows which aggregation method has the smallest Delta error; (b)&(c) present which aggregation method actually achieves the best test result. In (a)&(b), FedDA and FedGP use a fixed $\beta=0.5$. In (c), FedDA and FedGP adopt the auto-weighted scheme. Observations: Comparing (a) and (b), we can see that the predictions from the Delta errors, computed at initialization, mostly track the actual test performance after training. Comparing (b) and (c), we can see that FedDA is greatly improved with the auto-weighted scheme. Moreover, we can see that FedGP with a fixed $\beta=0.5$ is good enough for most of the cases. These observations demonstrate the practical utility of our theoretical framework.
...and 6 more figures
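The mechanism in the Figure 1 caption can be sketched in a few lines. This is an illustrative reading of the caption, not the paper's reference code. It assumes that a source gradient counts as "negative" when its inner product with $g_T$ is not positive, that a filtered source contributes a zero projection, and that the server averages uniformly over sources with a single shared $\beta$. The function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def positive_projection(g_target: Sequence[float], g_source: Sequence[float]) -> Vector:
    """Project g_target onto the direction of g_source.

    Returns the zero vector when the two gradients do not point in a
    common direction (inner product <= 0) or when g_source is zero.
    """
    inner = dot(g_target, g_source)
    norm_sq = dot(g_source, g_source)
    if norm_sq == 0.0 or inner <= 0.0:
        return [0.0] * len(g_target)
    scale = inner / norm_sq
    return [scale * s for s in g_source]


def fedgp_aggregate(
    g_target: Sequence[float],
    source_grads: Sequence[Sequence[float]],
    beta: float = 0.5,
) -> Vector:
    """Convexly combine g_target with its filtered projections.

    Assumption: uniform average over sources, one shared beta in [0, 1].
    """
    n = len(source_grads)
    merged = [0.0] * len(g_target)
    for g_source in source_grads:
        proj = positive_projection(g_target, g_source)
        for k in range(len(merged)):
            merged[k] += ((1.0 - beta) * g_target[k] + beta * proj[k]) / n
    return merged
```

With `g_target = [1.0, 0.0]` and sources `[2.0, 0.0]` and `[-1.0, 0.0]`, the second source is filtered out, so only the first one pulls the update along a source direction. Because the projection never exceeds the length of $g_T$, the source gradient's magnitude does not affect the result, only its direction does.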

Theorems & Definitions (21)

Definition 2.1: Aggregation for Federated Domain Adaptation (FDA)
Definition 3.2: Delta Error of an aggregation rule Aggr$(\cdot)$
Theorem 3.3: Convergence and Generalization
Definition 3.4: $L^\pi$ Source-Target Domain Distance
Definition 3.5: $L^\pi$ Target Domain Variance
Theorem 3.6: $\Delta^2_{Aggr}$ Decomposition Theorem
Definition 3.7: An Error-Analysis Definition of FDA Aggregation
Definition 4.1: FedDA
Theorem 4.2
Definition 4.3: FedGP
...and 11 more
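
To illustrate how the domain distance and the target variance can drive the choice of weight, here is a worked derivation for a single source under stated assumptions. It is not a restatement of Theorem 4.2. It assumes the Delta error is the expected squared $\pi$-norm gap between the true target gradient $g_T$ and the aggregated gradient, that $\|\cdot\|_\pi$ is induced by an inner product $\langle\cdot,\cdot\rangle_\pi$, that the sampled target gradient $\widehat{g}_T$ is an unbiased estimate of $g_T$, and that the source gradient $g_S$ is deterministic. The shorthands $d^2$ and $\sigma^2$ stand for $d_\pi^2(\mathcal{D}_S,\mathcal{D}_T)$ and $\sigma_\pi^2(\widehat{\mathcal{D}}_T)$.

```latex
\begin{align*}
\Delta^2(\beta)
  &= \mathbb{E}\,\bigl\| g_T - \bigl[(1-\beta)\,\widehat{g}_T + \beta\, g_S\bigr] \bigr\|_\pi^2 \\
  &= \mathbb{E}\,\bigl\| (1-\beta)\,(g_T - \widehat{g}_T) + \beta\,(g_T - g_S) \bigr\|_\pi^2 \\
  &= (1-\beta)^2\,\sigma^2 + \beta^2\, d^2
     + 2\beta(1-\beta)\,\bigl\langle \mathbb{E}[\,g_T - \widehat{g}_T\,],\; g_T - g_S \bigr\rangle_\pi \\
  &= (1-\beta)^2\,\sigma^2 + \beta^2\, d^2
     \qquad \text{(the cross term vanishes by unbiasedness)} \\[4pt]
\frac{\mathrm{d}\Delta^2}{\mathrm{d}\beta}
  &= -2(1-\beta)\,\sigma^2 + 2\beta\, d^2 = 0
  \;\;\Longrightarrow\;\;
  \beta^\star = \frac{\sigma^2}{d^2 + \sigma^2},
  \qquad
  \Delta^2(\beta^\star) = \frac{\sigma^2 d^2}{d^2 + \sigma^2}.
\end{align*}
```

For example, with $d^2 = 3$ and $\sigma^2 = 1$ this gives $\beta^\star = 0.25$ and $\Delta^2(\beta^\star) = 0.75$, which is smaller than both the target-only error ($\beta = 0$, $\Delta^2 = 1$) and the source-only error ($\beta = 1$, $\Delta^2 = 3$). A larger domain shift moves the weight toward the target gradient, and a noisier target estimate moves it toward the source.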
