Principled Federated Domain Adaptation: Gradient Projection and Auto-Weighting

Enyi Jiang, Yibo Jacky Zhang, Sanmi Koyejo

TL;DR

This work tackles Federated Domain Adaptation (FDA) under two core challenges: domain shift between the source and target domains, and limited target data. It develops a principled framework built on a Delta error decomposition in terms of the source-target domain distance $d_\pi(\mathcal{D}_S,\mathcal{D}_T)$ and the target domain variance $\sigma_\pi^2(\widehat{\mathcal{D}}_T)$, which guides the design of aggregation rules. The authors introduce FedGP, a gradient-projection-based FDA method, and FedDA, which merges source and target gradients by convex combination, along with an auto-weighting scheme that computes optimal per-source weights $\beta_i$ from target updates. Empirical results on semi-synthetic and real-world domain shifts show that FedGP and the auto-weighted variants consistently outperform personalized FL, UFDA, and domain-generalization baselines, demonstrating the practical value of principled gradient-based FDA aggregation.

Abstract

Federated Domain Adaptation (FDA) describes the federated learning (FL) setting where source clients and a server work collaboratively to improve the performance of a target client where limited data is available. The domain shift between the source and target domains, coupled with limited data of the target client, makes FDA a challenging problem, e.g., common techniques such as federated averaging and fine-tuning fail due to domain shift and data scarcity. To theoretically understand the problem, we introduce new metrics that characterize the FDA setting and a theoretical framework with novel theorems for analyzing the performance of server aggregation rules. Further, we propose a novel lightweight aggregation rule, Federated Gradient Projection ($\texttt{FedGP}$), which significantly improves the target performance with domain shift and data scarcity. Moreover, our theory suggests an $\textit{auto-weighting scheme}$ that finds the optimal combinations of the source and target gradients. This scheme improves both $\texttt{FedGP}$ and a simpler heuristic aggregation rule. Extensive experiments verify the theoretical insights and illustrate the effectiveness of the proposed methods in practice.

Paper Structure (58 sections, 9 theorems, 85 equations, 11 figures, 20 tables, 1 algorithm)

Key Result

Theorem 3.3

For any probability measure $\pi$ over the parameter space, and an aggregation rule Aggr$(\cdot)$ with step size $\mu>0$. Given target domain sampled dataset $\widehat{{\mathcal{D}}}_T$, update the parameter for $T$ steps by $\theta^{t+1} := \theta^{t} - \mu \widehat{g}_{\texttt{Aggr}}(\theta^t).$ A where $C_\epsilon=\mathbb{E}_{\widehat{{\mathcal{D}}}_T} [ {1}/{\pi(B_\epsilon(\widehat{\theta}_{\t
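
The update rule in the theorem statement, $\theta^{t+1} := \theta^{t} - \mu \widehat{g}_{\texttt{Aggr}}(\theta^t)$, can be illustrated with a minimal sketch. The function names `run_aggregated_updates` and `toy_aggr_grad` are hypothetical, and the toy aggregated gradient (the gradient of $\tfrac{1}{2}\|\theta\|^2$) is an assumption chosen only so the iterates are easy to check by hand; it is not an aggregation rule from the paper.

```python
import numpy as np

def run_aggregated_updates(theta0, aggr_grad, mu, num_steps):
    """Apply theta <- theta - mu * aggr_grad(theta) for num_steps steps.

    aggr_grad stands in for the aggregated gradient of some aggregation
    rule; any callable mapping parameters to a gradient array works here.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(num_steps):
        theta = theta - mu * aggr_grad(theta)
    return theta

def toy_aggr_grad(theta):
    # Hypothetical stand-in: gradient of 0.5 * ||theta||^2, which is theta itself.
    return theta

# With mu = 0.5 each step halves the parameters, so three steps scale them by 1/8.
theta_final = run_aggregated_updates([2.0, -2.0], toy_aggr_grad, mu=0.5, num_steps=3)
print(theta_final)
```

In an FDA setting, `aggr_grad` would be replaced by whatever rule the server uses to combine the target and source updates at the current parameters.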

Figures (11)

  • Figure 1: FedGP filters out the negative source gradients (colored in red) and convexly combines $g_T$ and its projections onto the directions of the remaining source gradients (green ones).
  • Figure 2: The impact of changing domain shifts with noisy features or label shifts.
  • Figure 3: The effect of $\beta$ on FedDA and FedGP.
  • Figure 4: Ablation study on projection and filtering.
  • Figure 5: Given a specific source-target domain distance and target domain variance: (a) shows which aggregation method has the smallest Delta error; (b)&(c) show which aggregation method actually achieves the best test result. In (a)&(b), FedDA and FedGP use a fixed $\beta=0.5$. In (c), FedDA and FedGP adopt the auto-weighted scheme. Observations: Comparing (a) and (b), we can see that the predictions from the Delta errors, computed at initialization, mostly track the actual test performance after training. Comparing (b) and (c), we can see that FedDA is greatly improved with the auto-weighted scheme. Moreover, we can see that FedGP with a fixed $\beta=0.5$ is good enough for most of the cases. These observations demonstrate the practical utility of our theoretical framework.
  • ...and 6 more figures
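
The aggregation described in the Figure 1 caption, together with the convex merge attributed to FedDA in the TL;DR, can be sketched roughly as follows. This is an illustrative reading and not the authors' implementation: the function names, the uniform averaging over source clients, the single shared weight `beta`, and the choice to let a filtered-out source contribute only its $(1-\beta)\,g_T$ share are all assumptions made for the example.

```python
import numpy as np

def project_positive(g_t, g_s, eps=1e-12):
    """Project g_t onto the direction of g_s; return zeros if they disagree.

    A source gradient with a non-positive inner product with g_t is
    treated as "negative" and filtered out (assumption for this sketch).
    """
    dot = float(np.dot(g_t, g_s))
    if dot <= 0.0:
        return np.zeros_like(g_t)
    return (dot / (float(np.dot(g_s, g_s)) + eps)) * g_s

def fedda_aggregate(g_t, source_grads, beta=0.5):
    """Convex merge of the target gradient with each raw source gradient."""
    merged = [(1.0 - beta) * g_t + beta * g_s for g_s in source_grads]
    return np.mean(merged, axis=0)

def fedgp_aggregate(g_t, source_grads, beta=0.5):
    """Convex merge of the target gradient with its filtered projections."""
    merged = [(1.0 - beta) * g_t + beta * project_positive(g_t, g_s)
              for g_s in source_grads]
    return np.mean(merged, axis=0)

# Toy example: one source roughly agrees with the target, one opposes it.
g_target = np.array([1.0, 0.0])
g_sources = [np.array([1.0, 1.0]), np.array([-1.0, 0.0])]

g_fedda = fedda_aggregate(g_target, g_sources, beta=0.5)
g_fedgp = fedgp_aggregate(g_target, g_sources, beta=0.5)
print("FedDA-style:", g_fedda)
print("FedGP-style:", g_fedgp)
```

In this toy case the opposing source cancels half of the target gradient under the plain convex merge, while the projection-and-filter variant ignores it. The auto-weighting scheme mentioned in the abstract would replace the fixed `beta` with per-source weights $\beta_i$; how those are computed is not specified in this summary, so it is left out of the sketch.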

Theorems & Definitions (21)

  • Definition 2.1: Aggregation for Federated Domain Adaptation (FDA)
  • Definition 3.2: Delta Error of an aggregation rule Aggr$(\cdot)$
  • Theorem 3.3: Convergence and Generalization
  • Definition 3.4: $L^\pi$ Source-Target Domain Distance
  • Definition 3.5: $L^\pi$ Target Domain Variance
  • Theorem 3.6: $\Delta^2_{Aggr}$ Decomposition Theorem
  • Definition 3.7: An Error-Analysis Definition of FDA Aggregation
  • Definition 4.1: FedDA
  • Theorem 4.2
  • Definition 4.3: FedGP
  • ...and 11 more