Robust and Communication-Efficient Federated Domain Adaptation via Random Features

Zhanbo Feng, Yuanjie Wang, Jie Li, Fan Yang, Jiong Lou, Tiebin Mi, Robert C. Qiu, Zhenyu Liao

TL;DR

This work tackles the challenge of domain shift in federated learning by proposing RF-TCA, a random features-based acceleration of Transfer Component Analysis, and its federated extension FedRF-TCA for multi-source FDA. RF-TCA achieves near parity with vanilla TCA in transfer performance while dramatically reducing computation, and FedRF-TCA adds asynchronous, communication-efficient training with privacy benefits. The authors prove a performance guarantee for RF-TCA under a spectral-approximation regime and eigen-gap conditions, and demonstrate robust, sample-size-independent communication and resilience to client dropouts in extensive experiments. The approach offers scalable, privacy-conscious FDA for edge devices and heterogeneous data, with practical implications for real-world distributed adaptation tasks.

Abstract

Modern machine learning (ML) models have grown to a scale where training them on a single machine becomes impractical. As a result, there is a growing trend to leverage federated learning (FL) techniques to train large ML models in a distributed and collaborative manner. These models, however, when deployed on new devices, might struggle to generalize well due to domain shifts. In this context, federated domain adaptation (FDA) emerges as a powerful approach to address this challenge. Most existing FDA approaches focus on aligning the distributions between source and target domains by minimizing a distance between them (e.g., MMD). Such strategies, however, inevitably introduce high communication overheads and can be highly sensitive to network reliability. In this paper, we introduce RF-TCA, an enhancement to the standard Transfer Component Analysis approach that significantly accelerates computation without compromising theoretical and empirical performance. Leveraging the computational advantage of RF-TCA, we further extend it to the FDA setting with FedRF-TCA. The proposed FedRF-TCA protocol has communication complexity that is independent of the sample size, while maintaining performance that is comparable to, or even surpasses, that of state-of-the-art FDA methods. We present extensive experiments to showcase the superior performance and robustness (to network conditions) of FedRF-TCA.

Paper Structure

This paper contains 42 sections, 8 theorems, 37 equations, 7 figures, 18 tables, 5 algorithms.

Key Result

Lemma 1

The matrix $\gamma {\mathbf{I}}_n + {\mathbf{K}} {\boldsymbol{\ell}} {\boldsymbol{\ell}}^ {\sf T} {\mathbf{K}}$ is invertible if and only if $\gamma + {\boldsymbol{\ell}}^ {\sf T} {\mathbf{K}}^2{\boldsymbol{\ell}} \neq 0$, and one has $(\gamma {\mathbf{I}}_n + {\mathbf{K}} {\boldsymbol{\ell}} {\bol
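
The statement above is cut off in this extract. As a point of reference only, and not as the paper's exact wording, the standard Sherman–Morrison identity $(\mathbf{A} + \mathbf{u}\mathbf{v}^{\sf T})^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{u}\mathbf{v}^{\sf T}\mathbf{A}^{-1} / (1 + \mathbf{v}^{\sf T}\mathbf{A}^{-1}\mathbf{u})$ can be applied with $\mathbf{A} = \gamma \mathbf{I}_n$ and $\mathbf{u} = \mathbf{v} = \mathbf{K}\boldsymbol{\ell}$. This assumes $\gamma \neq 0$ and a symmetric $\mathbf{K}$, neither of which is stated in the visible part of the lemma. Under those assumptions the inverse would read:

```latex
% Sketch under the assumptions \gamma \neq 0 and \mathbf{K} = \mathbf{K}^{\sf T}.
\left( \gamma \mathbf{I}_n + \mathbf{K} \boldsymbol{\ell} \boldsymbol{\ell}^{\sf T} \mathbf{K} \right)^{-1}
  = \frac{1}{\gamma} \mathbf{I}_n
  - \frac{\mathbf{K} \boldsymbol{\ell} \boldsymbol{\ell}^{\sf T} \mathbf{K}}
         {\gamma \left( \gamma + \boldsymbol{\ell}^{\sf T} \mathbf{K}^2 \boldsymbol{\ell} \right)}
```

The second term is well defined exactly when $\gamma + \boldsymbol{\ell}^{\sf T} \mathbf{K}^2 \boldsymbol{\ell} \neq 0$, which is consistent with the invertibility condition in the lemma.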

Figures (7)

  • Figure 1: Illustration of the proposed FedRF-TCA protocol, composed of (i) a Feature Extractor with both fixed and learnable weights, denoted $\mathbf{G}_S$ and $\mathbf{G}_T$, respectively, obtained by fine-tuning a pretrained model (such as ResNet-50); (ii) an RF-TCA Transfer Module using the random features technique (see Definition 2 below) and a linear adaptive layer ($\mathbf{W}_{\mathrm{RF}_S}^{(i)}$ or $\mathbf{W}_{\mathrm{RF}_T}$), with compressed features of the form $\boldsymbol{\Sigma}_i \boldsymbol{\ell}_i$ exchanged among clients during training; and (iii) a Classifier. Solid arrows denote local training and dashed arrows denote global parameter aggregation between clients. See Section \ref{sec:UFDA} for a detailed discussion.
  • Figure 2: Top: Two-step "transformation" in TCA, from raw data space $\mathbb{R}^p$ to (possibly infinite-dimensional) RKHS $\mathcal{H}$, and then to the low-dimensional $\mathbb{R}^m$. Bottom: Two-step "transformation" in the proposed RF-TCA, from $\mathbb{R}^p$ to random features kernel space $\hat{\mathcal{H}} \subset \mathbb{R}^{2N}$, and then to the low-dimensional $\mathbb{R}^m$.
  • Figure 3: Classification accuracy and running time of the proposed RF-TCA versus baseline DA methods on DeCAF6 features of the Office-Caltech and Office-31 datasets. Blue circles show RF-TCA with different numbers of random features $N \in \{ 100, 500, 1\,000, 2\,000, 5\,000 \}$; red, purple, green, brown, and orange show TCA (Pan et al., 2010), JDA (Long et al., 2013), CORAL (Sun et al., 2016), GFK (Gong et al., 2012), and DaNN (Ghifary et al., 2014), respectively. The results are averaged over all source-target domain pairs ($12$ for Office-Caltech and $6$ for Office-31); see Appendix \ref{app_sec:exp} for a detailed exposition of these results.
  • Figure 4: Classification accuracy (mean $\pm$ standard deviation) of FedRF-TCA and FedAvg with different communication intervals $T_{C} \in \{ 10, 20, 50, 100, 200, 400, 800 \}$, with a total of $1\,600$ rounds of communication, as in (I) of \ref{DigitFive_Ablation}.
  • Figure 5: Performance of FedRF-TCA with and without $\boldsymbol{\Sigma} \boldsymbol{\ell}$, as well as of FedAvg, under explicit and implicit data heterogeneity. For explicit data heterogeneity, we use the same setting as in Table \ref{tab:ablation_DigitFive}; for implicit data heterogeneity, we evenly divide the MNIST-M (or Synthetic Digits) subset of the Digit-Five dataset into five subsets, so that each subset contains data from a similar local data distribution.
  • ...and 2 more figures
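
The caption of Figure 2 describes RF-TCA as first mapping data from $\mathbb{R}^p$ into a random features space of dimension $2N$. A minimal sketch of such a map for a Gaussian kernel, using the standard random Fourier features construction, might look as follows. The function names, the bandwidth parameter `sigma`, and the kernel normalization $\exp(-\|x - y\|^2 / (2\sigma^2))$ are assumptions made for illustration; the paper's own definition (Definition 2) may use a different scaling.

```python
import math
import random


def draw_frequencies(p, n_features, sigma=1.0, seed=0):
    """Draw N frequency vectors w_i ~ N(0, I_p / sigma^2).

    Hypothetical helper: name, seed handling, and sigma are illustrative choices.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0 / sigma) for _ in range(p)]
            for _ in range(n_features)]


def rff_map(x, freqs):
    """Map x in R^p to a 2N-dimensional random Fourier feature vector.

    The first N entries are cosines, the last N are sines, each scaled
    by 1/sqrt(N) so that inner products approximate the Gaussian kernel.
    """
    n = len(freqs)
    proj = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in freqs]
    scale = 1.0 / math.sqrt(n)
    return ([scale * math.cos(t) for t in proj]
            + [scale * math.sin(t) for t in proj])


def gaussian_kernel(x, y, sigma=1.0):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)) (assumed normalization)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def dot(u, v):
    """Plain inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    x = [0.3, -0.5, 1.0]
    y = [0.1, 0.2, 0.7]
    freqs = draw_frequencies(p=3, n_features=2000)
    approx = dot(rff_map(x, freqs), rff_map(y, freqs))
    print("random features estimate:", approx)
    print("exact kernel value:      ", gaussian_kernel(x, y))
```

The inner product of two feature vectors is a Monte Carlo estimate of the kernel value, so its error shrinks as the number of random features $N$ grows, while the feature dimension $2N$ stays independent of the number of samples.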

Theorems & Definitions (17)

  • Lemma 1: Equivalent form of vanilla TCA
  • Definition 1: Unsupervised Federated Domain Adaptation, UFDA
  • Definition 2: Random Fourier features for Gaussian kernels (Rahimi and Recht, 2008)
  • Theorem 1: Performance guarantee for RF-TCA
  • Remark 1: Eigen-gap condition
  • Proof: Proof of Theorem 1
  • Lemma 2: TCA versus R-TCA
  • Theorem 2: RFFs approximation of Gaussian kernels (Tropp, 2015)
  • Remark 2: Privacy protection via random features
  • Lemma 3: Sherman–Morrison
  • ...and 7 more