Robust and Communication-Efficient Federated Domain Adaptation via Random Features

Zhanbo Feng; Yuanjie Wang; Jie Li; Fan Yang; Jiong Lou; Tiebin Mi; Robert C. Qiu; Zhenyu Liao

TL;DR

This work tackles domain shift in federated learning by proposing RF-TCA, a random-features acceleration of Transfer Component Analysis (TCA), and its federated extension FedRF-TCA for multi-source federated domain adaptation (FDA). RF-TCA achieves near parity with vanilla TCA in transfer performance while dramatically reducing computation, and FedRF-TCA adds asynchronous, communication-efficient training with privacy benefits. The authors prove a performance guarantee for RF-TCA under a spectral-approximation regime and eigen-gap conditions, and their extensive experiments show communication cost that is independent of the sample size, together with resilience to client dropouts. The approach offers scalable, privacy-conscious FDA for edge devices and heterogeneous data, with practical implications for real-world distributed adaptation tasks.

Abstract

Modern machine learning (ML) models have grown to a scale where training them on a single machine becomes impractical. As a result, there is a growing trend to leverage federated learning (FL) techniques to train large ML models in a distributed and collaborative manner. These models, however, when deployed on new devices, might struggle to generalize well due to domain shifts. In this context, federated domain adaptation (FDA) emerges as a powerful approach to address this challenge. Most existing FDA approaches focus on aligning the distributions of the source and target domains by minimizing a distance between them (e.g., the MMD distance). Such strategies, however, inevitably introduce high communication overheads and can be highly sensitive to network reliability. In this paper, we introduce RF-TCA, an enhancement to the standard Transfer Component Analysis approach that significantly accelerates computation without compromising theoretical and empirical performance. Leveraging the computational advantage of RF-TCA, we further extend it to the FDA setting with FedRF-TCA. The proposed FedRF-TCA protocol has a communication complexity that is independent of the sample size, while its performance is comparable to, or even surpasses, that of state-of-the-art FDA methods. We present extensive experiments to showcase the superior performance of FedRF-TCA and its robustness to network conditions.
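To make the sample-size-independent communication claim concrete, the following is a minimal sketch and not the authors' protocol. It assumes that each client summarizes its local data by the mean of its random Fourier features (a vector of length $2N$), that all clients share the same random frequencies, and that the squared distance between two such summaries serves as an MMD estimate. The helper names (`rff_map`, `client_message`, `mmd_squared`, `gaussian_cloud`) and all parameter values are hypothetical.

```python
import math
import random


def rff_map(x, omegas):
    """Map x in R^p to a 2N-dimensional random Fourier feature vector."""
    proj = [sum(w_k * x_k for w_k, x_k in zip(w, x)) for w in omegas]
    scale = 1.0 / math.sqrt(len(omegas))
    return [scale * math.cos(t) for t in proj] + [scale * math.sin(t) for t in proj]


def client_message(samples, omegas):
    """Average the random features of all local samples (length 2N)."""
    feats = [rff_map(x, omegas) for x in samples]
    return [sum(col) / len(feats) for col in zip(*feats)]


def mmd_squared(msg_a, msg_b):
    """Squared distance between two mean-feature messages (an MMD^2 estimate)."""
    return sum((a - b) ** 2 for a, b in zip(msg_a, msg_b))


def gaussian_cloud(n, dim, shift, rng):
    """Toy data: n points in R^dim drawn from N(shift, 1) per coordinate."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
p, N, sigma = 3, 200, 1.0  # assumed toy values
# Assumption: every client draws the same frequencies (e.g. from a shared seed).
omegas = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(p)] for _ in range(N)]

source = gaussian_cloud(300, p, 0.0, rng)        # 300 local samples
target_same = gaussian_cloud(50, p, 0.0, rng)    # 50 samples, same distribution
target_shifted = gaussian_cloud(50, p, 2.0, rng)  # 50 samples, shifted domain

msg_source = client_message(source, omegas)
msg_same = client_message(target_same, omegas)
msg_shifted = client_message(target_shifted, omegas)

mmd_same = mmd_squared(msg_source, msg_same)
mmd_shifted = mmd_squared(msg_source, msg_shifted)
print(len(msg_source), len(msg_same), mmd_same, mmd_shifted)
```

In this sketch the message length is $2N$ whether a client holds 50 or 300 samples, and the shifted domain yields the larger discrepancy.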

Paper Structure

This paper contains 42 sections, 8 theorems, 37 equations, 7 figures, 18 tables, 5 algorithms.

Introduction
Our Approach and Contribution
Notations and Organization of the Paper
Previous Efforts and Preliminaries
Review of Previous Efforts
Random features and kernel method
Transfer learning and domain adaptation
Federated learning and federated domain adaptation
Maximum Mean Discrepancy Principle and Transfer Component Analysis
Federated Averaging
Federated Domain Adaptation
A Random Features Approach to TCA
RF-TCA: A Random Features Approach to Efficient TCA
Performance Guarantee of RF-TCA
Robust and Communication-efficient Federated Domain Adaptation via RF-TCA
...and 27 more sections

Key Result

Lemma 1

The matrix $\gamma {\mathbf{I}}_n + {\mathbf{K}} {\boldsymbol{\ell}} {\boldsymbol{\ell}}^ {\sf T} {\mathbf{K}}$ is invertible if and only if $\gamma + {\boldsymbol{\ell}}^ {\sf T} {\mathbf{K}}^2{\boldsymbol{\ell}} \neq 0$, and one has $(\gamma {\mathbf{I}}_n + {\mathbf{K}} {\boldsymbol{\ell}} {\bol
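The inverse formula in the statement above is cut off in this excerpt. As a hedged illustration, the sketch below numerically checks the standard Sherman–Morrison form for a rank-one update, $(\gamma \mathbf{I}_n + \mathbf{u}\mathbf{u}^{\sf T})^{-1} = \frac{1}{\gamma}\mathbf{I}_n - \frac{\mathbf{u}\mathbf{u}^{\sf T}}{\gamma(\gamma + \mathbf{u}^{\sf T}\mathbf{u})}$ with $\mathbf{u} = \mathbf{K}\boldsymbol{\ell}$. That this is what the truncated expression contains is an assumption. The sketch also assumes a symmetric $\mathbf{K}$ and $\gamma \neq 0$; the function names and test values are hypothetical.

```python
import random


def matvec(A, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def matmul(A, B):
    """Matrix-matrix product for list-of-rows matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def regularized_matrix(K, ell, gamma):
    """Build gamma * I + K ell ell^T K via u = K ell (K assumed symmetric)."""
    u = matvec(K, ell)
    n = len(u)
    return [[(gamma if i == j else 0.0) + u[i] * u[j] for j in range(n)]
            for i in range(n)]


def sherman_morrison_inverse(K, ell, gamma):
    """Candidate inverse: I / gamma - u u^T / (gamma * (gamma + u^T u))."""
    u = matvec(K, ell)
    n = len(u)
    # For symmetric K, u^T u equals ell^T K^2 ell.
    denom = gamma * (gamma + sum(t * t for t in u))
    return [[(1.0 / gamma if i == j else 0.0) - u[i] * u[j] / denom
             for j in range(n)] for i in range(n)]


rng = random.Random(1)
n = 4
B = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
K = matmul(B, list(zip(*B)))   # K = B B^T is symmetric positive semi-definite
ell = [0.5, 0.5, -0.5, -0.5]   # toy weights: +1/n_S for source, -1/n_T for target
gamma = 0.5

product = matmul(regularized_matrix(K, ell, gamma),
                 sherman_morrison_inverse(K, ell, gamma))
max_err = max(abs(product[i][j] - (1.0 if i == j else 0.0))
              for i in range(n) for j in range(n))
print(max_err)
```

The product of the matrix and its candidate inverse should match the identity up to floating-point error, so `max_err` is expected to be tiny.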

Figures (7)

Figure 1: Illustration of the proposed FedRF-TCA protocol, composed of (i) a Feature Extractor with both fixed and learnable weights, denoted ${\mathbf{G}}_S$ and ${\mathbf{G}}_T$, respectively, obtained by fine-tuning a pretrained model (like ResNet-50); (ii) an RF-TCA Transfer Module using the random features technique (see \ref{def:RFF} below) and a linear adaptive layer (${\mathbf{W}}_{{\rm RF}_S}^{(i)}$ or ${\mathbf{W}}_{{\rm RF}_T}$), with compressed features of the form $\boldsymbol{\Sigma} _i {\boldsymbol{\ell}}_i$ exchanged among clients during training; and (iii) a Classifier. Solid arrows denote local training and dashed arrows denote global parameter aggregation between clients. See \ref{sec:UFDA} for a detailed discussion.
Figure 2: Top: Two-step "transformation" in TCA, from raw data space $\mathbb{R}^p$ to (possibly infinite-dimensional) RKHS $\mathcal{H}$, and then to the low-dimensional $\mathbb{R}^m$. Bottom: Two-step "transformation" in the proposed RF-TCA, from $\mathbb{R}^p$ to random features kernel space $\hat{\mathcal{H}} \subset \mathbb{R}^{2N}$, and then to the low-dimensional $\mathbb{R}^m$.
Figure 3: Classification accuracy and running time of the proposed RF-TCA versus baseline DA methods on DeCAF6 features of the Office-Caltech and Office-31 datasets. Blue circles denote the RF-TCA approach with different numbers of random features $N \in \{ 100, 500, 1\,000, 2\,000, 5\,000 \}$; red, purple, green, brown, and orange denote the TCA (pan2010domain), JDA (long2013transfer), CORAL (sun2016return), GFK (gong2012geodesic), and DaNN (ghifary2014domain) approaches, respectively. The results are obtained by averaging over all source-target domain pairs ($12$ for Office-Caltech and $6$ for Office-31); see \ref{app_sec:exp} in the appendix for a detailed exposition of these results.
Figure 4: Classification accuracy (mean $\pm$ standard deviation) of FedRF-TCA and FedAvg with different communication intervals $T_{C} \in \{ 10, 20, 50, 100, 200, 400, 800 \}$, with a total of $1\,600$ rounds of communication, as in (I) of \ref{DigitFive_Ablation}.
Figure 5: Performance of FedRF-TCA with and without $\boldsymbol{\Sigma} {\boldsymbol{\ell}}$, as well as of FedAvg, in the case of explicit and implicit data heterogeneity. For explicit data heterogeneity, we use the same setting as in \ref{tab:ablation_DigitFive}; for implicit data heterogeneity, we evenly divide the MNIST-M (or Synthetic Digits) subset of the Digit-Five dataset into five subsets, so that each subset follows a similar local data distribution.
...and 2 more figures

Theorems & Definitions (17)

Lemma 1: Equivalent form of vanilla TCA
Definition 1: Unsupervised Federated Domain Adaptation, UFDA
Definition 2: Random Fourier features for Gaussian kernels, rahimi2008random
Theorem 1: Performance guarantee for RF-TCA
Remark 1: Eigen-gap condition
Proof: Proof of \ref{theo:perf_RF_TCA}
Lemma 2: TCA versus R-TCA
Theorem 2: RFFs approximation of Gaussian kernels, tropp2015matrix
Remark 2: Privacy protection via random features
Lemma 3: Sherman–Morrison
...and 7 more
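As an illustration of the random Fourier features named in Definition 2 and Theorem 2 above, the sketch below compares the inner product of two $2N$-dimensional feature vectors with the exact Gaussian kernel value. It follows the common construction with frequencies drawn from $\mathcal{N}(\mathbf{0}, \sigma^{-2}\mathbf{I}_p)$ and a $1/\sqrt{N}$ normalization; the paper's exact scaling convention is not shown in this excerpt, so treat these choices, the function names, and the parameter values as assumptions.

```python
import math
import random


def sample_frequencies(n_feat, dim, sigma, rng):
    """Draw N frequency vectors from N(0, sigma^-2 I_p)."""
    return [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
            for _ in range(n_feat)]


def rff_map(x, omegas):
    """Map x in R^p to R^{2N}: stacked cos and sin features, scaled by 1/sqrt(N)."""
    proj = [sum(w_k * x_k for w_k, x_k in zip(w, x)) for w in omegas]
    scale = 1.0 / math.sqrt(len(omegas))
    return [scale * math.cos(t) for t in proj] + [scale * math.sin(t) for t in proj]


def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


rng = random.Random(0)
p, N, sigma = 5, 2000, 1.5  # assumed toy values
omegas = sample_frequencies(N, p, sigma, rng)

x = [rng.gauss(0.0, 1.0) for _ in range(p)]
y = [rng.gauss(0.0, 1.0) for _ in range(p)]

phi_x, phi_y = rff_map(x, omegas), rff_map(y, omegas)
approx = dot(phi_x, phi_y)           # random-features estimate of k(x, y)
exact = gaussian_kernel(x, y, sigma)
self_sim = dot(phi_x, phi_x)         # cos^2 + sin^2 averages to 1, matching k(x, x)
print(len(phi_x), approx, exact, self_sim)
```

The approximation error shrinks as $N$ grows, which is the trade-off between accuracy and feature dimension that the random-features approach relies on.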
