FedGTST: Boosting Global Transferability of Federated Models via Statistics Tuning

Evelyn Ma, Chao Pan, Rasoul Etesami, Han Zhao, Olgica Milenkovic

TL;DR

This work introduces a client-server exchange protocol that leverages cross-client Jacobian (gradient) norms to boost transferability and demonstrates that increasing the average Jacobian and reducing its variance allows for tighter control of the target loss.

Abstract

The performance of Transfer Learning (TL) heavily relies on effective pretraining, which demands large datasets and substantial computational resources. As a result, executing TL is often challenging for individual model developers. Federated Learning (FL) addresses these issues by facilitating collaborations among clients, expanding the dataset indirectly, distributing computational costs, and preserving privacy. However, key challenges remain unresolved. First, existing FL methods tend to optimize transferability only within local domains, neglecting the global learning domain. Second, most approaches rely on indirect transferability metrics, which do not accurately reflect the final target loss or true degree of transferability. To address these gaps, we propose two enhancements to FL. First, we introduce a client-server exchange protocol that leverages cross-client Jacobian (gradient) norms to boost transferability. Second, we increase the average Jacobian norm across clients at the server, using this as a local regularizer to reduce cross-client Jacobian variance. Our transferable federated algorithm, termed FedGTST (Federated Global Transferability via Statistics Tuning), demonstrates that increasing the average Jacobian and reducing its variance allows for tighter control of the target loss. This leads to an upper bound on the target loss in terms of the source loss and source-target domain discrepancy. Extensive experiments on datasets such as MNIST to MNIST-M and CIFAR10 to SVHN show that FedGTST outperforms relevant baselines, including FedSR. On the second dataset pair, FedGTST improves accuracy by 9.8% over FedSR and 7.6% over FedIIR when LeNet is used as the backbone.
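As a rough illustration of the two cross-client statistics named in the abstract (the average gradient norm and its variance across clients), the following sketch computes both on synthetic data. The linear least-squares model, the function names `local_grad_norm` and `cross_client_stats`, and the random client data are all assumptions made for illustration; the abstract does not specify the exact exchange protocol or regularizer, so this is not the FedGTST algorithm itself.

```python
import numpy as np

def local_grad_norm(w, X, y):
    """L2 norm of the gradient of 0.5 * mean((X @ w - y)**2) with respect to w.

    Hypothetical stand-in for a client's Jacobian (gradient) norm.
    """
    residual = X @ w - y
    grad = X.T @ residual / len(y)
    return float(np.linalg.norm(grad))

def cross_client_stats(norms):
    """Server side: mean and population variance of the client-reported norms."""
    norms = np.asarray(norms, dtype=float)
    return float(norms.mean()), float(norms.var())

rng = np.random.default_rng(0)
w = np.zeros(3)  # shared global model, as if broadcast by the server

# Synthetic data for 4 hypothetical clients (assumption, not from the paper).
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]

# Each client reports only a scalar norm; the server aggregates the scalars.
norms = [local_grad_norm(w, X, y) for X, y in clients]
mean_norm, var_norm = cross_client_stats(norms)
print(f"mean gradient norm: {mean_norm:.4f}, variance: {var_norm:.6f}")
```

In this toy setting only one scalar per client is sent to the server, which matches the abstract's description of an exchange based on norms rather than on raw data.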

Paper Structure

This paper contains 23 sections, 9 theorems, 34 equations, 3 figures, 10 tables, 1 algorithm.

Key Result

Theorem 1

Under the Convexity and Smoothness assumptions, the optimal target loss admits an upper bound based on the TFL-specific domain discrepancy (the displayed inequality is not reproduced here); in it, $h^{*(k)}$ denotes the optimal local model of client $k$ (see the preliminaries section).

Figures (3)

  • Figure 1: Visualization of Convergence Results. We use CIFAR10 $\to$ SVHN with $K=100$ as an example. The top two plots correspond to $10\%$ of clients participating, while the bottom two plots correspond to $100\%$ participation. For both settings, we report training and test accuracy against the number of fine-tuning epochs. The grey dashed lines represent FedAVG, where the coefficient of the regularizer term is set to $0$. The other lines represent FedGTST with tuned coefficients.
  • Figure 2: Cross-client statistics tuning via FedGTST. We use CIFAR10 $\to$ SVHN with $K=100$ as an example. The left plot reports the global Jacobian (gradient) norm versus the index of the federated round. The grey dashed line represents FedAVG, while the other lines correspond to FedGTST with different coefficients. We truncate the plot to the first $100$ rounds: at the end of training the gradient norm should drop to a value close to $0$ due to convergence, and we are only interested in the behaviour of Jacobian norms during the relatively early pretraining stages. We select the best-performing setup from the left plot (the red line with coefficient $10^{-3}$) and, in the right plot, compare its variance during a federated round with that of FedAVG. The blue line represents FedAVG and the yellow line corresponds to FedGTST. The yellow line terminates earlier because all experiments are averaged over $3$ runs and aligned with the run that converges the fastest.
  • Figure 3: An example of constructing a non-iid marginal distribution for the CIFAR10 dataset allocated to $10$ clients. Each client has access to only two labels. We also ensure that no sample is used by more than one client.
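The allocation described in the Figure 3 caption (10 clients, two labels per client, no sample shared between clients) can be sketched as follows. This is one possible construction under stated assumptions, not necessarily the one used in the paper: the function name `two_label_partition`, the ring-style pairing of labels, and the synthetic label array standing in for CIFAR10 are all hypothetical, and the scheme assumes the number of classes equals the number of clients.

```python
import numpy as np

def two_label_partition(labels, num_clients=10, seed=0):
    """Split sample indices so that each client sees exactly two labels.

    Assumes labels take values 0..num_clients-1 (one class per client slot).
    Class c is split into two disjoint halves: one goes to client c and the
    other to client (c - 1) mod num_clients, so client k ends up holding
    samples of classes k and (k + 1) mod num_clients.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in range(num_clients):
        idx = rng.permutation(np.flatnonzero(labels == c))
        first, second = np.array_split(idx, 2)  # disjoint halves of class c
        client_indices[c].extend(first.tolist())
        client_indices[(c - 1) % num_clients].extend(second.tolist())
    return [np.array(ix, dtype=int) for ix in client_indices]

# Synthetic stand-in for CIFAR10 labels: 10 classes, 20 samples each.
labels = np.repeat(np.arange(10), 20)
parts = two_label_partition(labels)

for k, p in enumerate(parts):
    print(k, sorted(set(labels[p].tolist())), len(p))
```

Because every class is split into disjoint halves and each half is assigned to exactly one client, no index can appear in two clients' shards.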

Theorems & Definitions (22)

  • Definition 1: ${\mathcal{G}}, {\mathcal{F}}$-discrepancy [xu2022adversarially]
  • Remark
  • Definition 2: Cross-Client Divergence for TFL
  • Definition 3: Source-Target Discrepancy for TFL
  • Remark
  • Theorem 1: Bound Based on TFL-specific Domain Discrepancy
  • Definition 4: Cross-Client Statistics
  • Lemma 4.1: Loss Bound Using Cross-Client Statistics
  • Theorem 2: Tightened Bound Based on Cross-Client Statistics
  • Definition 5: ${\mathcal{H}}$-discrepancy [cortes2019adaptation, zhao2019learning]
  • ...and 12 more