Generalized and Personalized Federated Learning with Foundation Models via Orthogonal Transformations

Eun Gyung Kong, Je Won Yeom, Yonghoon Jeon, Taesup Kim

TL;DR

This work tackles the challenge of balancing generalization and personalization in federated learning under data heterogeneity by leveraging foundation models in a black-box setting. It introduces FedOT, which keeps a fixed vision encoder, learns a globally shared classifier, and uses per-client orthogonal feature transforms implemented via the Cayley transform to enable local adaptation without tampering with the encoder. Theoretical analysis shows that orthogonal transforms bound gradient differences across clients, with a tight upper bound of $4\tau$ when the local transforms have condition number $\kappa=1$, and block-diagonal variants offer a controllable trade-off between expressivity and complexity. Empirically, FedOT and its block-diagonal extension FedOT(+B) outperform baselines across five domain-shift datasets, delivering strong generalization, personalized performance, and robust behavior under varying communication rounds, while preserving data privacy and foundation-model IP. The approach highlights practical implications for deploying secure, scalable, and adaptable FL systems that capitalize on large foundation models.
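As a rough illustration of the per-client orthogonal transform mentioned above, the sketch below builds an orthogonal matrix with the Cayley transform and applies it to stand-in encoder features. The function name `cayley_orthogonal`, the feature dimension, and the random "features" are assumptions made for this example; it is not the authors' implementation.

```python
import numpy as np


def cayley_orthogonal(params: np.ndarray) -> np.ndarray:
    """Map an unconstrained square matrix to an orthogonal one (Cayley transform)."""
    # Skew-symmetric generator: a.T == -a.
    a = params - params.T
    eye = np.eye(a.shape[0])
    # Q = (I + A)^{-1} (I - A). For real skew-symmetric A, I + A is always invertible.
    return np.linalg.solve(eye + a, eye - a)


rng = np.random.default_rng(0)
d = 8  # hypothetical feature dimension, chosen only for the example

# Hypothetical local parameters for one client.
q = cayley_orthogonal(0.1 * rng.standard_normal((d, d)))

# Stand-in for features produced by the frozen (black-box) encoder.
features = rng.standard_normal((5, d))
adapted = features @ q.T

# Orthogonality check and norm preservation of each feature vector.
print(np.allclose(q.T @ q, np.eye(d)))
print(np.allclose(np.linalg.norm(features, axis=1), np.linalg.norm(adapted, axis=1)))
```

Because the orthogonal matrix is generated from unconstrained parameters, a client could in principle update those parameters with ordinary gradient steps and stay on the orthogonal set without a separate projection step.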

Abstract

Federated Learning (FL) aims to train models across decentralized clients or devices holding local data without the need for centralized data collection, thus enhancing data privacy and security. However, achieving both generalization and personalization in heterogeneous settings remains a significant challenge. To address this, we introduce FedOT, a novel approach that leverages black-box foundation models. FedOT shares only a global task-dependent classifier across clients while locally adapting features through orthogonal transformations. By enforcing orthogonality, FedOT mitigates gradient conflicts across diverse clients, preserves semantic integrity, and achieves robust performance even in the presence of substantial data heterogeneity. The strategy of combining global and local parameters enables a more balanced approach for both generalization and personalization, outperforming baseline FL methods across multiple benchmarks. Furthermore, our extensive analysis confirms that joint optimization of global classifiers and local orthogonal transformations yields superior performance and suggests broader applicability.

Paper Structure

This paper contains 67 sections, 1 theorem, 31 equations, 5 figures, 10 tables, 1 algorithm.

Key Result

Theorem 4.1

(Gradient Difference Bound) Let $i$ and $j$ be two distinct clients. Then the $\ell_2$-norm of the difference between their global-parameter gradients satisfies where $\kappa(\cdot)$ is the condition number of the corresponding linear transformation. Moreover, if $\kappa\bigl(w_{\mathrm{l}}^{(i)}\bigr) = \kappa\bigl(w_{\mathrm{l}}^{(j)}\bigr) = 1$, then the bound becomes $4\tau$, which is the sma
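The condition-number case in Theorem 4.1 can be checked numerically. An orthogonal matrix has all singular values equal to 1, so its 2-norm condition number is 1, whereas an unconstrained linear map generally has a larger one. The snippet below is a small sanity check under that reading; the dimension and random matrices are arbitrary choices for illustration, and it does not reproduce the bound itself.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # arbitrary dimension for the check

# Orthogonal matrix taken from the QR decomposition of a random Gaussian matrix.
q, _ = np.linalg.qr(rng.standard_normal((d, d)))

# Unconstrained linear map for comparison.
w = rng.standard_normal((d, d))

# np.linalg.cond defaults to the 2-norm: largest / smallest singular value.
kappa_orthogonal = np.linalg.cond(q)
kappa_unconstrained = np.linalg.cond(w)

print(f"kappa(orthogonal)    = {kappa_orthogonal:.6f}")
print(f"kappa(unconstrained) = {kappa_unconstrained:.2f}")
```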

Figures (5)

  • Figure 1: Overview of our proposed FL framework. FedOT integrates a pre-trained vision encoder into an FL environment. The image encoder, deployed in binary or embedded form, stays fixed. Each client applies an orthogonal transformation as its local parameters for client-specific personalization, while all clients share a global classifier. Through federated updates, our method balances personalization and generalization while preserving both data privacy and model intellectual property.
  • Figure 2: Comparison of DOF and personalization accuracy on FEMNIST and OfficeHome. We vary the number of blocks $R=2^N$ in our block-diagonal transforms. The left axis depicts the change in personalization accuracy relative to $N=0$ ($R=1$), while the right axis shows the corresponding DOF values.
  • Figure 3: Comparison of generalization and personalization performance for the local-only and global-only variants and FedOT on FEMNIST, PACS, and TerraIncognita. FedOT consistently outperforms both ablated methods, highlighting the complementary benefits of jointly leveraging local and global parameters.
  • Figure 4: Comparison of comprehensive accuracy on PACS across different communication rounds using FedOT and FedCLIP.
  • Figure 5: Average performance improvement of test clients in a multi-client scenario. The x-axis shows the number of training clients, while the y-axis indicates the average performance enhancement compared to CLIP's zero-shot performance.
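To make the DOF axis in Figure 2 more concrete, the sketch below counts free parameters for a block-diagonal orthogonal transform with $R = 2^N$ equal blocks. It assumes DOF means the number of independent entries of the skew-symmetric generator of each block, $b(b-1)/2$ for a $b \times b$ block; the paper's exact definition may differ. The feature dimension 512 and the helper name `block_diagonal_dof` are hypothetical.

```python
def block_diagonal_dof(dim: int, num_blocks: int) -> int:
    """Free parameters of a block-diagonal orthogonal map with equal-sized blocks.

    Assumes each b x b block is generated by a skew-symmetric matrix,
    which has b * (b - 1) / 2 independent entries.
    """
    if dim % num_blocks != 0:
        raise ValueError("dim must be divisible by num_blocks")
    block = dim // num_blocks
    return num_blocks * block * (block - 1) // 2


dim = 512  # hypothetical feature dimension, not taken from the source
for n in range(4):
    r = 2 ** n  # number of blocks, R = 2^N
    print(f"N={n}  R={r}  DOF={block_diagonal_dof(dim, r)}")
```

Under this counting, doubling the number of blocks roughly halves the parameter count, which matches the expressivity versus complexity trade-off described for the block-diagonal variants.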

Theorems & Definitions (1)

  • Theorem 4.1