
Global and Local Prompts Cooperation via Optimal Transport for Federated Learning

Hongxia Li, Wei Huang, Jingya Wang, Ye Shi

TL;DR

This work tackles data heterogeneity in federated learning for vision-language models by introducing FedOTP, which jointly learns a global prompt for cross-client consensus and a personalized local prompt for client-specific traits within a CLIP-based framework. It uses unbalanced Optimal Transport to align local visual features with both prompts, which lets the prompts focus selectively on the most relevant image patches, and it solves the transport problem efficiently with a fast Dykstra-based solver. The authors provide a generalization bound under Lipschitz assumptions and report that FedOTP outperforms state-of-the-art prompt-based and traditional personalized federated learning (PFL) methods across diverse label-shift and feature-shift scenarios; qualitative visualizations show distinct roles for the global and local prompts. The approach reduces communication costs while preserving personalization and performs robustly in highly heterogeneous settings, which has practical implications for scalable, privacy-preserving deployment of vision-language models.

Abstract

Prompt learning in pretrained vision-language models has shown remarkable flexibility across various downstream tasks. Leveraging its inherent lightweight nature, recent research has attempted to integrate these powerful pretrained models into federated learning frameworks to simultaneously reduce communication costs and promote local training on insufficient data. Despite these efforts, current federated prompt learning methods lack specialized designs to systematically address severe data heterogeneity, e.g., data distributions that involve both label and feature shifts. To address this challenge, we present Federated Prompts Cooperation via Optimal Transport (FedOTP), which introduces efficient collaborative prompt learning strategies to capture diverse category traits on a per-client basis. Specifically, for each client, we learn a global prompt to extract consensus knowledge among clients, and a local prompt to capture client-specific category characteristics. Unbalanced Optimal Transport is then employed to align local visual features with these prompts, striking a balance between global consensus and local personalization. By relaxing one of the equality constraints, FedOTP enables prompts to focus solely on the core regions of image patches. Extensive experiments on datasets with various types of heterogeneity demonstrate that FedOTP outperforms state-of-the-art methods.
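
To make the alignment step more concrete, here is a minimal sketch; it is not the authors' implementation. It assumes an entropy-regularized transport problem between image patches (rows) and the two prompts (columns), where the row marginal is relaxed to an upper bound and the column marginal stays an equality. Under those assumptions, a Dykstra-style iteration reduces to a Sinkhorn-like scaling loop with a `min(., 1)` clamp on the row scaling. The function name `relaxed_ot_plan`, the sizes, the random cost matrix (standing in for one minus the cosine similarity between patch features and prompt text features), the uniform capacities, and the values of `reg` and `gamma` are all illustrative assumptions.

```python
import numpy as np


def relaxed_ot_plan(cost, row_cap, col_mass, reg=0.5, n_iters=1000):
    """Entropic transport plan T with T @ 1 <= row_cap and T.T @ 1 == col_mass.

    Assumes sum(row_cap) >= sum(col_mass) so the constraints are feasible.
    """
    kernel = np.exp(-cost / reg)            # Gibbs kernel of the cost matrix
    u = np.ones(cost.shape[0])              # row scaling (image patches)
    v = np.ones(cost.shape[1])              # column scaling (prompts)
    for _ in range(n_iters):
        # Relaxed row constraint: rows are only scaled down, never up.
        u = np.minimum(row_cap / (kernel @ v), 1.0)
        # Equality column constraint: columns are rescaled to match exactly.
        v = col_mass / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]


# Hypothetical toy problem: 6 image patches, 2 prompts (global and local).
rng = np.random.default_rng(0)
n_patches, n_prompts = 6, 2
cost = rng.random((n_patches, n_prompts))        # stand-in for 1 - cosine similarity
row_cap = np.full(n_patches, 1.0 / n_patches)    # each patch may give at most this mass
gamma = 0.6                                      # total transported mass (assumed)
col_mass = np.full(n_prompts, gamma / n_prompts)

plan = relaxed_ot_plan(cost, row_cap, col_mass)
patch_mass = plan.sum(axis=1)                    # how much each patch contributes
```

Because the total transported mass `gamma` is smaller than the total patch capacity, some patches contribute less than their cap. This is one way to read the abstract's statement that relaxing a constraint lets prompts concentrate on core image regions.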

Paper Structure

This paper contains 35 sections, 3 theorems, 23 equations, 8 figures, 9 tables, 2 algorithms.

Key Result

Theorem 1

Suppose $\hat{\mathcal{D}}_1,\cdots,\hat{\mathcal{D}}_N$ denote the empirical data distributions of $N$ clients with learned parameters $\hat{P}_g$ and $\hat{P}_{l,i}$, and $P_g^\ast$ and $P_{l,i}^\ast$ are the optimal parameters for the real distributions $\mathcal{D}_1,\cdots,\mathcal{D}_N$. Let $\mathcal{H

Figures (8)

  • Figure 1: Overview of FedOTP. On the left, clients transmit global prompts to the server for aggregation while keeping local prompts on the client. The right shows the workflow of the Global-Local prompt cooperation mechanism, which employs unbalanced Optimal Transport to align visual feature maps with each prompt.
  • Figure 2: Heatmaps of similarity between text features and image feature maps for different methods on 4 categories in the OxfordPets dataset. "FedOTP-G" denotes the results from the global prompt and "FedOTP-L" refers to the local prompt.
  • Figure 3: Performance with different numbers of shots.
  • Figure A1: Examples of raw instances from two datasets with multiple domains: DomainNet (left) and Office-Caltech10 (right). We present five classes for each dataset to show the feature shift across their sub-datasets.
  • Figure A2: Visualization of three Non-IID settings on the Office-Caltech10 dataset. Each dot represents a set of samples within specific classes assigned to a client, with the dot size indicating the number of samples. The feature shifts are denoted by different colors.
  • ...and 3 more figures
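
The communication pattern in the Figure 1 caption (global prompts are sent to the server, local prompts stay on the client) can be sketched as follows. This is a minimal illustration under assumptions: the sample-count-weighted average is a FedAvg-style rule chosen for the example, not a rule confirmed by the text here. The function name `aggregate_global_prompts`, the dictionary keys, and the prompt shape `(4, 8)` are hypothetical.

```python
import numpy as np


def aggregate_global_prompts(global_prompts, num_samples):
    """Weighted average of client global prompts (FedAvg-style; an assumed rule)."""
    weights = np.asarray(num_samples, dtype=float)
    weights = weights / weights.sum()
    stacked = np.stack(global_prompts)              # (n_clients, n_tokens, dim)
    return np.tensordot(weights, stacked, axes=1)   # (n_tokens, dim)


# Hypothetical clients: each holds a global prompt, a local prompt, and a sample count.
clients = [
    {"global": np.full((4, 8), float(k)), "local": np.full((4, 8), 10.0 + k), "n": n}
    for k, n in enumerate([1, 1, 2])
]

# Only the global prompts and the sample counts leave the clients.
server_prompt = aggregate_global_prompts(
    [c["global"] for c in clients], [c["n"] for c in clients]
)

# The server broadcasts the aggregate; local prompts are never transmitted or changed.
for c in clients:
    c["global"] = server_prompt.copy()
```

In this toy setup, each round exchanges one prompt tensor per client regardless of the size of the frozen backbone, while the local prompts stay client-specific.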

Theorems & Definitions (3)

  • Theorem 1: Generalization Bound of FedOTP
  • Lemma 1: McDiarmid's Inequality (Mohri et al., 2018)
  • Lemma 2: Rademacher Complexity (Mohri et al., 2018)
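
For reference, the two supporting results are usually stated as below in standard textbooks. These are the common textbook forms, and the paper's exact statements and constants may differ. The symbols $f$, $c_i$, $m$, $\epsilon$, $\mathcal{G}$, $S$, and $\sigma_i$ are notation introduced here for illustration and do not come from the listing above.

McDiarmid's inequality: if $X_1,\dots,X_m$ are independent random variables and $f$ changes by at most $c_i$ when only its $i$-th argument is changed, then for all $\epsilon > 0$,

$$\Pr\big[f(X_1,\dots,X_m) - \mathbb{E}[f(X_1,\dots,X_m)] \ge \epsilon\big] \le \exp\left(\frac{-2\epsilon^2}{\sum_{i=1}^{m} c_i^2}\right).$$

Empirical Rademacher complexity of a function class $\mathcal{G}$ on a sample $S = (z_1,\dots,z_m)$, where the $\sigma_i$ are independent and uniform on $\{-1,+1\}$:

$$\hat{\mathfrak{R}}_S(\mathcal{G}) = \mathbb{E}_{\sigma}\left[\sup_{g \in \mathcal{G}} \frac{1}{m}\sum_{i=1}^{m} \sigma_i\, g(z_i)\right].$$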