
FedMGP: Personalized Federated Learning with Multi-Group Text-Visual Prompts

Weihao Bo, Yanpeng Sun, Yu Wang, Xinyu Zhang, Zechao Li

TL;DR

FedMGP tackles privacy-preserving personalization in federated vision-language learning by introducing multiple text-visual prompt groups per client. A diversity loss enforces specialization among groups, while a similarity-guided dynamic aggregation selects semantically aligned prompt groups for server aggregation, balancing shared cross-client knowledge with client-specific patterns. Empirical results across nine base-to-novel and domain-shift benchmarks show FedMGP achieving state-of-the-art performance with minimal communication and strong domain generalization, supported by ablations and visual analyses. The approach offers robust personalization and cross-domain transfer in heterogeneous federated settings, with practical implications for privacy-aware multimodal systems.
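The diversity loss that drives group specialization is only described qualitatively here. The code below is a minimal sketch. It assumes, without confirmation from the text, that diversity is measured as the mean pairwise cosine similarity between the feature vectors produced by each prompt group, so that minimizing the term pushes the groups apart. The function names and this exact loss form are hypothetical. A real training loop would compute it on tensors in an autodiff framework and add it to the task loss with a weighting coefficient.

```python
import itertools
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def diversity_loss(group_features: List[Sequence[float]]) -> float:
    """Hypothetical diversity loss: mean pairwise cosine similarity across groups.

    Lower values mean the groups point in more distinct directions, so adding
    this term (times a weight) to the task loss pushes the groups to specialize.
    """
    pairs = list(itertools.combinations(group_features, 2))
    if not pairs:  # a single group has nothing to be diverse from
        return 0.0
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Identical groups are maximally redundant; orthogonal groups are fully diverse.
print(diversity_loss([[1.0, 0.0], [1.0, 0.0]]))  # → 1.0
print(diversity_loss([[1.0, 0.0], [0.0, 1.0]]))  # → 0.0
```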

Abstract

In this paper, we introduce FedMGP, a new paradigm for personalized federated prompt learning in vision-language models. FedMGP equips each client with multiple groups of paired textual and visual prompts, enabling the model to capture diverse, fine-grained semantic and instance-level cues. A diversity loss is introduced to drive each prompt group to specialize in distinct and complementary semantic aspects, ensuring that the groups collectively cover a broader range of local characteristics. During communication, FedMGP employs a dynamic prompt aggregation strategy based on similarity-guided probabilistic sampling: each client computes the cosine similarity between its prompt groups and the global prompts from the previous round, then samples $s$ groups via a softmax-weighted distribution. This soft selection mechanism preferentially aggregates semantically aligned knowledge while still allowing exploration of underrepresented patterns, effectively balancing the preservation of common knowledge with client-specific features. Notably, FedMGP maintains parameter efficiency by redistributing a fixed prompt capacity across multiple groups, achieving state-of-the-art performance with the fewest communicated parameters among all federated prompt learning methods. Theoretical analysis shows that our dynamic aggregation strategy promotes robust global representation learning by reinforcing shared semantics while suppressing client-specific noise. Extensive experiments demonstrate that FedMGP consistently outperforms prior approaches in both personalization and domain generalization across diverse federated vision-language benchmarks. The code will be released at https://github.com/weihao-bo/FedMGP.git.
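The similarity-guided sampling step can be sketched as follows. It follows the description above: cosine similarity to the previous round's global prompts, softmax weighting, and sampling $s$ groups. Several details are assumptions rather than facts from the paper. Prompts are flattened to plain lists of floats, the softmax uses a hypothetical `temperature` parameter, groups are drawn without replacement, and the server step `aggregate` is a plain element-wise mean. None of these names come from the FedMGP code.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two flattened prompt vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax; a lower temperature sharpens the distribution."""
    scaled = [score / temperature for score in scores]
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_prompt_groups(
    groups: Sequence[Sequence[float]],
    global_prompt: Sequence[float],
    s: int,
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Sample s distinct group indices, weighted by softmax(cosine similarity)."""
    if not 1 <= s <= len(groups):
        raise ValueError("s must lie in [1, G]")
    if rng is None:
        rng = random.Random()
    sims = [cosine_similarity(g, global_prompt) for g in groups]
    probs = softmax(sims, temperature)
    remaining = list(range(len(groups)))
    chosen: List[int] = []
    for _ in range(s):
        # random.choices renormalizes the weights over the indices still available,
        # which gives sampling without replacement one draw at a time.
        weights = [probs[i] for i in remaining]
        idx = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(idx)
        remaining.remove(idx)
    return chosen


def aggregate(uploads: List[List[Vector]]) -> Vector:
    """Hypothetical server step: element-wise mean of every uploaded prompt group."""
    flat = [group for client in uploads for group in client]
    dim = len(flat[0])
    return [sum(group[d] for group in flat) / len(flat) for d in range(dim)]


if __name__ == "__main__":
    global_prompt = [1.0, 0.0]
    client_groups = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    picked = sample_prompt_groups(client_groups, global_prompt, s=2, rng=random.Random(0))
    print("uploaded groups:", picked)
    print("new global prompt:", aggregate([[client_groups[i] for i in picked]]))
```

With a small `temperature`, the draw approaches deterministic top-$s$ selection of the most aligned groups; with a large one it approaches uniform sampling. This is one way to read the trade-off between preserving common knowledge and exploring underrepresented patterns described above.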

Paper Structure

This paper contains 37 sections, 1 theorem, 40 equations, 10 figures, 11 tables, and 1 algorithm.

Key Result

Theorem F.1

Under the feature-decomposition, prompt-representation, and data-heterogeneity assumptions, for any number of selected prompt groups $s\in[1,G]$, we have:

Figures (10)

  • Figure 1: Overview of FedMGP: The left portion shows the server distributing global prompts to clients; the middle portion illustrates the multi-group text-visual prompt co-learning mechanism within each client; and the right portion demonstrates the dynamic prompt aggregation strategy across communication rounds.
  • Figure 2: Few-shot experiments from 1 to 16 shots.
  • Figure 3: Parameter analysis of FedMGP and other state-of-the-art methods.
  • Figure 4: Ablation study on prompt length ($l$).
  • Figure 5: Ablation study on the number of prompt groups ($m$).
  • ...and 5 more figures

Theorems & Definitions (2)

  • Theorem F.1: Dynamic Aggregation Superiority
  • Proof of Theorem F.1