Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models
Jun Luo, Chen Chen, Shandong Wu
TL;DR
The paper addresses the challenge of adapting CLIP-like vision–language models in federated settings where communicating large models is costly, by introducing pFedMoAP, a personalized federated mixture of adaptive prompts. Clients download multiple pre-aggregated prompts as non-local experts and use a client-specific attention-based gating network to fuse local and non-local prompt knowledge, enabling effective MoE-based personalization with minimal overhead. The gating network operates on a reduced feature space ($d_{gating}=128$) and supports a flexible number of experts, while a KNN-based strategy selects non-local prompts from a server pool. Empirical results across 9 datasets under diverse non-IID settings show substantial improvements over state-of-the-art federated prompt methods, including robustness to feature and label shifts and resilience under differential privacy, highlighting the practical value of shared non-local prompts for VLMs in privacy-preserving collaborative learning.
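As a rough illustration of the KNN-based selection mentioned above, the following sketch picks the k pool entries closest to a client's own prompt. It is not the authors' implementation: the function name `select_nonlocal_prompts`, the representation of each prompt as a flattened list of floats, and the use of Euclidean distance are all assumptions made for this example.

```python
import math


def select_nonlocal_prompts(pool, client_id, k):
    """Return the ids of the k pool prompts closest to the client's own prompt.

    pool: dict mapping client id -> flattened prompt vector (list of floats).
    Assumption: closeness is plain Euclidean distance between flattened prompts.
    The client's own prompt is skipped because it already acts as the local expert.
    """
    query = pool[client_id]
    distances = []
    for other_id, prompt in pool.items():
        if other_id == client_id:
            continue
        distances.append((math.dist(query, prompt), other_id))
    distances.sort()  # nearest first
    return [other_id for _, other_id in distances[:k]]


# Toy server pool with four clients and 2-dimensional "prompts".
pool = {
    "a": [0.0, 0.0],
    "b": [0.1, 0.0],
    "c": [1.0, 1.0],
    "d": [0.0, 0.3],
}
neighbors = select_nonlocal_prompts(pool, "a", k=2)
```

Because prompts are small, the server in this sketch can afford to compare every pair directly; a larger pool might call for an approximate nearest-neighbor index instead.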
Abstract
Federated prompt learning brings the robust representation learning ability of CLIP-like Vision-Language Models (VLMs) to federated learning (FL) through prompt learning. However, current federated prompt learning methods are typically restricted to the traditional FL paradigm, in which participating clients are generally allowed to download only a single globally aggregated model from the server. While this restriction is justifiable when training full-sized models under federated settings, we argue in this work that the paradigm is ill-suited for lightweight prompts. By allowing clients to download multiple pre-aggregated prompts as fixed non-local experts, we propose Personalized Federated Mixture of Adaptive Prompts (pFedMoAP), a novel FL framework that personalizes the prompt learning process through the lens of Mixture of Experts (MoE). pFedMoAP implements a local attention-based gating network that learns to generate enhanced text features for better alignment with local image data, benefiting from both local and downloaded non-local adaptive prompt experts. Extensive experiments on 9 datasets under various federated settings demonstrate the efficacy of the proposed pFedMoAP algorithm. The code is available at https://github.com/ljaiverson/pFedMoAP.
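To make the idea of an attention-based gating network over prompt experts more concrete, here is a minimal NumPy sketch under stated assumptions. It is not the released pFedMoAP code: the function `gated_text_features`, the single-head attention form, the projection matrices `w_q` and `w_k`, and the tensor shapes are hypothetical choices for illustration. The image feature serves as the query, each expert's text features serve as keys and values, and both are projected into a reduced gating space before the scores are computed.

```python
import numpy as np


def gated_text_features(image_feat, expert_text_feats, w_q, w_k):
    """Fuse per-expert text features using image-driven attention weights.

    image_feat:        (d,)           image feature used as the query
    expert_text_feats: (E, C, d)      text features for C classes from E experts
    w_q, w_k:          (d, d_gating)  projections into the reduced gating space
    returns:           fused features of shape (C, d), weights of shape (C, E)
    """
    q = image_feat @ w_q                           # (d_gating,)
    k = expert_text_feats @ w_k                    # (E, C, d_gating)
    scores = (k @ q) / np.sqrt(w_q.shape[1])       # (E, C) scaled dot products
    scores = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=0, keepdims=True)  # softmax over experts
    fused = (weights[..., None] * expert_text_feats).sum(axis=0)  # (C, d)
    return fused, weights.T


# Toy sizes: 4 experts (1 local + 3 non-local), 10 classes, 512-dim features,
# and a 128-dim gating space (all values here are random placeholders).
rng = np.random.default_rng(0)
d, d_gating, num_experts, num_classes = 512, 128, 4, 10
image_feat = rng.standard_normal(d)
expert_text_feats = rng.standard_normal((num_experts, num_classes, d))
w_q = rng.standard_normal((d, d_gating)) / np.sqrt(d)
w_k = rng.standard_normal((d, d_gating)) / np.sqrt(d)
fused, weights = gated_text_features(image_feat, expert_text_feats, w_q, w_k)
```

Since the softmax runs over the expert axis, the same function accepts any number of experts without changing the parameter shapes, which is one plausible reason an attention-style gate suits a variable number of downloaded prompts.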
