Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning
Wenlong Deng, Christos Thrampoulidis, Xiaoxiao Li
TL;DR
The paper tackles federated learning with vision transformers under data heterogeneity by proposing SGPT, a prompt-tuning-based framework that combines shared prompts for universal knowledge with group prompts for local specialization. A prompt selection module assigns inputs to data groups, enabling sample-level adaptation without local fine-tuning, while a block coordinate descent optimization alternates between learning shared information and group-specific knowledge. The authors provide a theoretical bound on the global-local performance gap in terms of generalization and distribution discrepancy, and empirically validate SGPT on label- and feature-heterogeneous benchmarks, reporting better global and worst-local performance than the baselines along with improved efficiency. The approach uses prompt-tuning to handle cross-client heterogeneity in FL with ViTs while reducing communication and computation costs.
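To make the idea of sample-level prompt selection concrete, here is a minimal sketch rather than SGPT's actual module. It assumes each group has a learnable key vector and that an input is routed to the group whose key has the highest cosine similarity with a feature of that input. The key-matching rule, the function names (`select_group`, `build_input`), and all shapes are illustrative assumptions not taken from the paper.

```python
import numpy as np

def select_group(feature, group_keys):
    """Return the index of the group key most similar (cosine) to the feature.

    Assumption: key-based cosine matching stands in for the paper's
    prompt selection module, whose exact form is not given here.
    """
    f = feature / np.linalg.norm(feature)
    k = group_keys / np.linalg.norm(group_keys, axis=1, keepdims=True)
    return int(np.argmax(k @ f))

def build_input(patch_tokens, shared_prompts, group_prompts, group_keys, feature):
    """Prepend shared prompts and the selected group's prompts to the patch tokens."""
    g = select_group(feature, group_keys)
    tokens = np.concatenate([shared_prompts, group_prompts[g], patch_tokens], axis=0)
    return tokens, g

rng = np.random.default_rng(0)
dim = 8
patch_tokens = rng.normal(size=(16, dim))    # 16 image patch embeddings (hypothetical)
shared_prompts = rng.normal(size=(4, dim))   # 4 prompts shared by every input
group_prompts = rng.normal(size=(3, 2, dim)) # 3 groups, 2 prompts per group
group_keys = np.eye(3, dim)                  # one orthogonal key per group
feature = np.array([0.1, 0.9, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0])

tokens, g = build_input(patch_tokens, shared_prompts, group_prompts, group_keys, feature)
print(tokens.shape, g)  # (22, 8) 1
```

Because the routing is computed per input, the same global model can serve clients with different data distributions without any client-side fine-tuning step, which is the property the summary above attributes to the selection module.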
Abstract
Vision Transformers (ViT) and Visual Prompt Tuning (VPT) achieve state-of-the-art performance with improved efficiency in various computer vision tasks. This suggests a promising paradigm shift: adapting pre-trained ViT models to Federated Learning (FL) settings. However, data heterogeneity among FL clients presents a significant hurdle to deploying ViT models effectively. Existing Generalized FL (GFL) and Personalized FL (PFL) methods have limitations in balancing performance across both global and local data distributions. In this paper, we present a novel algorithm, SGPT, that integrates GFL and PFL approaches by combining shared and group-specific prompts. This design enables SGPT to capture both common and group-specific features. A key feature of SGPT is its prompt selection module, which facilitates the training of a single global model capable of automatically adapting to diverse local client data distributions without the need for local fine-tuning. To train the prompts effectively, we use block coordinate descent (BCD), iteratively learning first from common feature information (shared prompts) and then from more specialized knowledge (group prompts). Theoretically, we show that learning the proposed prompts can reduce the gap between global and local performance. Empirically, we conduct experiments in both label- and feature-heterogeneity settings against state-of-the-art baselines, along with extensive ablation studies, to substantiate the superior performance of SGPT.
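The alternating structure of block coordinate descent can be illustrated with a toy problem. The sketch below is not the SGPT training procedure: it replaces the ViT and its prompts with a linear model whose weights split into a shared block and per-group blocks, runs in a single process with no federated aggregation, and uses plain gradient steps. All names, sizes, and the learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, num_groups = 200, 4, 3

# Synthetic data: each sample belongs to one group, and its target depends
# on a shared weight vector plus a group-specific offset.
X = rng.normal(size=(n, d))
groups = rng.integers(0, num_groups, size=n)
w_shared_true = rng.normal(size=d)
w_group_true = rng.normal(size=(num_groups, d))
y = np.sum(X * (w_shared_true + w_group_true[groups]), axis=1)

def loss_and_residual(w_shared, w_group):
    """Mean squared error and per-sample residuals for the two-block model."""
    pred = np.sum(X * (w_shared + w_group[groups]), axis=1)
    r = pred - y
    return float(np.mean(r ** 2)), r

w_shared = np.zeros(d)
w_group = np.zeros((num_groups, d))
lr = 0.05
losses = [loss_and_residual(w_shared, w_group)[0]]

for _ in range(200):
    # Block 1: update the shared block while the group blocks stay frozen.
    _, r = loss_and_residual(w_shared, w_group)
    w_shared -= lr * (2.0 / n) * (X.T @ r)

    # Block 2: update each group block while the shared block stays frozen.
    _, r = loss_and_residual(w_shared, w_group)
    for k in range(num_groups):
        mask = groups == k
        w_group[k] -= lr * (2.0 / n) * (X[mask].T @ r[mask])

    losses.append(loss_and_residual(w_shared, w_group)[0])

print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.6f}")
```

Each round first fits what all groups have in common and then lets the group blocks absorb what remains, mirroring the shared-then-group ordering described in the abstract.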
