CAPT: Class-Aware Prompt Tuning for Federated Long-Tailed Learning with Vision-Language Model
Shihao Hou, Xinyi Shang, Shreyank N Gowda, Yang Lu, Chao Wu, Yan Yan, Hanzi Wang
TL;DR
This work addresses federated learning (FL) under the dual challenges of non-IID data and long-tailed class distributions by introducing CAPT, a framework that leverages a pre-trained vision-language model through a dual-prompt design and heterogeneity-aware client clustering. The method combines a general prompt for domain-invariant representation with class-aware prompts for per-class discrimination, all aligned via a vision-language mapping and a joint contrastive objective. A two-pronged clustering strategy groups clients by distribution similarity and by complementary head-tail coverage, and a Multi-Armed Bandit scheduler improves communication efficiency. Comprehensive experiments on CIFAR-10-LT, CIFAR-100-LT, Fashion-MNIST-LT, and ImageNet-LT show that CAPT substantially improves tail-class performance while maintaining competitive overall accuracy, outperforming state-of-the-art prompt-tuning and FL methods. The work also provides theoretical insight into why traditional prompt tuning struggles in federated long-tailed settings and offers a practical, scalable approach to robust, fair, and efficient federated learning with vision-language models.
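The TL;DR names a Multi-Armed Bandit scheduler for communication efficiency but does not state the bandit algorithm, what the arms are, or what reward is observed. As a purely illustrative sketch under those gaps, the snippet below swaps in the standard UCB1 rule, treating each client cluster as an arm and assuming a scalar per-round reward (for instance, a hypothetical validation gain after that cluster communicates). The class `UCB1Scheduler`, the choice of UCB1, and the reward signal are all assumptions, not CAPT's actual scheduler.

```python
import math
from typing import List


class UCB1Scheduler:
    """Hypothetical UCB1 bandit (not CAPT's scheduler): each arm is a client
    cluster and the reward is an assumed scalar observed after it communicates."""

    def __init__(self, num_arms: int, c: float = 2.0):
        self.counts: List[int] = [0] * num_arms
        self.values: List[float] = [0.0] * num_arms  # running mean reward per arm
        self.c = c  # exploration weight; c = 2.0 gives the classic UCB1 bonus

    def select(self) -> int:
        # Try every arm once before relying on confidence bounds.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        total = sum(self.counts)
        scores = [
            mean + math.sqrt(self.c * math.log(total) / n)
            for mean, n in zip(self.values, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm: int, reward: float) -> None:
        # Incremental mean update for the chosen arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


if __name__ == "__main__":
    # Toy deterministic rewards per cluster, made up for illustration only.
    true_reward = [0.1, 0.9, 0.5]
    sched = UCB1Scheduler(num_arms=3)
    for _ in range(200):
        arm = sched.select()
        sched.update(arm, true_reward[arm])
    print(sched.counts)  # most pulls go to arm 1, the highest-reward cluster
```

UCB1 is used here only because it is a simple, well-known bandit that balances exploiting clusters with high observed reward against occasionally revisiting rarely scheduled ones; the paper's scheduler may differ in algorithm, arm definition, and reward.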
Abstract
Effectively handling the co-occurrence of non-IID data and long-tailed distributions remains a critical challenge in federated learning. While fine-tuning vision-language models (VLMs) such as CLIP has been shown to be promising for addressing non-IID data, this approach severely degrades tail-class performance in federated long-tailed scenarios. Under the combined effect of strongly non-IID data and long-tailed class imbalance, VLM fine-tuning may even fail to yield any improvement. To address this issue, we propose Class-Aware Prompt Tuning for Federated Long-tailed Learning (CAPT), a novel framework that leverages a pre-trained VLM to handle both data heterogeneity and long-tailed distributions. CAPT introduces a dual-prompt mechanism that combines general and class-aware prompts, enabling the framework to capture global trends while preserving class-specific knowledge. To aggregate and share knowledge across clients more effectively, we introduce a heterogeneity-aware client clustering strategy that groups clients by their data distributions, enabling efficient collaboration. Extensive experiments on various long-tailed datasets with different levels of data heterogeneity demonstrate that CAPT significantly improves tail-class performance without compromising overall accuracy, outperforming state-of-the-art methods in federated long-tailed learning scenarios.
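To make the idea of grouping clients by their data distributions concrete, the sketch below compares per-client label histograms with cosine similarity, clusters them greedily against a threshold, and picks the least similar client as a crude proxy for the complementary head-tail coverage mentioned in the TL;DR. The helpers `label_histogram`, `cluster_by_similarity`, and `most_complementary`, the cosine measure, and the threshold value are hypothetical; the abstract does not say which similarity measure or clustering algorithm CAPT actually uses.

```python
import math
from typing import Dict, List

# Hypothetical helpers: CAPT's actual similarity measure and clustering
# algorithm are not specified in the abstract.


def label_histogram(labels: List[int], num_classes: int) -> List[float]:
    """Normalized per-class frequency of one client's local labels."""
    counts = [0.0] * num_classes
    for y in labels:
        counts[y] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two label histograms (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def cluster_by_similarity(hists: Dict[str, List[float]], threshold: float) -> List[List[str]]:
    """Greedy clustering: a client joins the first cluster whose seed histogram
    is at least `threshold`-similar; otherwise it seeds a new cluster."""
    seeds: List[List[float]] = []
    clusters: List[List[str]] = []
    for client_id, hist in hists.items():
        for idx, seed in enumerate(seeds):
            if cosine(hist, seed) >= threshold:
                clusters[idx].append(client_id)
                break
        else:
            seeds.append(hist)
            clusters.append([client_id])
    return clusters


def most_complementary(target: str, hists: Dict[str, List[float]]) -> str:
    """Crude proxy for head-tail complementarity: the other client whose
    label distribution is least similar to the target's."""
    others = [c for c in hists if c != target]
    return min(others, key=lambda c: cosine(hists[target], hists[c]))


if __name__ == "__main__":
    hists = {
        "A": label_histogram([0, 0, 0, 0, 1], 3),  # head class 0
        "B": label_histogram([0, 0, 0, 1, 1], 3),  # head class 0
        "C": label_histogram([2, 2, 2, 2, 1], 3),  # head class 2
    }
    print(cluster_by_similarity(hists, threshold=0.9))  # [['A', 'B'], ['C']]
    print(most_complementary("A", hists))  # C
```

Greedy threshold clustering is chosen only for brevity; any clustering over distribution summaries (for example, agglomerative clustering) would serve the same illustrative purpose.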
