FLIP: Towards Comprehensive and Reliable Evaluation of Federated Prompt Learning

Dongping Liao; Xitong Gao; Yabo Xu; Chengzhong Xu

TL;DR

FLIP tackles the challenge of evaluating federated prompt learning for CLIP-based vision-language backbones under data scarcity, unseen classes, and cross-domain shifts. It introduces a modular, open-source evaluation framework with a unified training and evaluation interface, and it benchmarks 8 state-of-the-art federated prompt learning methods across 4 federated learning protocols and 12 datasets in 6 scenarios. The analysis shows that prompt tuning can generalize well both in-distribution and out-of-distribution at low resource cost, while cross-domain robustness remains challenging; methods that use distribution alignment or prompt ensembles offer better personalization and stability. FLIP provides standardized metrics, a reproducible codebase, and a scalable evaluation paradigm to support fair comparisons and informed algorithm design in federated prompt learning.

Abstract

The increasing emphasis on privacy and data security has driven the adoption of federated learning, a decentralized approach to training machine learning models without sharing raw data. Prompt learning, which fine-tunes the prompt embeddings of pretrained models, offers significant advantages in federated settings: it reduces computational cost and communication overhead while leveraging the strong performance and generalization capabilities of vision-language models such as CLIP. This paper addresses the intersection of federated learning and prompt learning, particularly for vision-language models. We introduce FLIP, a comprehensive framework for evaluating federated prompt learning algorithms. FLIP assesses the performance of 8 state-of-the-art federated prompt learning methods across 4 federated learning protocols and 12 open datasets, considering 6 distinct evaluation scenarios. Our findings demonstrate that prompt learning maintains strong generalization performance in both in-distribution and out-of-distribution settings with minimal resource consumption. This work highlights the effectiveness of federated prompt learning in environments characterized by data scarcity, unseen classes, and cross-domain distributional shifts. We open-source the code for all implemented algorithms in FLIP to facilitate further research in this domain.
