Cooperative Pseudo Labeling for Unsupervised Federated Classification
Kuangpu Guo, Lijun Sheng, Yongcan Yu, Jian Liang, Zilei Wang, Ran He
TL;DR
This work addresses unsupervised federated classification by leveraging CLIP in a distributed, unlabeled-data setting. It introduces FedCoPL, a framework that combines cooperative pseudo labeling to mitigate CLIP bias and label skew with partial prompt aggregation to balance global collaboration and local personalization. Empirical results across diverse datasets and skew types show that FedCoPL consistently outperforms baselines, with ablations validating the contributions of both pseudo labeling and prompt aggregation. The approach enables effective zero-shot-like classification in federated environments while preserving client privacy and reducing communication overhead, marking a practical step toward robust unsupervised FL with vision-language models.
Abstract
Unsupervised Federated Learning (UFL) aims to collaboratively train a global model across distributed clients without sharing data or accessing label information. Previous UFL works have predominantly focused on representation learning and clustering tasks. Recently, vision-language models (e.g., CLIP) have gained significant attention for their powerful zero-shot prediction capabilities. This advance makes classification, which was previously infeasible under the UFL paradigm, a promising direction that remains largely unexplored. In this paper, we extend UFL to the classification problem with CLIP for the first time and propose a novel method, Federated Cooperative Pseudo Labeling (FedCoPL). Specifically, clients estimate and upload their pseudo label distributions, and the server adjusts and redistributes them to avoid global imbalance among classes. Moreover, we introduce a partial prompt aggregation protocol for effective collaboration and personalization. In particular, visual prompts containing general image features are aggregated at the server, while text prompts encoding personalized knowledge are retained locally. Extensive experiments demonstrate the superior performance of FedCoPL compared to baseline methods. Our code is available at https://github.com/krumpguo/FedCoPL.
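The two mechanisms named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the toy client dictionaries, the use of hard (argmax-style) pseudo labels, and the sample-size-weighted averaging rule are all assumptions made for illustration. The abstract does not specify how the server adjusts the uploaded distributions, so that step is deliberately left out.

```python
from collections import Counter


def pseudo_label_distribution(pseudo_labels, num_classes):
    """Client side (hypothetical helper): normalized histogram of hard pseudo labels."""
    if not pseudo_labels:
        raise ValueError("client has no pseudo labels")
    counts = Counter(pseudo_labels)
    total = len(pseudo_labels)
    return [counts.get(c, 0) / total for c in range(num_classes)]


def aggregate_visual_prompts(client_prompts, client_sizes):
    """Server side (hypothetical helper): sample-size-weighted average of prompt vectors.

    The weighting rule is an assumption; the abstract only says that
    visual prompts are aggregated at the server.
    """
    total = sum(client_sizes)
    dim = len(client_prompts[0])
    return [
        sum(p[i] * n for p, n in zip(client_prompts, client_sizes)) / total
        for i in range(dim)
    ]


# Toy clients: each holds a visual prompt, a text prompt, and pseudo labels.
clients = [
    {"visual": [1.0, 2.0], "text": [0.1, 0.2], "labels": [0, 0, 1], "n": 3},
    {"visual": [3.0, 4.0], "text": [0.9, 0.8], "labels": [1], "n": 1},
]

# Each client estimates the distribution it would upload to the server.
dists = [pseudo_label_distribution(c["labels"], num_classes=2) for c in clients]

# The server aggregates only the visual prompts.
global_visual = aggregate_visual_prompts(
    [c["visual"] for c in clients], [c["n"] for c in clients]
)

# Clients receive the shared visual prompt; text prompts never leave the client.
for c in clients:
    c["visual"] = list(global_visual)
```

In this toy run both clients end up with the same visual prompt while their text prompts stay distinct, which is the split between collaboration and personalization that the abstract describes.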
