Zero-Shot Decentralized Federated Learning
Alessio Masano, Matteo Pennisi, Federica Proietto Salanitri, Concetto Spampinato, Giovanni Bellitto
TL;DR
ZeroDFL introduces a fully decentralized framework for zero-shot federated learning by learning and exchanging textual prompts without a central server. Each client locally adapts a small set of prompt vectors and propagates knowledge through a peer-to-peer, weighted prompt-exchange mechanism, achieving strong generalization across non-IID data while dramatically reducing communication overhead. Empirical results on nine diverse datasets show state-of-the-art performance in heterogeneous settings and competitive results in homogeneous settings, with up to a 118× reduction in transmitted data compared to centralized FedTPG. The approach enhances scalability, privacy, and robustness in decentralized adaptation of vision-language models like CLIP, and opens avenues for adaptive, dataset-aware prompt sharing in real-world deployments.
Abstract
CLIP has revolutionized zero-shot learning by enabling task generalization without fine-tuning. While prompting techniques like CoOp and CoCoOp enhance CLIP's adaptability, their effectiveness in Federated Learning (FL) remains an open challenge. Existing federated prompt learning approaches, such as FedCoOp and FedTPG, improve performance but face generalization issues, high communication costs, and reliance on a central server, limiting scalability and privacy. We propose Zero-shot Decentralized Federated Learning (ZeroDFL), a fully decentralized framework that enables zero-shot adaptation across distributed clients without a central coordinator. ZeroDFL employs an iterative prompt-sharing mechanism, allowing clients to optimize and exchange textual prompts to enhance generalization while drastically reducing communication overhead. We validate ZeroDFL on nine diverse image classification datasets, demonstrating that it consistently outperforms, or remains on par with, state-of-the-art federated prompt learning methods. More importantly, ZeroDFL achieves this performance in a fully decentralized setting while reducing communication overhead by 118× compared to FedTPG. These results highlight that our approach not only enhances generalization in federated zero-shot learning but also improves scalability, efficiency, and privacy preservation, paving the way for decentralized adaptation of large vision-language models in real-world applications.
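The iterative prompt-sharing loop described above can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify how peers are weighted or how received prompts are merged, so the inverse-contact-count weighting, the random slot replacement, and every name here (`local_update`, `pick_peers`, `exchange_round`, `run`) are hypothetical. Local prompt tuning against CLIP is replaced by a stand-in step that nudges each prompt vector toward a client-specific target.

```python
import random


def local_update(prompts, target, lr=0.1):
    """Stand-in for local prompt tuning (assumption): move each
    prompt vector a small step toward a client-specific target."""
    return [[p + lr * (t - p) for p, t in zip(vec, target)] for vec in prompts]


def pick_peers(sender, n_clients, send_counts, k, rng):
    """Sample k distinct peers without a server. The weighting is an
    assumption: peers contacted less often by this sender are favored."""
    candidates = [c for c in range(n_clients) if c != sender]
    chosen = []
    for _ in range(min(k, len(candidates))):
        weights = [1.0 / (1 + send_counts[sender][c]) for c in candidates]
        peer = rng.choices(candidates, weights=weights, k=1)[0]
        chosen.append(peer)
        candidates.remove(peer)
    return chosen


def exchange_round(client_prompts, targets, send_counts, k_peers, rng):
    """One round: local adaptation, peer-to-peer send, local merge."""
    n = len(client_prompts)
    # 1. Each client adapts its own prompts on local data (stubbed).
    updated = [local_update(client_prompts[i], targets[i]) for i in range(n)]
    # 2. Each client sends one prompt vector to each selected peer.
    inbox = [[] for _ in range(n)]
    for sender in range(n):
        for peer in pick_peers(sender, n, send_counts, k_peers, rng):
            inbox[peer].append(rng.choice(updated[sender]))
            send_counts[sender][peer] += 1
    # 3. Each receiver overwrites randomly chosen slots (assumed merge rule).
    for i in range(n):
        received = inbox[i][: len(updated[i])]
        slots = rng.sample(range(len(updated[i])), len(received))
        for slot, vec in zip(slots, received):
            updated[i][slot] = list(vec)
    return updated


def run(n_clients=4, n_prompts=3, dim=2, rounds=5, k_peers=2, seed=0):
    """Simulate several rounds and return final prompts and send counts."""
    rng = random.Random(seed)
    prompts = [
        [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_prompts)]
        for _ in range(n_clients)
    ]
    # Distinct targets mimic non-IID client data (assumption).
    targets = [[float(i)] * dim for i in range(n_clients)]
    send_counts = [[0] * n_clients for _ in range(n_clients)]
    for _ in range(rounds):
        prompts = exchange_round(prompts, targets, send_counts, k_peers, rng)
    return prompts, send_counts
```

In this sketch only prompt vectors cross the network, and each client sends `k_peers` messages per round regardless of how many clients participate, which is the kind of traffic pattern that keeps communication low when no central server aggregates updates.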
