Zero-Shot Decentralized Federated Learning

Alessio Masano, Matteo Pennisi, Federica Proietto Salanitri, Concetto Spampinato, Giovanni Bellitto

TL;DR

ZeroDFL introduces a fully decentralized framework for zero-shot federated learning by learning and exchanging textual prompts without a central server. Each client locally adapts a small set of prompt vectors and propagates knowledge through a peer-to-peer, weighted prompt-exchange mechanism, achieving strong generalization across non-IID data while dramatically reducing communication overhead. Empirical results on nine diverse datasets show state-of-the-art performance in heterogeneous settings and competitive results in homogeneous settings, with up to a 118× reduction in transmitted data compared to centralized FedTPG. The approach enhances scalability, privacy, and robustness in decentralized adaptation of vision-language models like CLIP, and opens avenues for adaptive, dataset-aware prompt sharing in real-world deployments.

Abstract

CLIP has revolutionized zero-shot learning by enabling task generalization without fine-tuning. While prompting techniques like CoOp and CoCoOp enhance CLIP's adaptability, their effectiveness in Federated Learning (FL) remains an open challenge. Existing federated prompt learning approaches, such as FedCoOp and FedTPG, improve performance but face generalization issues, high communication costs, and reliance on a central server, limiting scalability and privacy. We propose Zero-shot Decentralized Federated Learning (ZeroDFL), a fully decentralized framework that enables zero-shot adaptation across distributed clients without a central coordinator. ZeroDFL employs an iterative prompt-sharing mechanism, allowing clients to optimize and exchange textual prompts to enhance generalization while drastically reducing communication overhead. We validate ZeroDFL on nine diverse image classification datasets, demonstrating that it consistently outperforms, or remains on par with, state-of-the-art federated prompt learning methods. More importantly, ZeroDFL achieves this performance in a fully decentralized setting while reducing communication overhead by 118× compared to FedTPG. These results highlight that our approach not only enhances generalization in federated zero-shot learning but also improves scalability, efficiency, and privacy preservation, paving the way for decentralized adaptation of large vision-language models in real-world applications.

Paper Structure

This paper contains 16 sections, 6 equations, 2 figures, 5 tables.

Figures (2)

  • Figure 1: ZeroDFL strategy overview. ZeroDFL operates in iterative training rounds, each consisting of two phases: local adaptation and prompt exchange. Left: Local Adaptation – Each client selects $M$ textual prompts from its Prompt Pool (prompts received from other clients) and fine-tunes them by prepending them to a frozen CLIP text encoder and optimizing on its private dataset. Right: Prompt Exchange – After local adaptation, the client shares its updated prompts with $S$ selected clients, prioritizing those that have received fewer updates in previous rounds, ensuring balanced knowledge distribution.
  • Figure 2: Cumulative communication cost over 500 federated rounds in the heterogeneous setting with 59 clients. The plot compares FedTPG with three configurations of ZeroDFL: Worst (maximum communication overhead, ensuring full knowledge propagation), a balanced setting with $S=5$, and Best (minimal communication overhead, requiring careful prompt distribution).
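
The prompt-exchange phase described in the Figure 1 caption can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the helper names (`select_recipients`, `exchange_round`), the inverse-count weighting `1 / (1 + received)`, and the assumption that every client sends in every round are hypothetical choices made for illustration. The caption only states that recipients who received fewer updates are prioritized. Local adaptation (fine-tuning prompts against the frozen CLIP text encoder) is omitted entirely.

```python
import random
from collections import Counter


def select_recipients(sender, clients, received_counts, s, rng):
    """Pick up to s peers for `sender`, favouring peers that received fewer prompts.

    The weighting 1 / (1 + count) is an illustrative assumption, not the paper's rule.
    """
    peers = [c for c in clients if c != sender]
    weights = [1.0 / (1 + received_counts[c]) for c in peers]
    chosen = []
    # Weighted sampling without replacement: draw one peer, remove it, repeat.
    for _ in range(min(s, len(peers))):
        idx = rng.choices(range(len(peers)), weights=weights, k=1)[0]
        chosen.append(peers.pop(idx))
        weights.pop(idx)
    return chosen


def exchange_round(clients, received_counts, s, rng):
    """Run one exchange phase; return the number of client-to-client transfers."""
    transfers = 0
    for sender in clients:
        for peer in select_recipients(sender, clients, received_counts, s, rng):
            received_counts[peer] += 1  # peer's prompt pool gains one prompt set
            transfers += 1
    return transfers


rng = random.Random(0)
clients = list(range(59))   # 59 clients, as in the Figure 2 setting
received = Counter()        # how many prompt sets each client has received so far
total = sum(exchange_round(clients, received, s=5, rng=rng) for _ in range(3))
print(total)
```

Under these assumptions each round involves 59 × 5 = 295 transfers, so the transfer count grows linearly with the number of rounds and with $S$. The byte cost of each transfer depends on $M$ and the prompt dimensions, which this sketch does not model.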