Leveraging Foundation Models for Efficient Federated Learning in Resource-restricted Edge Networks

S. Kawa Atapour; S. Jamal SeyedMohammadi; S. Mohammad Sheikholeslami; Jamshid Abouei; Konstantinos N. Plataniotis; Arash Mohammadi

TL;DR

The paper addresses training deep models in resource-constrained IoT edge networks by hosting the foundation model (FM) on the server and distilling the aggregated knowledge of the IoT devices into a learnable prompt generator, which enables federated distillation at the edge without a public dataset. It introduces FedD2P, a framework that combines per-class knowledge from heterogeneous devices, a linguistic-assisted (LA) prompt generator, and prompt tuning to adapt a vision-language FM without deploying it locally. The FM remains frozen; only the prompt generator and the lightweight local models are trained for the downstream task. Key contributions include a data-free mutual knowledge distillation (KD) workflow with per-class knowledge exchange, the design of the LA prompt generator, which uses linguistic descriptions of classes and self-attention, and simulations on five datasets (CIFAR, SVHN, OxfordPets, EuroSAT, DTD) under varying statistical heterogeneity and temperature settings, in which FedD2P outperforms the baselines in model performance. Overall, FedD2P indicates that a server-hosted FM, coupled with per-class KD and linguistically guided prompts, can reduce edge resource requirements while maintaining strong generalization in federated edge learning, which is relevant to privacy-preserving, communication-efficient AI on IoT networks.

Abstract

Recently, pre-trained Foundation Models (FMs) have been combined with Federated Learning (FL) to improve the training of downstream tasks while preserving privacy. However, deploying FMs over edge networks with resource-constrained Internet of Things (IoT) devices is under-explored. This paper proposes a novel framework, namely Federated Distilling knowledge to Prompt (FedD2P), for leveraging the robust representation abilities of a vision-language FM without deploying it locally on edge devices. The framework distills the aggregated knowledge of IoT devices into a prompt generator to efficiently adapt the frozen FM for downstream tasks. To eliminate the dependency on a public dataset, the framework leverages per-class local knowledge from IoT devices and linguistic descriptions of classes to train the prompt generator. Experiments on diverse image classification datasets (CIFAR, OxfordPets, SVHN, EuroSAT, and DTD) show that FedD2P outperforms the baselines in terms of model performance.
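The per-class knowledge exchange mentioned in the abstract can be illustrated with a small sketch. The abstract does not spell out how local knowledge is computed or aggregated, so everything here is an assumption: each device is taken to average its temperature-softened predictions over the samples of each class, and the server is taken to average those vectors, unweighted, over the devices that observed the class. The function names are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature parameter."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def per_class_local_knowledge(logits_per_sample, labels, num_classes, temperature=1.0):
    """Device side (assumed): mean softened prediction over the samples of each class."""
    sums = [[0.0] * num_classes for _ in range(num_classes)]
    counts = [0] * num_classes
    for logits, label in zip(logits_per_sample, labels):
        probs = softmax(logits, temperature)
        counts[label] += 1
        for k, p in enumerate(probs):
            sums[label][k] += p
    # A class the device never observed is reported as None.
    return [
        [v / counts[c] for v in sums[c]] if counts[c] else None
        for c in range(num_classes)
    ]

def aggregate_per_class(local_knowledge):
    """Server side (assumed): unweighted mean over the devices that reported a class."""
    num_classes = len(local_knowledge[0])
    aggregated = []
    for c in range(num_classes):
        vectors = [device[c] for device in local_knowledge if device[c] is not None]
        if not vectors:
            aggregated.append(None)
            continue
        aggregated.append([sum(column) / len(vectors) for column in zip(*vectors)])
    return aggregated
```

Only one vector per class leaves each device in this sketch, rather than model weights or raw samples, which is what makes such an exchange cheap under label-skewed (non-IID) data. A sample-count-weighted mean would be an equally plausible aggregation rule.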

Paper Structure (13 sections, 6 equations, 2 figures, 1 table)

This paper contains 13 sections, 6 equations, 2 figures, 1 table.

Introduction
Background and System Model
Knowledge Distillation (KD)
Prompt Tuning
Federated Learning over Edge
Federated Distilling Knowledge To Prompt (FedD2P)
The Flow of knowledge
Linguistic Assistance Prompt Generation
Simulation Results
Evaluation Under Different Statistical Heterogeneity
Evaluation Under Different Temperature Parameter
Effectiveness of LA prompt generator
Conclusion

Figures (2)

Figure 1: The proposed FedD2P framework. In 1) the per-class local knowledge of the IoT devices, denoted as $\boldsymbol{l}^n_c$ for $1 \leq n \leq N$, is aggregated at the server, resulting in the per-class global knowledge $\boldsymbol{a}_c$. In 2) the LA prompt generator produces per-class prompts $[\boldsymbol{h}_c]_{c=1}^C$ using the semantic representation of classes $[\boldsymbol{e}_c]_{c=1}^C$. Subsequently, the image and text encoders generate semantic features for their respective prompts, i.e., $[\boldsymbol{m}_c = F_{image}(\boldsymbol{h}_c)]_{c=1}^C$ and $[\boldsymbol{e}_c = F_{text}(\boldsymbol{s}_c)]_{c=1}^C$, respectively. The per-class global knowledge $\boldsymbol{g}_c$ is then determined by calculating the cosine similarity between these semantic features. In 4) the per-class aggregated knowledge $\boldsymbol{g}_c$ and the ground-truth output $\boldsymbol{y}_c$ are used to tune the LA generator, while the backbone FM remains frozen. Finally, in 5), the global knowledge is transmitted to the IoT devices to facilitate local knowledge distillation.
Figure 2: (a) Sensitivity of the FedD2P framework to the temperature parameter. (b) Effectiveness of the multi-head self-attention mechanism in the LA prompt generator.
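The cosine-similarity step in the Figure 1 caption and the temperature parameter in Figure 2(a) can be made concrete with a minimal sketch. It assumes that $\boldsymbol{g}_c$ is a temperature-scaled softmax over the cosine similarities between $\boldsymbol{m}_c$ and every text feature $\boldsymbol{e}_k$, and that the generator is tuned with a cross-entropy term toward $\boldsymbol{y}_c$ plus a KL term toward the server-side aggregate. Neither the exact objective nor the weight `beta` is given in the captions; both are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(logits, temperature=1.0):
    """Numerically stable softmax; a lower temperature gives a sharper output."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def global_knowledge(image_feats, text_feats, temperature=1.0):
    """Assumed form of g_c: softmax over cos(m_c, e_k) for all classes k."""
    return [
        softmax([cosine(m, e) for e in text_feats], temperature)
        for m in image_feats
    ]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def generator_loss(g, a, beta=1.0, eps=1e-12):
    """Hypothetical objective: cross-entropy to the one-hot label of class c
    plus beta * KL(a_c || g_c), averaged over classes."""
    total = 0.0
    for c, (g_c, a_c) in enumerate(zip(g, a)):
        cross_entropy = -math.log(g_c[c] + eps)
        total += cross_entropy + beta * kl_divergence(a_c, g_c)
    return total / len(g)
```

In a real system the features would come from the frozen image and text encoders and the loss would be minimized with respect to the prompt generator's parameters only; here the features are plain lists so that the arithmetic is easy to follow.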
