Leveraging Foundation Models for Efficient Federated Learning in Resource-restricted Edge Networks
S. Kawa Atapour, S. Jamal SeyedMohammadi, S. Mohammad Sheikholeslami, Jamshid Abouei, Konstantinos N. Plataniotis, Arash Mohammadi
TL;DR
The paper tackles the challenge of training deep models in resource-constrained IoT edge networks by keeping the foundation model (FM) on the server and distilling the devices' knowledge into a learnable prompt generator, which enables federated distillation at the edge without a public dataset. It introduces FedD2P, a framework that combines per-class knowledge from heterogeneous devices, a linguistic-assisted (LA) prompt generator, and prompt tuning to adapt a vision-language FM without deploying it locally; the FM stays frozen, while the prompt generator and lightweight local models handle the downstream task. Key contributions include a data-free mutual knowledge distillation (KD) workflow with per-class knowledge exchange, the design of the LA prompt generator based on linguistic class descriptions and self-attention, and simulations reporting competitive accuracy and improved efficiency across five datasets (CIFAR, SVHN, OxfordPets, EuroSAT, DTD) under varying heterogeneity and temperature settings. Overall, FedD2P indicates that a server-hosted FM, coupled with per-class KD and linguistically guided prompts, can reduce edge resource requirements while maintaining strong generalization in federated edge learning, which is relevant to privacy-preserving, communication-efficient AI on IoT networks.
Abstract
Recently, pre-trained Foundation Models (FMs) have been combined with Federated Learning (FL) to improve the training of downstream tasks while preserving privacy. However, deploying FMs over edge networks with resource-constrained Internet of Things (IoT) devices remains under-explored. This paper proposes a novel framework, Federated Distilling knowledge to Prompt (FedD2P), for leveraging the robust representation abilities of a vision-language FM without deploying it locally on edge devices. The framework distills the aggregated knowledge of IoT devices into a prompt generator to efficiently adapt the frozen FM to downstream tasks. To eliminate the dependency on a public dataset, the framework uses per-class local knowledge from IoT devices and linguistic descriptions of classes to train the prompt generator. Experiments on diverse image classification datasets (CIFAR, OxfordPets, SVHN, EuroSAT, and DTD) show that FedD2P outperforms the baselines in terms of model performance.
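
To make the idea of per-class local knowledge concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each device summarizes its local model as the mean temperature-softened prediction per class and that the server combines these summaries with a sample-count-weighted average. The function names (`per_class_knowledge`, `aggregate`), the weighting rule, and the use of NumPy are all illustrative assumptions; the abstract does not specify the exact form of the exchanged knowledge.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def per_class_knowledge(logits, labels, num_classes, temperature=1.0):
    """Device side (assumed form): mean soft prediction for each local class.

    Returns (knowledge, counts). knowledge[c] is the average soft-prediction
    vector over local samples labeled c, or all zeros if class c is absent.
    """
    probs = softmax(logits, temperature)            # shape (N, C)
    knowledge = np.zeros((num_classes, num_classes))
    counts = np.bincount(labels, minlength=num_classes)
    np.add.at(knowledge, labels, probs)             # sum predictions per class
    present = counts > 0
    knowledge[present] /= counts[present, None]     # turn sums into means
    return knowledge, counts

def aggregate(device_knowledge, device_counts):
    """Server side (assumed rule): count-weighted average across devices."""
    k = np.stack(device_knowledge)                  # shape (D, C, C)
    n = np.stack(device_counts).astype(float)       # shape (D, C)
    total = n.sum(axis=0)                           # samples per class overall
    weighted = (k * n[:, :, None]).sum(axis=0)
    out = np.zeros_like(weighted)
    seen = total > 0
    out[seen] = weighted[seen] / total[seen, None]
    return out

# Two hypothetical devices holding disjoint classes (a non-IID split).
k_a, n_a = per_class_knowledge(
    np.array([[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]), np.array([0, 1]), 3)
k_b, n_b = per_class_knowledge(
    np.array([[0.0, 0.0, 3.0]]), np.array([2]), 3)
global_knowledge = aggregate([k_a, k_b], [n_a, n_b])
```

In this sketch only a small class-by-class matrix and a count vector leave each device, never raw images or model weights, which is the property that makes a per-class exchange attractive when no public dataset is available. How the server then uses such summaries to train the prompt generator is outside the scope of this example.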
