FedICT: Federated Multi-task Distillation for Multi-access Edge Computing
Zhiyuan Wu, Sheng Sun, Yuwei Wang, Min Liu, Quyang Pan, Xuefeng Jiang, Bo Gao
TL;DR
FedICT tackles federated multi-task learning in Multi-access Edge Computing (MEC) by enabling personalized, heterogeneous client models without relying on public data. It does this through Federated Prior Knowledge Distillation (FPKD), which injects prior knowledge of each client's local data distribution to strengthen the fit to local data, and Local Knowledge Adjustment (LKA), which corrects the server-side distillation loss so that transferred local knowledge better matches the generalized global representation. Keeping local and global knowledge separate during bi-directional distillation supports multi-task clients while alleviating client drift. Experiments on CIFAR-10, CINIC-10, and Transportation Mode Detection show improved accuracy with less than 1.2% of the training communication overhead of FedAvg and no more than 75% of the training communication rounds of FedGKT, highlighting FedICT’s practicality for MEC with heterogeneous devices.
Abstract
The growing interest in intelligent services and privacy protection for mobile devices has given rise to the widespread application of federated learning in Multi-access Edge Computing (MEC). Diverse user behaviors call for personalized services with heterogeneous Machine Learning (ML) models on different devices. Federated Multi-task Learning (FMTL) has been proposed to train related but personalized ML models for different devices, but previous works suffer from excessive communication overhead during training and neglect the model heterogeneity among devices in MEC. Introducing knowledge distillation into FMTL can simultaneously enable efficient communication and model heterogeneity among clients, but existing methods rely on a public dataset, which is impractical in reality. To tackle this dilemma, Federated MultI-task Distillation for Multi-access Edge CompuTing (FedICT) is proposed. FedICT keeps local and global knowledge apart during the bi-directional distillation processes between clients and the server, aiming to enable multi-task clients while alleviating the client drift that arises from divergent optimization directions of client-side local models. Specifically, FedICT includes Federated Prior Knowledge Distillation (FPKD) and Local Knowledge Adjustment (LKA). FPKD reinforces the clients' fitting of local data by introducing prior knowledge of local data distributions. LKA corrects the distillation loss of the server, making the transferred local knowledge better match the generalized representation. Experiments on three datasets show that FedICT significantly outperforms all compared benchmarks under various data heterogeneity and model architecture settings, achieving improved accuracy with less than 1.2% of the training communication overhead of FedAvg and no more than 75% of the training communication rounds of FedGKT.
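To make the idea of injecting a local data distribution prior into distillation more concrete, the following is a minimal sketch, not the paper's FPKD formulation, which the abstract does not specify. It assumes the prior is the client's smoothed class frequency vector and that the teacher's softened probabilities are re-weighted by this prior before a KL-divergence loss is computed against the student. All function names, the smoothing constant, and the temperature are hypothetical choices made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def class_prior(labels, num_classes, smoothing=1e-3):
    """Smoothed class frequencies of a client's local labels (assumed prior)."""
    counts = [smoothing] * num_classes
    for y in labels:
        counts[y] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def prior_weighted_distillation_loss(student_logits, teacher_logits,
                                     prior, temperature=2.0):
    """Illustrative loss: re-weight teacher probabilities by the local
    class prior, renormalize, and compare to the student with KL.
    This is an assumption for illustration, not the paper's loss."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    weighted = [t * w for t, w in zip(teacher, prior)]
    norm = sum(weighted)
    target = [w / norm for w in weighted]
    return kl_divergence(target, student)


if __name__ == "__main__":
    # A client whose local data is dominated by class 0.
    prior = class_prior([0, 0, 1], num_classes=3)
    loss = prior_weighted_distillation_loss(
        student_logits=[1.0, 0.5, -1.0],
        teacher_logits=[0.2, 0.1, 0.0],
        prior=prior,
    )
    print("local prior:", prior)
    print("distillation loss:", loss)
```

In this sketch, classes that rarely appear on a client contribute little to the distillation target, so the student is pulled toward the classes it actually sees locally. How FedICT combines such a prior with its other loss terms, and how LKA adjusts the server-side loss, would need to be taken from the full paper.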
