FedICT: Federated Multi-task Distillation for Multi-access Edge Computing

Zhiyuan Wu, Sheng Sun, Yuwei Wang, Min Liu, Quyang Pan, Xuefeng Jiang, Bo Gao

TL;DR

FedICT addresses federated multi-task learning in Multi-access Edge Computing (MEC) by enabling personalized, heterogeneous client models without relying on a public dataset. It combines Federated Prior Knowledge Distillation (FPKD), which injects prior knowledge of local data distributions to reinforce each client's fit to its own data, with Local Knowledge Adjustment (LKA), which corrects the server's distillation loss so that the transferred local knowledge better matches the generalized representation. Keeping local and global knowledge aloof during bi-directional distillation supports multi-task clients while alleviating client drift. Experiments on CIFAR-10, CINIC-10, and Transportation Mode Detection show improved accuracy with less than 1.2% of the training communication overhead of FedAvg and no more than 75% of the training communication rounds of FedGKT, highlighting FedICT's practicality for MEC with heterogeneous devices.

Abstract

The growing interest in intelligent services and privacy protection for mobile devices has given rise to the widespread application of federated learning in Multi-access Edge Computing (MEC). Diverse user behaviors call for personalized services with heterogeneous Machine Learning (ML) models on different devices. Federated Multi-task Learning (FMTL) has been proposed to train related but personalized ML models for different devices, but previous works suffer from excessive communication overhead during training and neglect the model heterogeneity among devices in MEC. Introducing knowledge distillation into FMTL can simultaneously enable efficient communication and model heterogeneity among clients, but existing methods rely on a public dataset, which is impractical in reality. To tackle this dilemma, Federated MultI-task Distillation for Multi-access Edge CompuTing (FedICT) is proposed. FedICT keeps local and global knowledge aloof during the bi-directional distillation processes between clients and the server, aiming to enable multi-task clients while alleviating the client drift derived from divergent optimization directions of client-side local models. Specifically, FedICT includes Federated Prior Knowledge Distillation (FPKD) and Local Knowledge Adjustment (LKA). FPKD reinforces the clients' fitting of local data by introducing prior knowledge of local data distributions. LKA corrects the distillation loss of the server, making the transferred local knowledge better match the generalized representation. Experiments on three datasets show that FedICT significantly outperforms all compared benchmarks under various data heterogeneity and model architecture settings, achieving improved accuracy with less than 1.2% of the training communication overhead of FedAvg and no more than 75% of the training communication rounds of FedGKT.
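The two ingredients the abstract names, a prior over local classes and a distillation loss, can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not give the FPKD or LKA formulas, so the per-sample weighting below (weighting each sample by the local frequency of its label) is a hypothetical stand-in, and the names `class_prior` and `kd_loss` are made up for this example. The temperature-softened KL divergence itself is the standard knowledge distillation loss.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax of temperature-scaled logits (numerically stable)."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def class_prior(labels, num_classes):
    """Empirical class frequencies of a client's local labels."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    return counts / counts.sum()

def kd_loss(student_logits, teacher_logits, temperature=1.0, sample_weights=None):
    """KL(teacher || student) on softened outputs, averaged over the batch.

    sample_weights is an ASSUMPTION of this sketch: an optional per-sample
    weight, normalized to sum to one. It is not the paper's FPKD/LKA rule.
    """
    eps = 1e-12
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    per_sample = np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=1)
    if sample_weights is None:
        return float(per_sample.mean())
    w = np.asarray(sample_weights, dtype=float)
    w = w / w.sum()
    return float(np.sum(w * per_sample))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_classes = 3
    labels = np.array([0, 0, 1, 2, 0])             # skewed local labels
    prior = class_prior(labels, num_classes)       # local distribution prior
    local_logits = rng.normal(size=(5, num_classes))
    global_logits = rng.normal(size=(5, num_classes))
    # Hypothetical prior-aware weighting: samples of locally frequent
    # classes contribute more to the client's distillation term.
    loss = kd_loss(local_logits, global_logits, temperature=2.0,
                   sample_weights=prior[labels])
    print("prior:", prior, "weighted KD loss:", loss)
```

Only logits and label statistics appear in this sketch, which matches the abstract's point that distillation lets clients and the server exchange knowledge without sharing model parameters or a public dataset.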

Paper Structure (34 sections, 9 equations, 4 figures, 9 tables, 2 algorithms)

Figures (4)

  • Figure 1: Comparison of different FL methods in MEC. Grey circles indicate the parameter requirements of the different training tasks on devices, and blue circles indicate the trained model parameters. Each circle's size represents the scale of the model parameters, and the distance between any two circles reflects the degree of difference between their corresponding parameters.
  • Figure 2: Data distributions with different $\alpha$ on CIFAR-10. Each heat map represents the training/testing data distribution across all clients. Each row of a heat map gives the class distribution of a single client, and each column label gives the category. Each cell represents the number of samples of the corresponding class in a given client's training/testing dataset, and the shade of the color indicates its proportion of the total.
  • Figure 3: Learning curves of local models, measured by average UA, under different degrees of data heterogeneity and on different datasets.
  • Figure 4: Learning curves of selected local models, where the horizontal axis indicates the number of communication rounds. Results are from CIFAR-10 with $\alpha = 1.0$.
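
The per-client class counts that Figure 2 visualizes can be reproduced in spirit with a short sketch. The captions do not say how $\alpha$ controls heterogeneity; the code below assumes it is the concentration parameter of a Dirichlet distribution over clients, a common choice in federated learning benchmarks, so smaller values give more skewed clients. The function names are hypothetical and the labels are synthetic, not CIFAR-10.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients, class by class.

    ASSUMPTION: alpha is a Dirichlet concentration parameter; for each
    class, client shares are drawn from Dir(alpha, ..., alpha).
    """
    rng = np.random.default_rng(seed)
    num_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        shares = rng.dirichlet(alpha * np.ones(num_clients))
        # Turn cumulative shares into split points within this class.
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices

def class_count_matrix(labels, client_indices, num_classes):
    """Rows are clients, columns are classes, cells are sample counts."""
    counts = np.zeros((len(client_indices), num_classes), dtype=int)
    for k, idx in enumerate(client_indices):
        idx = np.asarray(idx, dtype=int)
        counts[k] = np.bincount(labels[idx], minlength=num_classes)
    return counts

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    labels = rng.integers(0, 10, size=1000)   # synthetic 10-class labels
    for alpha in (0.1, 1.0, 10.0):
        parts = dirichlet_partition(labels, num_clients=5, alpha=alpha)
        print(f"alpha={alpha}")
        print(class_count_matrix(labels, parts, num_classes=10))
```

Each printed matrix corresponds to one heat map in the layout the caption describes: one row per client and one column per class.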