Enabling Weak Client Participation via On-device Knowledge Distillation in Heterogeneous Federated Learning
Jihyun Lim, Junhyuk Jo, Tuo Zhang, Sunwoo Lee
TL;DR
The paper tackles weak client participation in heterogeneous Federated Learning (FL). It shows that server-side logit-ensemble KD degrades performance under highly non-IID data, because the personalized local models it ensembles produce lower-quality soft targets. It proposes a two-step, on-device alternative: clients first train a small auxiliary model on their local labeled data, and then clients with strong system resources transfer that knowledge to a large target model through on-device KD on their own unlabeled data. Experiments on CIFAR-10, FEMNIST, CIFAR-100, IMDB, and Google Speech show higher accuracy than state-of-the-art (SOTA) KD-based FL methods, while keeping data on-device and accommodating device heterogeneity. The work also provides a theoretical generalization bound showing that incorporating unlabeled local data via KD tightens the bound, which supports the approach's practicality for large-scale edge learning.
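To make the division of labor between weak and strong clients concrete, here is a minimal Python sketch of one training round. It rests on several assumptions that are not taken from the paper: models are toy flat weight vectors, `Client`, `is_strong`, `run_round`, `train_aux`, and `distill` are hypothetical names, both steps are assumed to run in every round, and the large-model updates are averaged uniformly. It is not the authors' implementation; it only shows which clients take part in which step.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Weights = List[float]  # toy representation: a flat parameter vector


@dataclass
class Client:
    cid: int
    is_strong: bool  # hypothetical flag: device can hold and train the large model
    n_labeled: int   # size of the client's local labeled set


def fedavg(models: Sequence[Weights], sizes: Sequence[int]) -> Weights:
    """Average flat weight vectors, weighted by each client's sample count."""
    total = sum(sizes)
    return [sum(m[i] * s for m, s in zip(models, sizes)) / total
            for i in range(len(models[0]))]


def run_round(
    clients: Sequence[Client],
    aux: Weights,
    large: Weights,
    train_aux: Callable[[Client, Weights], Weights],
    distill: Callable[[Client, Weights, Weights], Weights],
) -> Tuple[Weights, Weights]:
    # Step 1: every client, weak or strong, trains the small auxiliary model
    # on its labeled data; the server aggregates the updates with FedAvg.
    aux = fedavg([train_aux(c, aux) for c in clients],
                 [c.n_labeled for c in clients])
    # Step 2: only strong clients distill the auxiliary model (teacher) into the
    # large model (student) on their unlabeled data. Uniform weighting is an assumption.
    strong = [c for c in clients if c.is_strong]
    if strong:
        large = fedavg([distill(c, aux, large) for c in strong], [1] * len(strong))
    return aux, large


# Toy run: clients 0 and 1 are weak, client 2 is strong.
clients = [Client(0, False, 10), Client(1, False, 30), Client(2, True, 60)]
aux, large = run_round(
    clients, aux=[0.0], large=[0.0, 0.0],
    train_aux=lambda c, w: [w[0] + c.cid],         # stand-in for local SGD on labeled data
    distill=lambda c, a, w: [a[0], float(c.cid)],  # stand-in for on-device KD
)
print(aux, large)  # [1.5] [1.5, 2.0]
```

The point of the sketch is that weak clients never touch the large model, yet their labeled data still shapes it indirectly through the auxiliary teacher that the strong clients distill from.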
Abstract
Online Knowledge Distillation (KD) has recently attracted attention as a way to train large models in Federated Learning (FL) environments. Many existing studies adopt the logit ensemble method to perform KD on the server side. However, they often assume that unlabeled data collected at the edge is centralized on the server. Moreover, the logit ensemble method personalizes local models, which can degrade the quality of the soft targets, especially when data is highly non-IID. To address these critical limitations, we propose a novel on-device KD-based heterogeneous FL method. Our approach uses a small auxiliary model to learn from labeled local data. Subsequently, a subset of clients with strong system resources transfers knowledge to a large model through on-device KD using their unlabeled data. Our extensive experiments demonstrate that our on-device KD-based heterogeneous FL method effectively utilizes the system resources of all edge devices as well as the unlabeled data, resulting in higher accuracy than SOTA KD-based FL methods.
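The distillation step can be illustrated with the standard temperature-scaled KD loss (Hinton-style): on each unlabeled sample, the large model (student) is trained to match the softened output distribution of the small auxiliary model (teacher), so no labels are needed. The minimal Python sketch below assumes this standard loss; the paper's exact objective, temperature, and any additional loss terms may differ, and all function names are hypothetical. For contrast, `ensemble_logits` shows how a server-side logit-ensemble baseline would form its teacher by averaging several client models' logits.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled log-softmax, computed stably via the log-sum-exp trick."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - lse for z in scaled]


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            temperature: float = 2.0) -> float:
    """T^2 * KL(p_teacher || p_student) for a single unlabeled sample."""
    log_pt = log_softmax(teacher_logits, temperature)
    log_ps = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_pt, log_ps))
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl


def batch_kd_loss(student_batch: Sequence[Sequence[float]],
                  teacher_batch: Sequence[Sequence[float]],
                  temperature: float = 2.0) -> float:
    """Mean KD loss over a batch of unlabeled samples (no labels used)."""
    losses = [kd_loss(s, t, temperature) for s, t in zip(student_batch, teacher_batch)]
    return sum(losses) / len(losses)


def ensemble_logits(client_logits: Sequence[Sequence[float]]) -> List[float]:
    """Server-side baseline: average the logits of several client models for one sample."""
    n = len(client_logits)
    return [sum(col) / n for col in zip(*client_logits)]


# Hypothetical on-device step: teacher = small auxiliary model, student = large model.
teacher = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]  # auxiliary-model logits on 2 unlabeled samples
student = [[1.0, 1.0, 0.0], [0.0, 2.0, 0.0]]   # large-model logits on the same samples
print(batch_kd_loss(student, teacher, temperature=2.0))
```

Because the loss depends only on the two models' outputs for the same inputs, a strong client can compute it entirely on-device over its own unlabeled data, instead of requiring that data to be centralized on the server.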
