
Communication-Efficient On-Device Machine Learning: Federated Distillation and Augmentation under Non-IID Private Data

Eunjeong Jeong, Seungeun Oh, Hyesung Kim, Jihong Park, Mehdi Bennis, Seong-Lyun Kim

TL;DR

This work tackles the dual challenges of communication bottlenecks and non-IID private data in on-device ML by introducing Federated Distillation (FD) and Federated Augmentation (FAug). FD replaces heavy model-parameter exchanges with lightweight per-label logit exchanges: devices compute local label-wise average logits $\bar{F}_{k,\ell}^{(i)}$, which are aggregated into global-average label-wise logits $\hat{F}_{k,\ell}^{(i)}$, and each device then trains by online co-distillation with a cross-entropy distillation regularizer. FAug mitigates non-IID effects by training a server-side conditional GAN (CondGAN) that devices use to synthesize samples of the labels they lack. Its privacy leakage is quantified with device-server and inter-device metrics and controlled through seed-sample and redundant-label sharing. Empirically, FD with FAug reduces communication overhead by around 26x compared to FL while achieving 95–98% of FL's test accuracy on non-IID MNIST, suggesting a viable path toward scalable, privacy-aware on-device learning with large models.
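The FD logit exchange and distillation loss summarized above can be illustrated with a minimal NumPy sketch. It assumes that $i$ indexes devices and $\ell$ indexes labels within a single round, that the global average $\hat{F}$ is a plain mean over the devices that observed each label, and that each device uploads one logit vector per label. The helper names and the regularizer weight `beta` are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def local_label_averages(logits, labels, num_labels):
    """Device side: per-label mean logit vector (the local averages, F-bar).

    Returns an array of shape (num_labels, num_labels) and a boolean mask
    marking which labels this device actually has samples for.
    """
    avg = np.zeros((num_labels, num_labels))
    seen = np.zeros(num_labels, dtype=bool)
    for ell in range(num_labels):
        mask = labels == ell
        if mask.any():
            avg[ell] = logits[mask].mean(axis=0)
            seen[ell] = True
    return avg, seen


def global_label_averages(device_avgs, device_seen):
    """Server side: average each label's logit vector over reporting devices (F-hat).

    Assumption: a plain mean over the devices that observed the label.
    Labels that no device observed are left as all-zero vectors.
    """
    stacked = np.stack(device_avgs)                   # (devices, L, L)
    seen = np.stack(device_seen)                      # (devices, L)
    counts = seen.sum(axis=0)                         # (L,)
    total = (stacked * seen[:, :, None]).sum(axis=0)  # (L, L)
    return np.divide(total, counts[:, None],
                     out=np.zeros_like(total), where=counts[:, None] > 0)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def fd_loss(logits, labels, global_avg, beta=0.5):
    """Cross entropy on true labels plus a cross-entropy distillation regularizer.

    The teacher for each sample is the softmax of the global-average logit
    vector of that sample's label; beta is a hypothetical weight.
    """
    probs = softmax(logits)
    n = len(labels)
    ce = -np.log(probs[np.arange(n), labels] + 1e-12).mean()
    teacher = softmax(global_avg[labels])  # (n, L)
    distill = -(teacher * np.log(probs + 1e-12)).sum(axis=1).mean()
    return ce + beta * distill


def fd_payload(num_labels):
    """Floats uploaded per device per round: one L-dimensional vector per label."""
    return num_labels * num_labels
```

Under these assumptions the per-round upload is $L^2$ floats (100 for 10 MNIST labels) regardless of model size, whereas an FL upload grows with the number of model parameters, which is why the gap widens for large models.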

Abstract

On-device machine learning (ML) enables the training process to exploit a massive amount of user-generated private data samples. To enjoy this benefit, inter-device communication overhead should be minimized. To this end, we propose federated distillation (FD), a distributed model training algorithm whose communication payload size is much smaller than that of a benchmark scheme, federated learning (FL), particularly when the model size is large. Moreover, user-generated data samples are likely to be non-IID across devices, which commonly degrades performance compared to the case with an IID dataset. To cope with this, we propose federated augmentation (FAug), where devices collectively train a generative model and thereby augment their local data towards yielding an IID dataset. Empirical studies demonstrate that FD with FAug yields around 26x less communication overhead than FL while achieving 95-98% of FL's test accuracy.
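The device-side bookkeeping in FAug can be sketched as follows, leaving out the CondGAN itself. Each device identifies its scarce (target) labels, adds a few randomly chosen redundant labels so the server cannot tell which labels are targets, and after downloading the generator decides how many synthetic samples to draw per target label. The threshold `min_count`, the number of decoys `num_redundant`, and the rule of topping each target label up to the device's largest label count are all hypothetical choices for illustration.

```python
import numpy as np


def choose_upload_labels(label_counts, min_count, num_redundant, rng):
    """Pick target labels (locally scarce) plus redundant decoy labels.

    Hypothetical rule: a label is a target if the device holds fewer than
    min_count samples of it. Decoys are drawn from the remaining labels.
    """
    counts = np.asarray(label_counts)
    targets = np.flatnonzero(counts < min_count)
    others = np.flatnonzero(counts >= min_count)
    k = min(num_redundant, len(others))
    if k > 0:
        redundant = rng.choice(others, size=k, replace=False)
    else:
        redundant = np.array([], dtype=int)
    return sorted(targets.tolist()), sorted(redundant.tolist())


def samples_to_generate(label_counts, targets):
    """Synthetic samples per target label needed to reach the largest local count.

    Assumption: "towards an IID dataset" is approximated by balancing label counts.
    """
    counts = np.asarray(label_counts)
    top = int(counts.max())
    return {int(ell): top - int(counts[ell]) for ell in targets}


rng = np.random.default_rng(0)
counts = [500, 480, 3, 510, 0]  # per-label sample counts on one device
targets, redundant = choose_upload_labels(counts, min_count=10,
                                          num_redundant=2, rng=rng)
plan = samples_to_generate(counts, targets)
```

In this example labels 2 and 4 are targets, two of labels 0, 1, and 3 are sent as decoys, and the device would draw 507 and 510 synthetic samples for labels 2 and 4, respectively. Uploading more decoys hides the targets better at the cost of sharing more samples, which is the trade-off the device-server privacy-leakage metric is meant to capture.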

Paper Structure

This paper contains 5 sections, 2 figures, 1 table, 1 algorithm.

Figures (2)

  • Figure 1: Schematic overview of federated distillation (FD) and federated augmentation (FAug).
  • Figure 2: Test accuracy and privacy leakage (PL) under a non-IID MNIST dataset: (a) accuracy per label under FL or FD with FAug, compared to the non-IID standalone case; (b) inter-device PL and accuracy under FD with FAug; and (c) device-server PL under FAug for different numbers of target labels.