EMO: Edge Model Overlays to Scale Model Size in Federated Learning

Di Wu, Weibo He, Wanglei Feng, Zhenyu Wen, Bin Qian, Blesson Varghese

TL;DR

This paper tackles the memory and compute limits of Federated Learning (FL) on edge devices by proposing EMO, which places Edge Model Overlay(s) between the device and the server so that large models can be trained in FL without the tightly coupled data flow of Split Federated Learning (SFL). EMO introduces Augmented Federated Learning (AFL), which builds a larger ensemble model by connecting overlay-trained models to the on-device FL model, and decouples AFL from the FL workflow through a hierarchical activation replay cache and a convergence-aware communication controller. On CIFAR-10/100 with non-IID data, EMO improves accuracy by up to 17.77% compared to FL alone, and reduces communication costs by up to 7.17x and training time by up to 6.9x compared to SFL. In practice, EMO supports scalable, privacy-conscious, edge-assisted training of large models with reduced WAN traffic and improved throughput. Future work will address privacy enhancements for the activation caches and broader overlay configurations.

Abstract

Federated Learning (FL) trains machine learning models on edge devices with distributed data. However, the computational and memory limitations of these devices restrict the training of large models using FL. Split Federated Learning (SFL) addresses this challenge by distributing the model across the device and server, but it introduces a tightly coupled data flow, leading to computational bottlenecks and high communication costs. We propose EMO as a solution to enable the training of large models in FL while mitigating the challenges of SFL. EMO introduces Edge Model Overlay(s) between the device and server, enabling the creation of a larger ensemble model without modifying the FL workflow. The key innovation in EMO is Augmented Federated Learning (AFL), which builds an ensemble model by connecting the original (smaller) FL model with model(s) trained in the overlay(s) to facilitate horizontal or vertical scaling. This is accomplished through three key modules: a hierarchical activation replay cache to decouple AFL from FL, a convergence-aware communication controller to optimize communication overhead, and an ensemble inference module. Evaluations on a real-world prototype show that EMO improves accuracy by up to 17.77% compared to FL, and reduces communication costs by up to 7.17x and decreases training time by up to 6.9x compared to SFL.
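To make the decoupling idea concrete, here is a minimal sketch of a replay cache that sits between a producer (the device-side FL model emitting activations) and a consumer (the overlay trainer). It is an illustration only, not the authors' implementation: the class name `TwoTierReplayCache`, the reading of "hierarchical" as a small fast tier backed by a larger slow tier, the oldest-first eviction policy, and uniform sampling are all assumptions that the abstract does not specify.

```python
import random
from collections import OrderedDict


class TwoTierReplayCache:
    """Hypothetical two-tier cache for (activation, label) pairs.

    Assumption: the fast tier stands in for memory and the slow tier
    for disk; both are plain dicts here to keep the sketch runnable.
    """

    def __init__(self, fast_capacity, slow_capacity):
        self.fast_capacity = fast_capacity
        self.slow_capacity = slow_capacity
        self.fast = OrderedDict()
        self.slow = OrderedDict()

    def put(self, key, activation, label):
        # Re-inserting a key refreshes it and moves it to the fast tier.
        self.fast.pop(key, None)
        self.slow.pop(key, None)
        self.fast[key] = (activation, label)
        # Demote the oldest fast entries to the slow tier.
        while len(self.fast) > self.fast_capacity:
            old_key, old_value = self.fast.popitem(last=False)
            self.slow[old_key] = old_value
        # Drop the oldest slow entries once the slow tier is full.
        while len(self.slow) > self.slow_capacity:
            self.slow.popitem(last=False)

    def __len__(self):
        return len(self.fast) + len(self.slow)

    def sample(self, batch_size, rng=random):
        # The consumer draws a batch without waiting on the producer.
        items = list(self.fast.values()) + list(self.slow.values())
        return rng.sample(items, min(batch_size, len(items)))


# Producer side: the device pushes activations as FL training proceeds.
cache = TwoTierReplayCache(fast_capacity=2, slow_capacity=3)
for i in range(6):
    cache.put(i, activation=[float(i)], label=i % 2)

# Consumer side: the overlay trainer replays cached activations at its own pace.
batch = cache.sample(4, rng=random.Random(0))
```

Because the consumer only reads from the cache, the overlay can keep training on stored activations while the device continues its normal FL rounds, which is the kind of decoupling the abstract attributes to the replay cache.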

Paper Structure

This paper contains 15 sections, 5 figures, 3 tables, 1 algorithm.

Figures (5)

  • Figure 1: Computational dependencies in SFL and EMO.
  • Figure 2: The EMO architecture.
  • Figure 3: The activation replay cache, convergence-aware communication controller and ensemble inference modules in EMO.
  • Figure 4: Test accuracy curves of EMO and baselines for CIFAR-10 and CIFAR-100 datasets.
  • Figure 5: Average training time per round for different methods, showing communication time and total time.