EMO: Edge Model Overlays to Scale Model Size in Federated Learning
Di Wu, Weibo He, Wanglei Feng, Zhenyu Wen, Bin Qian, Blesson Varghese
TL;DR
This paper tackles the memory and compute limits of Federated Learning (FL) on edge devices by proposing EMO, a system of Edge Model Overlay(s) that enables large models to be trained in FL without the tightly coupled data flow of Split Federated Learning (SFL). EMO introduces Augmented Federated Learning (AFL), which builds a larger ensemble model by connecting edge-trained overlay models to the on-device FL model. A hierarchical Activation Replay Cache decouples AFL from the FL workflow, and a Convergence-aware Communication Controller reduces communication overhead. On CIFAR-10/100 with non-IID data, EMO achieves up to 17.77% higher accuracy than FL alone, and reduces communication costs by up to 7.17x and training time by up to 6.9x compared to SFL. In practice, EMO enables scalable, privacy-conscious, edge-assisted training of large models with reduced WAN traffic and improved throughput, which makes it attractive for real-world edge computing deployments. Future work will address privacy enhancements for the activation caches and broader overlay configurations.
Abstract
Federated Learning (FL) trains machine learning models on edge devices with distributed data. However, the computational and memory limitations of these devices restrict the training of large models using FL. Split Federated Learning (SFL) addresses this challenge by distributing the model across the device and server, but it introduces a tightly coupled data flow, leading to computational bottlenecks and high communication costs. We propose EMO as a solution to enable the training of large models in FL while mitigating the challenges of SFL. EMO introduces Edge Model Overlay(s) between the device and server, enabling the creation of a larger ensemble model without modifying the FL workflow. The key innovation in EMO is Augmented Federated Learning (AFL), which builds an ensemble model by connecting the original (smaller) FL model with model(s) trained in the overlay(s) to facilitate horizontal or vertical scaling. This is accomplished through three modules: a hierarchical activation replay cache to decouple AFL from FL, a convergence-aware communication controller to reduce communication overhead, and an ensemble inference module. Evaluations on a real-world prototype show that EMO improves accuracy by up to 17.77% compared to FL, and reduces communication costs by up to 7.17x and training time by up to 6.9x compared to SFL.
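To make the decoupling idea concrete, the following is a minimal sketch of how an activation replay cache and a convergence-aware upload check might fit together. It is not the authors' implementation. The class name ActivationReplayCache, the function should_upload, the FIFO eviction policy, and the loss-plateau rule with its window and tolerance parameters are all assumptions made for illustration. The sketch only shows the general pattern: devices deposit activations into a bounded buffer, and overlay training samples from that buffer on its own schedule rather than waiting on the device.

```python
import random
from collections import deque


class ActivationReplayCache:
    """Hypothetical bounded store of (activation, label) pairs sent by devices."""

    def __init__(self, capacity, seed=0):
        # deque with maxlen evicts the oldest entry once capacity is reached
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def put(self, activation, label):
        """Called when a device uploads an activation; never blocks on training."""
        self._buffer.append((activation, label))

    def sample(self, batch_size):
        """Called by overlay training; draws a batch without involving the device."""
        k = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), k)

    def __len__(self):
        return len(self._buffer)


def should_upload(loss_history, window=3, tolerance=0.01):
    """Assumed policy: keep uploading while the recent training loss still changes."""
    if len(loss_history) < window + 1:
        return True  # too little history to judge convergence
    recent = loss_history[-(window + 1):]
    return max(recent) - min(recent) > tolerance


# Usage: six uploads into a cache of capacity four, then one replayed batch.
cache = ActivationReplayCache(capacity=4)
for step in range(6):
    cache.put([float(step)], step % 2)
batch = cache.sample(2)

still_changing = should_upload([1.0, 0.5, 0.3, 0.2])
plateaued = should_upload([0.2, 0.2, 0.2, 0.2])
```

Because the cache is the only point of contact between the two sides, the device-side FL loop and the overlay training loop can run at different rates. A real system would store tensors rather than Python lists and would need a more careful eviction and staleness policy than the FIFO rule assumed here.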
