FedHybrid: Breaking the Memory Wall of Federated Learning via Hybrid Tensor Management
Kahou Tam, Chunlin Tian, Li Li, Haikai Zhao, ChengZhong Xu
TL;DR
FedHybrid tackles the memory wall in on-device Federated Learning by coordinating memory-aware client selection, heterogeneity-aware graph optimization, and a local training engine that uses channel-wise mix compression and recomputation. It introduces a Memory Budget Predictor and a novel MPS-based optimization framework to balance memory reduction, model accuracy, and training efficiency under dynamic device contention. Empirical results on CV and NLP tasks show up to 39.1% accuracy gains and up to 15.5× wall-clock time reductions compared with baselines, across diverse memory budgets and devices. The work demonstrates practical viability for large-scale, mobile FL and provides a pathway to deploying privacy-preserving learning in resource-constrained environments with heterogeneous hardware and background workloads.
Abstract
Federated Learning (FL) has emerged as a learning paradigm that enables multiple devices to collaboratively train a shared model while preserving data privacy. However, a fundamental and prevailing challenge that hinders the deployment of FL on mobile devices is limited memory. This paper proposes \textit{FedHybrid}, a novel framework that reduces the memory footprint during training while maintaining model accuracy and overall training progress. Specifically, \textit{FedHybrid} first selects the participating devices for each training round by jointly evaluating their memory budget, computing capability, and data diversity. It then analyzes the computational graph and generates an execution plan for each selected client. The plan meets the client's memory budget while minimizing training delay by applying a hybrid of recomputation and compression, chosen according to the characteristics of each tensor. During local training, \textit{FedHybrid} carries out the execution plan with an activation compression technique designed to reduce memory with minimal accuracy loss. We conduct extensive experiments to evaluate \textit{FedHybrid} both in simulation and on off-the-shelf mobile devices. The experimental results demonstrate that \textit{FedHybrid} achieves up to a 39.1\% increase in model accuracy and up to a 15.5$\times$ reduction in wall-clock time under various memory budgets compared with the baselines.
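To make the idea of a per-tensor execution plan concrete, the following is a minimal toy sketch, not the FedHybrid planner itself. It assumes each activation tensor has a known size, a recomputation cost, a compression cost, and a compression ratio, and it greedily picks the action that frees the most memory per millisecond of added delay until a memory budget is met. The names `Tensor`, `plan`, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tensor:
    name: str
    size_mb: float         # activation size when kept uncompressed
    recompute_ms: float    # assumed extra delay if dropped and recomputed
    compress_ms: float     # assumed extra delay to compress and decompress
    compress_ratio: float  # compressed size / original size, in (0, 1)


def plan(tensors, budget_mb):
    """Greedy toy planner (an illustrative heuristic, not the paper's method).

    Starts with every activation kept, then applies the cheapest
    memory-saving action per tensor until the budget is satisfied.
    """
    actions = {t.name: "keep" for t in tensors}
    used = sum(t.size_mb for t in tensors)
    delay = 0.0

    candidates = []
    for t in tensors:
        options = [
            # (memory freed in MB, added delay in ms, action name)
            (t.size_mb, t.recompute_ms, "recompute"),
            (t.size_mb * (1 - t.compress_ratio), t.compress_ms, "compress"),
        ]
        # Per tensor, keep the option with the best MB freed per ms.
        best = max(options, key=lambda o: o[0] / max(o[1], 1e-9))
        candidates.append((best[0] / max(best[1], 1e-9), t, best))

    # Apply the most efficient actions first.
    candidates.sort(key=lambda c: c[0], reverse=True)
    for _, t, (saving, cost, action) in candidates:
        if used <= budget_mb:
            break
        actions[t.name] = action
        used -= saving
        delay += cost

    if used > budget_mb:
        raise ValueError("memory budget cannot be met by this heuristic")
    return actions, used, delay


# Hypothetical three-layer example with a 30 MB budget.
layers = [
    Tensor("conv1", size_mb=40, recompute_ms=8, compress_ms=2, compress_ratio=0.25),
    Tensor("conv2", size_mb=30, recompute_ms=3, compress_ms=4, compress_ratio=0.5),
    Tensor("fc", size_mb=10, recompute_ms=2, compress_ms=5, compress_ratio=0.5),
]
actions, used_mb, delay_ms = plan(layers, budget_mb=30)
print(actions, used_mb, delay_ms)
```

In this example the heuristic compresses `conv1`, recomputes `conv2`, and keeps `fc`, which shows how tensors with different cost profiles can end up with different treatments. A real planner would also need to account for accuracy loss from compression and for peak rather than total memory, both of which this sketch ignores.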
