NeFL: Nested Model Scaling for Federated Learning with System Heterogeneous Clients
Honggu Kang, Seohyeon Cha, Jinwoo Shin, Jongmyeong Lee, Joonhyuk Kang
TL;DR
This paper tackles system heterogeneity in federated learning by letting resource-constrained clients participate through nested submodels scaled in both depth and width. It introduces NeFL, a framework that decouples parameters into consistent and inconsistent groups and uses a dual-aggregation scheme (NeFedAvg for consistent parameters and FedAvg for inconsistent ones) to fuse updates from diverse submodels. The key contributions are (1) an ODE-inspired depthwise and widthwise model scaling approach, (2) the concept of inconsistent parameters (e.g., learnable step sizes and BN layers) with a corresponding averaging method, and (3) extensive experiments showing improvements for worst-case submodels and compatibility with pre-trained models and ViTs. The results demonstrate that NeFL enables broader participation of heterogeneous clients, achieves better worst-case performance, and integrates with modern FL practices, offering a practical path for scalable, privacy-preserving learning on diverse devices (e.g., a 7.63% worst-case improvement on CIFAR-100).
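The dual-aggregation idea can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that each client's consistent weight is a leading slice of the global weight, that clients are weighted uniformly, and that inconsistent parameters are averaged only among clients training the same submodel. The function names `nested_average` and `per_submodel_average` are hypothetical.

```python
import numpy as np

def nested_average(global_weight, client_weights):
    """Average consistent parameters element-wise.

    Assumption: each client weight is a leading slice of the global
    weight, so an element is averaged only over clients that hold it.
    """
    total = np.zeros_like(global_weight, dtype=float)
    count = np.zeros_like(global_weight, dtype=float)
    for w in client_weights:
        region = tuple(slice(0, n) for n in w.shape)
        total[region] += w
        count[region] += 1
    # Elements that no client trained keep their previous global value.
    return np.where(count > 0, total / np.maximum(count, 1), global_weight)

def per_submodel_average(updates):
    """Average inconsistent parameters separately for each submodel.

    `updates` is a list of (submodel_id, value) pairs; uniform client
    weighting is a simplification made for this sketch.
    """
    groups = {}
    for submodel_id, value in updates:
        groups.setdefault(submodel_id, []).append(np.asarray(value, dtype=float))
    return {k: np.mean(v, axis=0) for k, v in groups.items()}
```

In this sketch a small client influences only the slice it actually trained, while the parameters treated as inconsistent are never mixed across different submodel architectures.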
Abstract
Federated learning (FL) enables distributed training while preserving data privacy, but stragglers (slow or incapable clients) can significantly prolong total training time and degrade performance. To mitigate the impact of stragglers, prior work has addressed system heterogeneity, including heterogeneous computing power and network bandwidth. While previous studies have addressed system heterogeneity by splitting models into submodels, they offer limited flexibility in model architecture design and do not consider the inconsistencies that can arise from training multiple submodel architectures. We propose nested federated learning (NeFL), a generalized framework that efficiently divides deep neural networks into submodels using both depthwise and widthwise scaling. To address the inconsistency arising from training multiple submodel architectures, NeFL decouples a subset of parameters from those being trained for each submodel. An averaging method is proposed to handle these decoupled parameters during aggregation. NeFL enables resource-constrained devices to participate effectively in the FL pipeline, thereby making larger datasets available for model training. Experiments demonstrate that NeFL achieves performance gains over baseline approaches, especially for the worst-case submodel (7.63% improvement on CIFAR-100). Furthermore, NeFL aligns with recent advances in FL, such as leveraging pre-trained models and accounting for statistical heterogeneity. Our code is available online.
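To make combined depthwise and widthwise scaling concrete, the following is a minimal sketch under stated assumptions rather than the paper's procedure. It assumes the global model is a list of square 2-D block weights, that depthwise scaling keeps a chosen subset of blocks, and that widthwise scaling keeps the leading channels of each kept block. The name `extract_submodel` and its parameters are hypothetical.

```python
import numpy as np

def extract_submodel(global_blocks, keep_blocks, width_ratio):
    """Return a nested submodel as a list of sliced block weights.

    Assumptions for this sketch: `global_blocks` is a list of 2-D arrays,
    `keep_blocks` lists the block indices to retain (depthwise scaling),
    and `width_ratio` in (0, 1] selects leading channels (widthwise scaling).
    """
    sub = []
    for idx in keep_blocks:
        w = global_blocks[idx]
        out_ch = max(1, int(round(w.shape[0] * width_ratio)))
        in_ch = max(1, int(round(w.shape[1] * width_ratio)))
        # Copy so that local training does not modify the global weights.
        sub.append(w[:out_ch, :in_ch].copy())
    return sub
```

Because every submodel is a leading slice of the kept blocks, a smaller submodel's parameters are contained in those of any larger submodel that keeps the same blocks, which is what makes the submodels nested.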
