NeFL: Nested Model Scaling for Federated Learning with System Heterogeneous Clients

Honggu Kang, Seohyeon Cha, Jinwoo Shin, Jongmyeong Lee, Joonhyuk Kang

TL;DR

This paper tackles the challenge of system heterogeneity in federated learning by enabling resource-constrained clients to participate via nested submodels scaled in both depth and width. It introduces NeFL, a framework that decouples parameters into consistent and inconsistent groups and uses a dual-aggregation scheme (NeFedAvg for consistent parameters and FedAvg for inconsistent ones) to fuse diverse submodel updates. The key contributions are (1) an ODE-inspired, depthwise/widthwise model scaling approach, (2) the concept of inconsistent parameters (e.g., learnable step sizes and BN layers) with a corresponding averaging method, and (3) extensive experiments showing improvements in worst-case submodels and compatibility with pre-trained models and ViTs. The results demonstrate that NeFL enables broader participation of heterogeneous clients, achieves better worst-case performance, and integrates with modern FL practices, offering a practical path for scalable, privacy-preserving learning on diverse devices (e.g., a 7.63% improvement for the worst-case submodel on CIFAR-100).

Abstract

Federated learning (FL) enables distributed training while preserving data privacy, but stragglers (slow or incapable clients) can significantly prolong total training time and degrade performance. To mitigate the impact of stragglers, system heterogeneity, including heterogeneous computing and network bandwidth, has been addressed. While previous studies have addressed system heterogeneity by splitting models into submodels, they offer limited flexibility in model architecture design and do not consider potential inconsistencies arising from training multiple submodel architectures. We propose nested federated learning (NeFL), a generalized framework that efficiently divides deep neural networks into submodels using both depthwise and widthwise scaling. To address the inconsistency arising from training multiple submodel architectures, NeFL decouples a subset of parameters from those being trained for each submodel. An averaging method is proposed to handle these decoupled parameters during aggregation. NeFL enables resource-constrained devices to effectively participate in the FL pipeline, facilitating larger datasets for model training. Experiments demonstrate that NeFL achieves performance gains, especially for the worst-case submodel, compared to baseline approaches (7.63% improvement on CIFAR-100). Furthermore, NeFL aligns with recent advances in FL, such as leveraging pre-trained models and accounting for statistical heterogeneity. Our code is available online.
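
The two aggregation paths described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each client's consistent parameters form a prefix of a flat global vector (a simplified stand-in for width-nested submodels), it weights all clients equally, and the names `nested_average` and `per_submodel_average` are hypothetical.

```python
from typing import Dict, List, Tuple


def nested_average(global_params: List[float],
                   updates: List[List[float]]) -> List[float]:
    """Average each coordinate over only the clients whose submodel holds it.

    Assumption: every update is a prefix of the global parameter vector.
    Coordinates that no client trained keep their previous global value.
    """
    sums = [0.0] * len(global_params)
    counts = [0] * len(global_params)
    for vec in updates:
        for i, value in enumerate(vec):
            sums[i] += value
            counts[i] += 1
    return [s / c if c else old
            for s, c, old in zip(sums, counts, global_params)]


def per_submodel_average(
        updates: List[Tuple[int, List[float]]]) -> Dict[int, List[float]]:
    """Plain averaging of inconsistent parameters, kept separate per submodel.

    Each update is (submodel_id, params); only clients that trained the same
    submodel are averaged together.
    """
    groups: Dict[int, List[List[float]]] = {}
    for submodel_id, vec in updates:
        groups.setdefault(submodel_id, []).append(vec)
    return {
        sid: [sum(col) / len(col) for col in zip(*vecs)]
        for sid, vecs in groups.items()
    }


# Toy round: one small client (2 parameters) and one larger client (3).
consistent = nested_average([0.0, 0.0, 0.0, 9.0],
                            [[1.0, 2.0], [3.0, 4.0, 5.0]])
# Inconsistent parameters (e.g., a step size) grouped by submodel id.
inconsistent = per_submodel_average([(0, [1.0]), (0, [3.0]), (1, [10.0])])
```

In this toy round the first two coordinates are averaged over both clients, the third comes from the larger client alone, and the fourth is left unchanged because no client trained it. A practical variant would likely weight clients by local dataset size, as standard FedAvg does.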

Paper Structure (27 sections, 3 equations, 7 figures, 7 tables, 2 algorithms)

Figures (7)

  • Figure 1: Toy example of ODE solver showing the effect of number of steps and step sizes
  • Figure 2: Example of output computation when applying widthwise/depthwise model scaling inspired by ODE solver
  • Figure 3: Scaling method in both depth and/or width dimensions inspired by ODEs
  • Figure 4: NeFL framework. (1) Clients select submodels that are scaled in both width and/or depth dimensions. (2) The NeFL server aggregates weights of submodels by a proposed parameter averaging algorithm that addresses consistent and inconsistent parameters separately. (3) Clients can choose one of the submodels based on their varying capabilities in a dynamic environment.
  • Figure 5: (a) Average L1 norm of trained submodel weights and (b) step sizes of trained submodels
  • ...and 2 more figures
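
The idea behind Figure 1 (how the number of steps and the step sizes affect an ODE solver) can be approximated with a small sketch. It assumes a forward Euler solver and the test equation dx/dt = -x on [0, 1]; both are choices made for illustration, not details taken from the paper, and `euler` is a hypothetical helper.

```python
import math
from typing import Callable, List


def euler(f: Callable[[float], float], x0: float,
          step_sizes: List[float]) -> float:
    """Forward Euler: apply x <- x + h * f(x) once per step size h."""
    x = x0
    for h in step_sizes:
        x = x + h * f(x)
    return x


def decay(x: float) -> float:
    """Right-hand side of dx/dt = -x, whose exact solution is exp(-t)."""
    return -x


exact = math.exp(-1.0)                             # true value at t = 1

fine = euler(decay, 1.0, [0.25] * 4)               # four uniform steps
coarse = euler(decay, 1.0, [0.5] * 2)              # two uniform steps
skipped = euler(decay, 1.0, [0.25, 0.25, 0.5])     # last two steps merged

errors = {
    "fine": abs(fine - exact),
    "skipped": abs(skipped - exact),
    "coarse": abs(coarse - exact),
}
```

All three runs cover the same interval, so the step sizes in each list sum to 1. Merging two steps into one larger step loosely mirrors dropping a residual block and compensating with a larger step size, and in this toy setting its error falls between the four-step and two-step runs.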