Exploring System-Heterogeneous Federated Learning with Dynamic Model Selection

Dixi Yao

TL;DR

The paper tackles federated learning (FL) under heterogeneous edge resources by letting each client's model be selected dynamically in every round. It introduces a global multi-architecture model and a resource-constrained search that yields client-specific subnetworks $w_i = w \otimes V$, drawn from a space $S$ of layer-wise and depth configurations, with aggregation performed at the parameter level. A federated in-place distillation stage on the server uses generated data $\mathcal{X}_{KD} \sim \mathcal{N}(0,1)$ to refine the global model without exposing client data. The method has a convergence rate of $\mathcal{O}(1/T)$. Compared with existing system-heterogeneous FL methods, it improves accuracy by $2.43\%$ to $15.81\%$ and raises memory and bandwidth utilization by $5\%$ to $40\%$, with negligible extra running time. The approach is implemented on Plato, validated on CIFAR10, FEMNIST, and Shakespeare under both i.i.d. and non-i.i.d. settings, and shown to be robust to resource fluctuations using logs collected from real devices. The work offers a practical path to more efficient and fair FL in real-world mobile environments, is applicable to other resource constraints, and is compatible with existing privacy-preserving FL methods.
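The subnetwork extraction and parameter-level aggregation summarized above can be illustrated with a minimal sketch. It assumes that $\otimes$ is read as elementwise masking with a binary $V$, and that aggregation averages each parameter over the clients that hold it; both are assumptions made for illustration, and the names `extract_subnetwork` and `aggregate` are hypothetical rather than taken from the paper or from Plato.

```python
from typing import List, Sequence


def extract_subnetwork(w: Sequence[float], v: Sequence[int]) -> List[float]:
    """Apply a binary mask elementwise: w_i = w ⊗ V (assumed reading)."""
    if len(w) != len(v):
        raise ValueError("weights and mask must have the same length")
    return [wj * vj for wj, vj in zip(w, v)]


def aggregate(
    global_w: Sequence[float],
    client_ws: Sequence[Sequence[float]],
    masks: Sequence[Sequence[int]],
) -> List[float]:
    """Average each parameter over the clients whose mask covers it.

    Parameters that no client trained keep their previous global value.
    This coverage-based mean is an assumption, not the paper's exact rule.
    """
    new_w = []
    for j, old in enumerate(global_w):
        held = [cw[j] for cw, m in zip(client_ws, masks) if m[j] == 1]
        new_w.append(sum(held) / len(held) if held else old)
    return new_w


if __name__ == "__main__":
    w = [1.0, 2.0, 3.0, 4.0]
    masks = [[1, 1, 0, 0], [1, 1, 1, 0]]  # two clients, different capacities
    subnets = [extract_subnetwork(w, v) for v in masks]
    # Stand-in for local training: add 1.0 to every parameter a client holds.
    trained = [[x + v for x, v in zip(s, m)] for s, m in zip(subnets, masks)]
    print(aggregate(w, trained, masks))
```

In this toy run the last parameter is covered by no client, so it keeps its previous global value, while the third parameter is updated from the single client that holds it.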

Abstract

Federated learning is a distributed learning paradigm in which multiple mobile clients train a global model while keeping their data local. These clients can differ in available memory and network bandwidth. How to make full use of that memory and bandwidth to achieve the best global model performance remains an open challenge. In this paper, we propose to assign each client a subset of the global model, with a different number of layers and a different number of channels in each layer. To realize this, we design a constrained model search process with early stopping, which makes it more efficient to find models in such a large space, and a data-free knowledge distillation mechanism, which improves global model performance when aggregating models with such different structures. For fair and reproducible comparison between different solutions, we develop a new system that directly allocates memory and bandwidth to each client according to memory and bandwidth logs collected on mobile devices. The evaluation shows that, compared with existing state-of-the-art system-heterogeneous federated learning methods, our solution increases accuracy by 2.43% to 15.81% and provides 5% to 40% higher memory and bandwidth utilization with negligible extra running time, across different levels of available memory and bandwidth, non-i.i.d. datasets, and image and text tasks.
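As a rough illustration of a constrained search with early stopping, the sketch below ranks candidate (depth, channels) configurations from largest to smallest and stops at the first one that fits both budgets. The cost model `param_count`, the single byte budget used for both memory and bandwidth, and the function name `search` are all hypothetical simplifications; the paper's actual search procedure and stopping rule may differ.

```python
from itertools import product
from typing import Iterable, Optional, Tuple

Config = Tuple[int, int]  # (depth, channels per layer)


def param_count(cfg: Config) -> int:
    """Toy cost model (assumption): depth layers of channels x channels weights."""
    depth, channels = cfg
    return depth * channels * channels


def search(
    depths: Iterable[int],
    widths: Iterable[int],
    mem_budget: int,
    bw_budget: int,
    bytes_per_param: int = 4,
) -> Optional[Config]:
    """Return the largest configuration that fits both byte budgets.

    Candidates are visited from largest to smallest, so the first feasible
    one is also the largest feasible one and the loop can stop early.
    """
    candidates = sorted(product(depths, widths), key=param_count, reverse=True)
    for cfg in candidates:
        size = param_count(cfg) * bytes_per_param
        if size <= mem_budget and size <= bw_budget:
            return cfg  # early stop: no smaller candidate can be better
    return None  # nothing fits the client's current resources


if __name__ == "__main__":
    print(search([2, 4], [8, 16], mem_budget=2500, bw_budget=1500))
```

Visiting candidates in descending size is one simple way to favor high resource utilization: the client receives the largest model its current memory and bandwidth allow.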

Paper Structure

This paper contains 42 sections, 1 theorem, 5 equations, 15 figures, 3 tables, and 2 algorithms.

Key Result

Theorem 3.3

If the convergence rate of FedAvg is $\mathcal{O}(\frac{1}{T})$, then the convergence rates of model-search-based and model-heterogeneous-based system-heterogeneous federated learning methods are also $\mathcal{O}(\frac{1}{T})$. The proof is given in the appendix.
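One common way to read a rate of this kind is as a bound on the expected optimality gap after $T$ rounds. The symbols $F$ (global objective), $w_T$ (model after $T$ rounds), $F^{*}$ (optimal value), $C$ (a constant independent of $T$), and $\epsilon$ (target gap) are notation assumed here for illustration; the paper's own statement and constants are in its appendix.

```latex
\mathbb{E}\left[F(w_T)\right] - F^{*} \le \frac{C}{T}
\quad\Longrightarrow\quad
T \ge \frac{C}{\epsilon}
\ \text{rounds suffice for an expected gap of at most } \epsilon .
```

Under this reading, halving the target gap doubles the number of rounds needed, the same scaling as for FedAvg.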

Figures (15)

  • Figure 1: Changes in available memory and transmission rate while applications are running.
  • Figure 2: The overall workflow of our method for training the global model in system-heterogeneous FL.
  • Figure 3: Illustration of our method for aggregating parameters from different clients. The example shows one convolution layer with different kernel sizes on the clients.
  • Figure 4: The process of in-place distillation on the server. Notably, these $n$ new architectures are sampled from $\mathcal{P}$.
  • Figure 5: System structure.
  • ...and 10 more figures
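The data-free distillation idea behind Figure 4 can be sketched in a few lines, under strong simplifying assumptions: the teacher and student are linear models, the student sees only the first few input features (standing in for a smaller architecture), and the loss is the mean squared error on inputs drawn from $\mathcal{N}(0,1)$. The function `distill` and all of its parameters are hypothetical and not the paper's implementation.

```python
import random
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def distill(
    teacher: Sequence[float],
    student_dim: int,
    n_samples: int = 256,
    steps: int = 200,
    lr: float = 0.05,
    seed: int = 0,
) -> Tuple[List[float], float, float]:
    """Fit a smaller linear student to a linear teacher on Gaussian noise.

    No client data is used: inputs are sampled from N(0, 1) on the server.
    Returns the student weights and the MSE before and after distillation.
    """
    rng = random.Random(seed)
    dim = len(teacher)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_samples)]
    targets = [dot(teacher, x) for x in xs]  # teacher outputs act as labels

    def mse(s: Sequence[float]) -> float:
        errs = (dot(s, x[:student_dim]) - t for x, t in zip(xs, targets))
        return sum(e * e for e in errs) / n_samples

    student = [0.0] * student_dim
    before = mse(student)
    for _ in range(steps):
        grad = [0.0] * student_dim
        for x, t in zip(xs, targets):
            err = dot(student, x[:student_dim]) - t
            for j in range(student_dim):
                grad[j] += 2.0 * err * x[j] / n_samples
        student = [s - lr * g for s, g in zip(student, grad)]  # gradient step
    return student, before, mse(student)


if __name__ == "__main__":
    weights, before, after = distill([1.0, -2.0, 0.5], student_dim=2)
    print(weights, before, after)
```

The student cannot match the teacher exactly because it lacks the third feature, but the distillation loss still drops sharply, which is the effect the server-side stage relies on.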

Theorems & Definitions (4)

  • Definition 3.1
  • Definition 3.2
  • Theorem 3.3
  • proof