Exploring System-Heterogeneous Federated Learning with Dynamic Model Selection
Dixi Yao
TL;DR
The paper tackles federated learning under heterogeneous edge resources by selecting each client's model dynamically in every round. It introduces a global multi-architecture model and a resource-constrained search that yields client-specific subnetworks $w_i = w \otimes V$, drawn from a space $S$ of layer-wise and depth configurations, with aggregation performed at the parameter level. A federated in-place distillation stage on the server uses generated data $\mathcal{X}_{KD} \sim \mathcal{N}(0,1)$ to refine the global model without exposing client data. The method has a convergence guarantee of $\mathcal{O}(1/T)$, improves accuracy by $2.43\%$ to $15.81\%$, and raises memory and bandwidth utilization by $5\%$ to $40\%$ with negligible server overhead. It is implemented on Plato, validated on CIFAR10, FEMNIST, and Shakespeare under both i.i.d. and non-i.i.d. settings, and shown to be robust to resource fluctuations using logs collected from real devices. The work offers a practical path to more efficient and fair FL in real-world mobile environments, is applicable to other resource constraints, and is compatible with existing privacy-preserving FL methods.
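To make the notation $w_i = w \otimes V$ and the phrase "aggregation at the parameter level" concrete, here is a minimal NumPy sketch. It assumes that $V$ is a binary mask with the same shape as the global weights and that each parameter is averaged over only the clients whose mask covers it. The function names, the toy weights, and the simulated "+1" local update are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def extract_subnetwork(w, mask):
    """Client model w_i = w ⊗ V, with V assumed to be a binary mask."""
    return w * mask

def aggregate(global_w, client_ws, masks):
    """Average each parameter over the clients that actually hold it.

    Parameters not covered by any client keep their previous global value.
    """
    total = np.zeros_like(global_w, dtype=float)
    count = np.zeros_like(global_w, dtype=float)
    for w_i, v in zip(client_ws, masks):
        total += w_i * v
        count += v
    covered = count > 0
    new_w = global_w.astype(float).copy()
    new_w[covered] = total[covered] / count[covered]
    return new_w

# Toy global weights and two clients with different (hypothetical) masks.
global_w = np.arange(6, dtype=float).reshape(2, 3)
masks = [
    np.array([[1, 1, 0], [0, 0, 0]], dtype=float),  # small client
    np.array([[1, 1, 1], [1, 1, 0]], dtype=float),  # larger client
]
# Simulated local training: every held parameter moves by +1.
client_ws = [extract_subnetwork(global_w, m) + m for m in masks]
new_w = aggregate(global_w, client_ws, masks)
print(new_w)
```

In this toy run, the two parameters shared by both clients are averaged over two updates, the three held only by the larger client take its value, and the one parameter that no client received is left unchanged.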
Abstract
Federated learning is a distributed learning paradigm in which multiple mobile clients train a global model while keeping their data local. These clients can differ widely in available memory and network bandwidth. How to make full use of that memory and bandwidth to achieve the best global model performance remains an open challenge. In this paper, we propose to assign each client a subset of the global model, with a different set of layers and a different number of channels on each layer. To realize this, we design a constrained model search process with early stopping, which makes it efficient to find suitable models in such a large search space, and a data-free knowledge distillation mechanism, which improves global model performance when aggregating models of such different structures. For fair and reproducible comparison between different solutions, we develop a new system that directly allocates memory and bandwidth to each client according to memory and bandwidth logs collected on mobile devices. The evaluation shows that our solution improves accuracy by 2.43\% to 15.81\% and provides 5\% to 40\% more memory and bandwidth utilization with negligible extra running time, compared with existing state-of-the-art system-heterogeneous federated learning methods, across different levels of available memory and bandwidth, non-i.i.d.~datasets, and both image and text tasks.
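The constrained search with early stopping can be pictured with the following sketch. It is only an illustration under stated assumptions: the search space is reduced to (depth, width) pairs, the cost model `param_count` is hypothetical, memory is assumed to hold weights plus gradients, and bandwidth is treated as a per-round upload budget in bytes. The paper's actual search space and cost model may differ.

```python
from itertools import product

BYTES_PER_PARAM = 4  # assumption: float32 parameters

def param_count(depth, width):
    # Hypothetical cost model: `depth` layers, each with width x width weights.
    return depth * width * width

def search(depths, widths, mem_budget, bw_budget):
    """Return the largest (depth, width) that fits both budgets, or None.

    Candidates are visited from largest to smallest, so the loop can stop
    at the first feasible configuration instead of scanning the whole space.
    """
    candidates = sorted(
        product(depths, widths),
        key=lambda c: param_count(*c),
        reverse=True,
    )
    for depth, width in candidates:
        params = param_count(depth, width)
        memory_bytes = 2 * params * BYTES_PER_PARAM  # weights + gradients (assumed)
        upload_bytes = params * BYTES_PER_PARAM      # one model upload per round
        if memory_bytes <= mem_budget and upload_bytes <= bw_budget:
            return depth, width  # early stop: first feasible is the largest
    return None

# A client with roughly 60 kB of memory and 30 kB of upload budget.
choice = search([2, 4, 6], [16, 32, 64], mem_budget=60_000, bw_budget=30_000)
print(choice)
```

Sorting by size before scanning is one simple way to justify stopping early: once a configuration fits, every remaining candidate is smaller, so it cannot use more of the client's budget.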
