Resource-Aware Heterogeneous Federated Learning using Neural Architecture Search
Sixing Yu, J. Pablo Muñoz, Ali Jannesari
TL;DR
Resource-aware Federated Learning (RaFL) tackles non-IID data and system heterogeneity by deploying resource-tailored neural architectures drawn from a weight-sharing NAS supernet. Heterogeneous deployment is enabled by on-device local knowledge fusion through deep mutual learning and by cloud-level knowledge aggregation, with optional ensemble distillation when public data is available. The approach combines on-demand NAS-derived subnetworks, a compact knowledge network for cross-client learning, and a cloud fusion strategy that integrates distributed knowledge, reducing communication overhead while improving inference efficiency at the edge. Empirical results on CIFAR-10/100 and FEMNIST across large-scale and sporadic FL scenarios show that RaFL achieves superior learning and communication efficiency, high resource utilization, and robustness to heterogeneity compared to standard FL baselines and NAS/KD variants. These results indicate RaFL's practical potential for real-world deployments on diverse edge devices, and the framework also supports transfer learning and scalable multi-architecture FL.
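To make the on-device knowledge fusion step concrete, the following is a minimal sketch of a deep mutual learning objective for one labeled sample: each of two models minimizes its own cross-entropy plus a KL term that pulls it toward the other model's prediction. The pairing of a "local" model with a "knowledge" network follows the summary above, but the function names, the equal weighting of the two terms, and the per-sample formulation are assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mutual_learning_losses(local_logits, knowledge_logits, label):
    """Return (loss_local, loss_knowledge) for one sample.

    Hypothetical helper: each model is supervised by the label and
    additionally mimics its peer's predictive distribution.
    """
    p_local = softmax(local_logits)
    p_know = softmax(knowledge_logits)
    loss_local = cross_entropy(p_local, label) + kl_divergence(p_know, p_local)
    loss_know = cross_entropy(p_know, label) + kl_divergence(p_local, p_know)
    return loss_local, loss_know
```

When both models already agree, the KL terms vanish and each loss reduces to plain cross-entropy; disagreement adds a non-negative penalty to both sides, which is what lets a small shared network exchange knowledge with an architecture-specific local model.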
Abstract
Federated Learning (FL) is extensively used to train AI/ML models in distributed and privacy-preserving settings. Participant edge devices in FL systems typically hold non-independent and identically distributed (non-IID) private data and have unevenly distributed computational resources. Preserving user data privacy while optimizing AI/ML models in a heterogeneous federated network therefore requires addressing both data heterogeneity and system/resource heterogeneity. To address these challenges, we propose Resource-aware Federated Learning (RaFL). RaFL allocates resource-aware specialized models to edge devices using Neural Architecture Search (NAS) and supports the deployment of heterogeneous model architectures through knowledge extraction and fusion. Combining NAS and FL enables on-demand customized model deployment for resource-diverse edge devices. Furthermore, we propose a multi-model architecture fusion scheme that aggregates the distributed learning results. Results demonstrate RaFL's superior resource efficiency compared to state-of-the-art (SoTA) approaches.
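As an illustration of what resource-aware model allocation can look like, the sketch below picks, for each device, the largest subnetwork configuration that fits a given budget. Everything in it is hypothetical: the search space (`DEPTHS`, `WIDTHS`), the toy cost proxy in `estimated_cost`, and the exhaustive selection rule are assumptions chosen for brevity and are not taken from the paper, whose actual NAS procedure may differ substantially.

```python
from itertools import product

# Hypothetical search space: number of layers and channels per layer.
DEPTHS = (2, 3, 4)
WIDTHS = (16, 32, 64)


def estimated_cost(depth, width):
    """Toy cost proxy: parameters of `depth` dense layers of size width x width."""
    return depth * width * width


def select_subnet(budget):
    """Return the costliest (depth, width) whose estimated cost fits the budget.

    Returns None when no configuration in the search space fits.
    """
    candidates = [
        (d, w)
        for d, w in product(DEPTHS, WIDTHS)
        if estimated_cost(d, w) <= budget
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda cfg: estimated_cost(*cfg))


# Example: three devices with different (hypothetical) budgets.
device_budgets = {"phone": 5000, "sensor": 600, "gateway": 20000}
assignments = {name: select_subnet(b) for name, b in device_budgets.items()}
```

In this toy setting each device ends up with a different architecture, which is why a fusion scheme across heterogeneous models is needed before the distributed results can be aggregated.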
