Federated Learning with Flexible Architectures
Jong-Ik Park, Carlee Joe-Wong
TL;DR
This work addresses federated learning (FL) with heterogeneous clients by letting each client train a network whose width and depth match its resources. It introduces layer grafting, which aligns every client architecture with the largest one so that all contributions are aggregated completely and uniformly, and a scalable weight-normalization scheme that mitigates scale disparities between architectures; neural architecture search (NAS) guides per-client architecture selection. Experiments on CNNs (Pre-ResNet, MobileNetV2, EfficientNetV2) and a Transformer language model show higher global accuracy under both IID and non-IID data, and greater robustness to backdoor attacks, than prior width- and depth-flexible methods. The resulting framework, FedFA, is practical for deploying FL in diverse, resource-constrained environments and points to future work on scalability, security, and personalized NAS-driven architecture optimization.
Abstract
Traditional federated learning (FL) methods have limited support for clients with varying computational and communication abilities, leading to inefficiencies and potential inaccuracies in model training. This limitation hinders the widespread adoption of FL in diverse and resource-constrained environments, such as those with client devices ranging from powerful servers to mobile devices. To address this need, this paper introduces Federated Learning with Flexible Architectures (FedFA), an FL training algorithm that allows clients to train models of different widths and depths. Each client can select a network architecture suitable for its resources, with shallower and thinner networks requiring fewer computing resources for training. Unlike prior work in this area, FedFA uses a layer grafting technique to align clients' local architectures with the largest network architecture in the FL system during model aggregation. Layer grafting ensures that all client contributions are uniformly integrated into the global model, which reduces the risk that any individual client's data disproportionately skews the model's parameters and, in turn, provides security benefits. Moreover, FedFA introduces a scalable aggregation method to manage scale variations in weights among different network architectures. Experimentally, FedFA outperforms previous width- and depth-flexible aggregation strategies. Furthermore, FedFA is more robust than earlier strategies against performance degradation in backdoor attack scenarios.
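To make the idea of aligning client architectures before aggregation concrete, the following is a minimal sketch of one plausible reading of the depth-alignment step described in the abstract. It is not the paper's implementation: the function names `graft_layers` and `aggregate`, the rule of repeating a client's last layer to fill missing depth, and the plain unweighted average are all assumptions made for illustration. Width differences and the scalable weight normalization are deliberately left out.

```python
import numpy as np


def graft_layers(layers, target_depth):
    """Extend a client's layer list to target_depth.

    Assumption (hypothetical rule): missing layers are filled by
    copying the client's last available layer.
    """
    if not layers:
        raise ValueError("client must have at least one layer")
    if len(layers) > target_depth:
        raise ValueError("client is deeper than the target architecture")
    padding = [layers[-1].copy() for _ in range(target_depth - len(layers))]
    return list(layers) + padding


def aggregate(client_models):
    """Average client models layer by layer after depth alignment.

    Every client contributes to every layer of the global model,
    because each one is first grafted to the deepest architecture.
    """
    target_depth = max(len(model) for model in client_models)
    grafted = [graft_layers(model, target_depth) for model in client_models]
    return [
        np.mean([model[i] for model in grafted], axis=0)
        for i in range(target_depth)
    ]


# Toy example: one shallow client (1 layer) and one deep client (2 layers).
shallow = [np.full((2, 2), 1.0)]
deep = [np.full((2, 2), 3.0), np.full((2, 2), 5.0)]
global_model = aggregate([shallow, deep])
```

In this toy run the shallow client's single layer is reused at the second depth position, so both layers of the global model receive a contribution from both clients, rather than the second layer being determined by the deep client alone.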
