Federated Learning with Flexible Architectures

Jong-Ik Park, Carlee Joe-Wong

TL;DR

This work tackles federated learning with heterogeneous clients by allowing per-client networks of varying widths and depths. It introduces layer grafting to ensure complete, uniform aggregation across architectures and a scalable weight-normalization scheme to mitigate scale disparities, with neural architecture search (NAS) guiding per-client architecture selection. Empirical results across CNNs (Pre-ResNet, MobileNetV2, EfficientNetV2) and a Transformer language model show improved global accuracy under IID and non-IID data and better robustness to backdoor attacks than prior width/depth-flexible methods. The FedFA framework offers practical benefits for deploying FL in diverse, resource-constrained environments and points to future enhancements in scalability, security, and personalized NAS-driven architecture optimization.

Abstract

Traditional federated learning (FL) methods have limited support for clients with varying computational and communication abilities, leading to inefficiencies and potential inaccuracies in model training. This limitation hinders the widespread adoption of FL in diverse and resource-constrained environments, such as those with client devices ranging from powerful servers to mobile devices. To address this need, this paper introduces Federated Learning with Flexible Architectures (FedFA), an FL training algorithm that allows clients to train models of different widths and depths. Each client can select a network architecture suitable for its resources, with shallower and thinner networks requiring fewer computing resources for training. Unlike prior work in this area, FedFA incorporates a layer grafting technique to align clients' local architectures with the largest network architecture in the FL system during model aggregation. Layer grafting ensures that all client contributions are uniformly integrated into the global model. This reduces the risk that any individual client's data disproportionately skews the model's parameters, which in turn brings security benefits. Moreover, FedFA introduces a scalable aggregation method to manage scale variations in weights among different network architectures. Experimentally, FedFA outperforms previous width- and depth-flexible aggregation strategies. Furthermore, FedFA is more robust to performance degradation under backdoor attacks than earlier strategies.
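
To make the two ideas in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It assumes that grafting pads a shallower client's layer list by repeating its last layer until it matches the deepest architecture, and that scale handling rescales each client's layer to the mean L2 norm across clients before averaging. The function names (`graft_layers`, `aggregate`) and both rules are hypothetical and chosen only for illustration. Width differences are ignored.

```python
import math


def l2_norm(weights):
    """L2 norm of a flat list of weights."""
    return math.sqrt(sum(x * x for x in weights))


def graft_layers(layers, target_depth):
    """Pad a client's layers to target_depth by repeating its last layer.

    Assumption: this repetition rule is illustrative only.
    """
    if not layers or len(layers) > target_depth:
        raise ValueError("client depth must be between 1 and target_depth")
    grafted = [list(w) for w in layers]
    while len(grafted) < target_depth:
        grafted.append(list(layers[-1]))
    return grafted


def aggregate(clients, target_depth):
    """Graft every client to target_depth, equalize norms, then average.

    Assumption: rescaling to the mean per-layer norm stands in for the
    paper's scalable aggregation, whose exact form is not given here.
    """
    grafted = [graft_layers(c, target_depth) for c in clients]
    global_layers = []
    for i in range(target_depth):
        stack = [g[i] for g in grafted]
        norms = [l2_norm(w) for w in stack]
        mean_norm = sum(norms) / len(norms)
        rescaled = [
            [x * mean_norm / n for x in w] if n > 0 else list(w)
            for w, n in zip(stack, norms)
        ]
        # Every layer position now has one contribution from every client.
        global_layers.append([sum(col) / len(col) for col in zip(*rescaled)])
    return global_layers


shallow_client = [[1.0, 1.0, 1.0, 1.0]]            # one layer
deep_client = [[3.0, 3.0, 3.0, 3.0], [5.0] * 4]    # two layers
global_model = aggregate([shallow_client, deep_client], target_depth=2)
```

In this toy setup the shallow client contributes to both layers of the global model, so no layer is aggregated from only a subset of clients, which is the property the abstract attributes to layer grafting.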

Paper Structure

This paper contains 52 sections, 51 equations, 9 figures, 10 tables, 3 algorithms.

Figures (9)

  • Figure 1: Aggregating heterogeneous networks introduces vulnerabilities due to incomplete aggregation and increased susceptibility to backdoor attacks for the global model.
  • Figure 2: The FedFA workflow. The server announces network architectures, clients select architectures and send their preferences, and the server configures the global model. Clients then perform local training and send updates to the server, where the updates are grafted, normalized, and aggregated. The process iterates until the convergence criteria are met. Each step in the workflow is mapped to specific lines in the FedFA algorithm and depicted in the corresponding steps of the figure.
  • Figure 3: Visualizations of FedFA's robustness against backdoor attacks in different FL settings: CIFAR-10 with Pre-ResNet, CIFAR-100 with MobileNetV2, and Fashion MNIST with EfficientNetV2. Each blue dotted line is positioned below the lowest accuracy of FedFA but above the highest accuracy among FlexiFed, HeteroFL, and NeFL. The lines underscore the robustness of FedFA at attack intensity $\lambda=20$ with 20% malicious clients.
  • Figure 4: a) A residual block in a CNN, including convolutional and batch normalization layers. b) A skip connection network with residual blocks. c1) Residual blocks from Section 2 of the network. c2) The unfolded network, highlighting how skip connections enable an ensemble-like system.
  • Figure 5: Randomness in sequences of filters and weight maps in convolutional layers.
  • ...and 4 more figures