Toward efficient resource utilization at edge nodes in federated learning

Sadi Alawadi; Addi Ait-Mlouk; Salman Toor; Andreas Hellander

TL;DR

The paper tackles the resource constraints of edge devices in federated learning for deep models by introducing a transfer-learning–inspired partial-layer training strategy. Each FL client randomly selects and trains a subset of $N_l$ layers while freezing the rest, which reduces local compute and network transfer; the strategy is implemented with FedAvg in the FEDn framework. Empirical results on CIFAR-10 (VGG16), CASA HAR (LSTM), and IMDB (sentiment analysis) show that training a portion of the layers can maintain near-baseline global accuracy while reducing data transfer by around 75% (when training 25% of layers) and 53% (when training 50% of layers), with convergence behavior improving as the number of participating clients grows. The study demonstrates feasibility on resource-constrained edge devices, including Jetson Nano, and provides insights into resource–accuracy trade-offs and scalability in FL. The work offers a practical approach to running larger models in edge FL scenarios without sacrificing privacy or incurring prohibitive communication costs, and outlines dynamic layer selection and layer-importance analysis as future work.

Abstract

Federated learning (FL) enables edge nodes to collaboratively contribute to constructing a global model without sharing their data. This is accomplished by devices computing local, private model updates that are then aggregated by a server. However, computational resource constraints and network communication can become a severe bottleneck for the larger model sizes typical of deep learning applications. Edge nodes tend to have limited hardware resources (RAM, CPU), and network bandwidth and reliability at the edge are a concern for scaling federated fleet applications. In this paper, we propose and evaluate an FL strategy inspired by transfer learning in order to reduce resource utilization on devices, as well as the load on the server and network in each global training round. For each local model update, we randomly select layers to train, freezing the remaining part of the model. In doing so, we can reduce both server load and communication costs per round by excluding all untrained layer weights from being transferred to the server. The goal of this study is to empirically explore the potential trade-off between resource utilization on devices and global model convergence under the proposed strategy. We implement the approach using the federated learning framework FEDn. A number of experiments were carried out over different datasets (CIFAR-10, CASA, and IMDB), performing different tasks using different deep-learning model architectures. Our results show that training the model partially can accelerate the training process, utilize on-device resources efficiently, and reduce data transmission by around 75% and 53% when we train 25% and 50% of the model layers, respectively, without harming the resulting global model accuracy.
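The strategy described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' FEDn implementation: the function names (`select_layers`, `local_update`, `aggregate`, `transferred_fraction`), the toy layer sizes, and the constant-step stand-in for local training are all hypothetical. The aggregation rule is also an assumption: each layer is averaged over only the clients that trained it, and layers nobody trained keep their previous global value.

```python
import random


def select_layers(num_layers, num_trainable, rng):
    """Pick the layer indices a client trains this round; the rest stay frozen."""
    return sorted(rng.sample(range(num_layers), num_trainable))


def local_update(global_weights, trainable, step):
    """Stand-in for local training (assumption: adds a constant step).

    Returns only the trained layers, keyed by layer index, so frozen
    layers are never sent to the server.
    """
    return {i: [w + step for w in global_weights[i]] for i in trainable}


def aggregate(global_weights, client_updates):
    """Average each layer over the clients that trained it (assumed rule).

    Layers that no client trained keep their previous global value.
    """
    new_weights = []
    for i, old in enumerate(global_weights):
        received = [u[i] for u in client_updates if i in u]
        if not received:
            new_weights.append(list(old))
            continue
        averaged = [sum(vals) / len(received) for vals in zip(*received)]
        new_weights.append(averaged)
    return new_weights


def transferred_fraction(global_weights, update):
    """Share of the model's parameters that a client actually uploads."""
    total = sum(len(layer) for layer in global_weights)
    sent = sum(len(layer) for layer in update.values())
    return sent / total


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy model: 4 layers with unequal parameter counts (hypothetical sizes).
    weights = [[0.0] * n for n in (2, 4, 8, 2)]
    updates = []
    for client in range(4):
        chosen = select_layers(len(weights), 2, rng)  # train 50% of the layers
        updates.append(local_update(weights, chosen, step=0.1 * (client + 1)))
    new_weights = aggregate(weights, updates)
    for client, update in enumerate(updates):
        print(client, sorted(update), round(transferred_fraction(weights, update), 2))
```

Because layers differ in size, the uploaded fraction in this sketch depends on which layers a client happens to draw, so training half of the layers does not necessarily mean transferring exactly half of the parameters.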

Paper Structure (19 sections, 3 equations, 9 figures, 6 tables, 2 algorithms)

Introduction
Background and Related Work
Federated Learning
Transfer Learning and Model Fine-tuning
Training Parallelization Techniques
Related Work
Proposed Approach
Results and Discussion
Experimental Settings
Results and discussion
Model performance
Trainable layer distribution
The impact of scaling the number of clients (edge nodes) on the model accuracy
Training time
Transferred data size and number of trainable parameters
...and 4 more sections

Figures (9)

Figure 1: An abstract diagram of the proposed approach for training the ML model with four clients in the FL context, where each client independently and randomly selects 50% of the model layers in every training round.
Figure 2: VGG16 model accuracy for CIFAR-10 dataset using different numbers of trainable layers.
Figure 3: Accuracy of two different DL architectures on distinct tasks: (a) human activity recognition using the CASA dataset; (b) sentiment analysis using the IMDB dataset.
Figure 4: Distribution of trained VGG16 layers across 10 clients over 100 training rounds when training different portions of the model.
Figure 5: Comparing the impact of halving the number of trainable layers ($7$ layers) while doubling the number of clients ($20$ clients) against training the whole model ($14$ layers) with $10$ clients. The same amount of CIFAR-10 data was used in both scenarios. The objective was to evaluate how these changes influenced the global model's accuracy.
...and 4 more figures
