FLEdge: Benchmarking Federated Machine Learning Applications in Edge Computing Systems

Herbert Woisetschläger; Alexander Erben; Ruben Mayer; Shiqiang Wang; Hans-Arno Jacobsen

TL;DR

This paper proposes FLEdge, a benchmark that complements existing FL benchmarks by enabling a systematic evaluation of client capabilities. It focuses on computational and communication bottlenecks, client behavior, and data security implications.

Abstract

Federated Learning (FL) has become a viable technique for realizing privacy-enhancing distributed deep learning on the network edge. Heterogeneous hardware, unreliable client devices, and energy constraints often characterize edge computing systems. In this paper, we propose FLEdge, which complements existing FL benchmarks by enabling a systematic evaluation of client capabilities. We focus on computational and communication bottlenecks, client behavior, and data security implications. Our experiments with models varying from 14K to 80M trainable parameters are carried out on dedicated hardware with emulated network characteristics and client behavior. We find that state-of-the-art embedded hardware has significant memory bottlenecks, leading to 4x longer processing times than on modern data center GPUs.

Paper Structure (18 sections, 6 equations, 4 figures, 7 tables)

Introduction
Requirement Analysis
FLEdge: Benchmarking Framework
Protocol
Practical Assumptions & Configuration
Experimental Setup
Testbed
Software Stack
FL Workloads
Results
Client behavior
Differential Privacy
Communication Efficiency
Energy Efficiency
Hardware Heterogeneity
...and 3 more sections

Figures (4)

Figure 1: Federated Learning protocol for one training round from a system perspective with 1 - 4 indicating the focus areas for our work and the benchmark subjects for FLEdge. Our system uses Flower as the underlying FL framework and implements each component in a modular and extensible manner.
Figure 2: The non-IID subsets for our clients are sampled from a Dirichlet distribution ($\alpha= 1$).
Figure 3: Energy efficiency measured across datasets and DL models (# parameters). Results are measured over one epoch of training. The Orins train with a minibatch size of 256 across all models. Higher values are better.
Figure 4: Training times for different device types and model sizes over one minibatch. We further scale the minibatch size for the FLAN-T5 transformer model to showcase edge-specific bottlenecks. Next to the model name, we report the model parameters. Lower is better.
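
The Dirichlet-based non-IID sampling mentioned in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-class splitting strategy, and the toy label list are assumptions made for illustration, and only $\alpha = 1$ comes from the caption. For each class, the sketch draws client shares from a symmetric Dirichlet distribution (built from normalized Gamma draws) and hands out that class's samples accordingly.

```python
import random
from collections import defaultdict


def dirichlet_proportions(alpha, k, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) over k entries.

    Uses the standard construction: normalize k independent Gamma(alpha, 1) draws.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def dirichlet_partition(labels, num_clients, alpha=1.0, seed=0):
    """Split sample indices across clients with label-skewed (non-IID) shares.

    Hypothetical helper: splitting per class is an assumption, not a detail
    stated in the source.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    clients = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        shares = dirichlet_proportions(alpha, num_clients, rng)

        # Turn cumulative shares into cut points; the last client takes the rest.
        start = 0
        cumulative = 0.0
        for client_id, share in enumerate(shares):
            cumulative += share
            if client_id == num_clients - 1:
                end = len(indices)
            else:
                end = int(round(cumulative * len(indices)))
            end = max(start, min(end, len(indices)))
            clients[client_id].extend(indices[start:end])
            start = end
    return clients


# Toy example: 1000 samples over 10 balanced classes, split across 5 clients.
labels = [i % 10 for i in range(1000)]
parts = dirichlet_partition(labels, num_clients=5, alpha=1.0, seed=42)
print([len(p) for p in parts])
```

Smaller values of $\alpha$ concentrate each class on fewer clients, while larger values approach an even split. With $\alpha = 1$, every share vector on the simplex is equally likely.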
