Enhancing Efficiency in Multidevice Federated Learning through Data Selection

Fan Mo; Mohammad Malekzadeh; Soumyajit Chatterjee; Fahim Kawsar; Akhil Mathur

TL;DR

A federated learning framework that incorporates on-device data selection at the edge, enabling partition-based training of deep neural networks through collaboration between constrained and resourceful devices within the same user's multidevice ecosystem.

Abstract

Ubiquitous wearable and mobile devices provide access to a diverse set of data. However, the mobility demanded of these devices naturally constrains their computational and communication capabilities. One solution is to learn from the data locally on ubiquitous devices, rather than store and transmit the data in its original form. In this paper, we develop a federated learning framework, called Centaur, that incorporates on-device data selection at the edge, enabling partition-based training of deep neural networks through collaboration between constrained and resourceful devices within the same user's multidevice ecosystem. We benchmark on five neural network architectures and six datasets, including image data and wearable sensor time series. On average, Centaur achieves ~19% higher classification accuracy and ~58% lower federated training latency than the baseline. We also evaluate Centaur under imbalanced non-IID data, heterogeneous client participation, and different mobility patterns. To encourage further research in this area, we release our code at https://github.com/nokia-bell-labs/data-centric-federated-learning
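
On-device data selection means scoring local samples and keeping only some of them, and the outline lists two variants: loss-based and gradient-based selection. As a hedged illustration only, the NumPy sketch below scores samples with two standard criteria for a bias-free linear softmax classifier on top of frozen encoder embeddings: per-sample cross-entropy loss, and the norm of the loss gradient with respect to the classifier weights. The "keep the highest-scoring fraction" rule and its `fraction` parameter are hypothetical; this is not Centaur's exact rule, and it does not model the paper's $\alpha$, $\beta$, $\gamma$ parameters.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by each row's max for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def loss_scores(logits: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Loss-based score: per-sample cross-entropy of the classifier."""
    p = softmax(logits)
    return -np.log(p[np.arange(len(labels)), labels] + 1e-12)


def grad_norm_scores(logits: np.ndarray, labels: np.ndarray,
                     embeddings: np.ndarray) -> np.ndarray:
    """Gradient-based score for a bias-free linear head, logits = embeddings @ W.

    For one sample, dL/dW = outer(h, p - y), whose Frobenius norm equals
    ||p - y|| * ||h||, so the per-sample gradient is never materialized.
    """
    p = softmax(logits)
    y = np.eye(p.shape[1])[labels]  # one-hot labels
    return np.linalg.norm(p - y, axis=1) * np.linalg.norm(embeddings, axis=1)


def select_top_fraction(scores: np.ndarray, fraction: float) -> np.ndarray:
    """Indices of the highest-scoring `fraction` of samples (at least one).

    HYPOTHETICAL selection rule, for illustration only.
    """
    k = max(1, int(round(fraction * len(scores))))
    return np.argsort(scores)[::-1][:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = rng.normal(size=(8, 4))          # stand-in encoder embeddings
    W = rng.normal(size=(4, 3))          # stand-in classifier weights
    labels = rng.integers(0, 3, size=8)
    logits = h @ W
    print("by loss:", select_top_fraction(loss_scores(logits, labels), 0.25))
    print("by grad:", select_top_fraction(grad_norm_scores(logits, labels, h), 0.25))
```

Both scores favor samples the current classifier handles poorly. The gradient score additionally weights each sample by its embedding norm, because that norm scales how much the sample would move the classifier weights.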

Paper Structure

This paper contains 18 sections, 4 equations, 11 figures, 1 table, and 1 algorithm.

Introduction
Related Work
A Motivating Study
Method
Model Initialization
Data Selection
Loss-based Selection.
Gradient-based Selection.
Partition-based Training and Aggregation
Experimental Setup
Evaluation Results
Metrics
Model Accuracy
Efficiency
Data and Participation Heterogeneity
...and 3 more sections

Figures (11)

Figure 1: Federated training latency versus classification accuracy on the test dataset. We compare Centaur with standard federated training, which trains only the classifier on ubiquitous constrained devices (UCDs), without data selection or partition-based training. (Top plot) Four models tested on image data from CIFAR-10; (bottom plot) one model tested on three wearable sensor time-series datasets.
Figure 2: On four Raspberry Pis, we measure the memory consumption of running FL when (i) only the classifier is trained, (ii) the entire model is trained, and (iii) the entire model runs inference only.
Figure 3: Overview of Centaur, including model initialization (§ Model Initialization), data selection (§ Data Selection), and partition-based training and aggregation (§ Partition-based Training and Aggregation). We explain the setup in the framework overview and elaborate on Steps ① to ⑧.
Figure 4: Test accuracy for different values of the data-selection parameters $\alpha$, $\beta$, and $\gamma$. The $\blacktriangledown$ point in the left figure marks UCD training, and the $\blacktriangle$ point in the right figure marks AP training. With data selection, Centaur consistently achieves higher accuracy than both UCD training and AP training.
Figure 5: Accuracy of the best classifier for four different encoders, trained on CIFAR-10, CIFAR-100, and EMNIST. Dashed lines depict the upper-bound accuracy when UCDs have no resource or connectivity constraints.
...and 6 more figures
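
Figures 1 and 2 contrast training only the classifier on a constrained device with training the entire model. The NumPy sketch below illustrates the classifier-only regime under stated assumptions: `frozen_encoder` is a hypothetical stand-in (a fixed random projection followed by ReLU, not one of the paper's architectures) that runs forward only, and full-batch gradient descent updates only a linear softmax head. It is a minimal sketch, not Centaur's training code.

```python
import numpy as np


def frozen_encoder(x: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """HYPOTHETICAL stand-in for a pretrained encoder: fixed projection + ReLU.

    It is only run forward, so no activations are kept for backpropagation.
    """
    return np.maximum(x @ proj, 0.0)


def softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise, numerically stable softmax."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_classifier(h: np.ndarray, labels: np.ndarray, num_classes: int,
                     lr: float = 0.5, steps: int = 200):
    """Full-batch gradient descent on a linear softmax head over fixed features h.

    Only W and b are updated; the encoder that produced h is never touched.
    """
    n, d = h.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    y = np.eye(num_classes)[labels]  # one-hot labels
    for _ in range(steps):
        p = softmax(h @ W + b)
        g = (p - y) / n              # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (h.T @ g)
        b -= lr * g.sum(axis=0)
    return W, b


def predict(h: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Predicted class index for each row of h."""
    return np.argmax(h @ W + b, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic, well-separated classes in 5 dimensions.
    x = np.vstack([rng.normal(-1.0, 0.5, size=(50, 5)),
                   rng.normal(1.0, 0.5, size=(50, 5))])
    labels = np.repeat([0, 1], 50)
    proj = rng.normal(size=(5, 16)) / np.sqrt(5)
    h = frozen_encoder(x, proj)      # inference only
    W, b = train_classifier(h, labels, num_classes=2, lr=0.1, steps=300)
    print("train accuracy:", (predict(h, W, b) == labels).mean())
```

Because the encoder receives no gradients, its intermediate activations need not be stored for backpropagation, which is the usual reason classifier-only training uses less memory than full-model training.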
