A Survey of Distributed Learning in Cloud, Mobile, and Edge Settings

Madison Threadgill; Andreas Gerstlauer

TL;DR

The increasing size and complexity of deep learning models necessitate distributed learning across cloud, mobile, and edge environments. The survey maps data and model parallelism across dimensions $d$, $l$, and $m$, and reviews layer-level partitioning techniques for FC, CNN, and RNN/LSTM architectures, including MoDNN, layer fusion, channel/weight partitioning, and gate-/weight-based partitioning, as well as asynchronous and privacy-preserving paradigms. It categorizes partitioning schemes by training vs inference and cloud vs edge contexts, summarizes trade-offs among computation, communication, and memory, and identifies key challenges and future directions. The work provides a structured reference to guide the design of scalable, privacy-aware distributed DL systems in heterogeneous environments.
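The TL;DR lists layer-level partitioning of FC layers among the surveyed techniques. The sketch below is a hedged illustration, not the survey's own code. It contrasts two common ways to split a fully connected layer y = Wx across devices. Splitting by output neurons replicates the input and concatenates the results. Splitting by input features keeps each device's input slice small but requires summing full-length partial outputs. The helper names (`matvec`, `split_outputs`, `split_inputs`) are hypothetical, and each "device" is simulated by one loop iteration.

```python
# Minimal sketch: splitting one fully connected layer y = W x across devices.
# Each loop iteration stands in for one device; sizes are illustrative.

def matvec(W, x):
    """Dense matrix-vector product with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def split_outputs(W, x, n_dev):
    """Output-neuron (row) split: each device holds a block of weight rows and
    the full input x; the partial outputs are simply concatenated."""
    chunk = (len(W) + n_dev - 1) // n_dev
    y = []
    for d in range(n_dev):
        y.extend(matvec(W[d * chunk:(d + 1) * chunk], x))
    return y

def split_inputs(W, x, n_dev):
    """Input-feature (column) split: each device holds a slice of x and the
    matching weight columns; full-length partial sums must be added up."""
    chunk = (len(x) + n_dev - 1) // n_dev
    y = [0] * len(W)
    for d in range(n_dev):
        lo, hi = d * chunk, (d + 1) * chunk
        partial = matvec([row[lo:hi] for row in W], x[lo:hi])
        y = [a + p for a, p in zip(y, partial)]   # element-wise reduction
    return y

W = [[1, 2, 3, 4], [0, 1, 0, 1], [2, 0, 1, 0]]   # 3 outputs, 4 inputs
x = [1, 1, 2, 3]
print(matvec(W, x), split_outputs(W, x, 2), split_inputs(W, x, 2))
# -> [21, 4, 4] [21, 4, 4] [21, 4, 4]
```

Both splits reproduce the unpartitioned result, but they communicate different data. The row split needs a broadcast of x and a cheap gather. The column split needs a reduction over full-length partial vectors. This is one small instance of the computation, communication, and memory trade-off the survey discusses.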

Abstract

In the era of deep learning (DL), convolutional neural networks (CNNs), and large language models (LLMs), machine learning (ML) models are becoming increasingly complex, demanding significant computational resources for both inference and training stages. To address this challenge, distributed learning has emerged as a crucial approach, employing parallelization across various devices and environments. This survey explores the landscape of distributed learning, encompassing cloud and edge settings. We delve into the core concepts of data and model parallelism, examining how models are partitioned across different dimensions and layers to optimize resource utilization and performance. We analyze various partitioning schemes for different layer types, including fully connected, convolutional, and recurrent layers, highlighting the trade-offs between computational efficiency, communication overhead, and memory constraints. This survey provides valuable insights for future research and development in this rapidly evolving field by comparing and contrasting distributed learning approaches across diverse contexts.
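The abstract contrasts data parallelism with model parallelism. The sketch below covers only the data-parallel side and assumes a toy 1-D linear regression with mean squared error; the function names `grad_mse` and `data_parallel_grad` are hypothetical. Each simulated worker holds a full copy of the parameters and computes a gradient on its own shard of the batch. Averaging the shard gradients then recovers the full-batch gradient. On real systems this averaging step is usually performed by an all-reduce or a parameter server, and that step is a source of the communication overhead the abstract mentions.

```python
# Minimal sketch of synchronous data parallelism (assumed toy model:
# 1-D linear regression with mean squared error). Each loop iteration
# stands in for one worker holding a full parameter replica.

def grad_mse(w, b, xs, ys):
    """Gradient of mean((w*x + b - y)^2) with respect to (w, b)."""
    n = len(xs)
    gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return gw, gb

def data_parallel_grad(w, b, xs, ys, n_workers):
    """Shard the batch, compute local gradients, then average them
    (the step an all-reduce or parameter server performs in practice)."""
    assert len(xs) % n_workers == 0, "sketch assumes equal-sized shards"
    s = len(xs) // n_workers
    local = [grad_mse(w, b, xs[k * s:(k + 1) * s], ys[k * s:(k + 1) * s])
             for k in range(n_workers)]
    gw = sum(g[0] for g in local) / n_workers
    gb = sum(g[1] for g in local) / n_workers
    return gw, gb

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]                 # samples of y = 2x + 1
full = grad_mse(0.0, 0.0, xs, ys)
split = data_parallel_grad(0.0, 0.0, xs, ys, n_workers=2)
print(full, split)  # -> (-17.0, -8.0) (-17.0, -8.0)
```

With unequal shards, a plain average would weight samples unevenly. The sketch therefore asserts equal shard sizes instead of computing a weighted average.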

Paper Structure

This paper contains 13 sections, 6 equations, 5 figures, and 4 tables.

Introduction
Data and Model Partitioning
Layer Partitioning
Fully Connected Layers
Convolutional Layers
Feature Map Partitioning
Channel, Filter, and Weight Partitioning
Recurrent Layers
Gate-Based Partitioning
Weight-Based Partitioning
Model Partitioning
Challenges and Future Directions
Summary and Conclusions
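The outline includes feature-map partitioning of convolutional layers. The sketch below assumes a row-wise spatial split with stride 1 and no padding; the helper names `conv2d_valid` and `conv2d_row_partitioned` are hypothetical. Each device computes a band of output rows. For a k x k kernel, it must also receive the k - 1 input rows that border the next band, so those boundary rows are held by two devices. This is one plausible reading of the shared data depicted in Figure 4.

```python
# Minimal sketch (assumptions: stride 1, no padding, split along output rows).
# Each loop iteration stands in for one device computing a band of output rows.

def conv2d_valid(img, ker):
    """Stride-1, unpadded 2-D cross-correlation, as used by DL 'convolution'."""
    k = len(ker)
    out_h, out_w = len(img) - k + 1, len(img[0]) - k + 1
    return [[sum(img[i + a][j + b] * ker[a][b]
                 for a in range(k) for b in range(k))
             for j in range(out_w)]
            for i in range(out_h)]

def conv2d_row_partitioned(img, ker, n_dev):
    """Split output rows across n_dev devices and stitch the bands together.
    Returns the output and the number of input rows each device received."""
    k = len(ker)
    out_h = len(img) - k + 1
    chunk = (out_h + n_dev - 1) // n_dev
    out, rows_received = [], []
    for d in range(n_dev):
        r0, r1 = d * chunk, min((d + 1) * chunk, out_h)  # output rows [r0, r1)
        if r0 >= r1:
            continue                                     # nothing left for this device
        band = img[r0:r1 + k - 1]                        # own rows plus (k - 1)-row halo
        rows_received.append(len(band))
        out.extend(conv2d_valid(band, ker))
    return out, rows_received

img = [[4 * r + c for c in range(4)] for r in range(6)]   # 6 x 4 input
ker = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]               # 3 x 3 kernel
out, rows = conv2d_row_partitioned(img, ker, n_dev=2)
print(out == conv2d_valid(img, ker), rows)  # -> True [4, 4]
```

Here 8 input rows are transferred for a 6-row image, and the 2 duplicated rows are the shared boundary data. The duplication grows with kernel size and with the number of partitions: with four devices, 12 rows would be sent. Finer spatial splits therefore trade less work per device for more replicated input.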

Figures (5)

Figure 1: Data and model parallelism in a neural network.
Figure 2: Fully connected layer parallelism.
Figure 3: Convolutional layer parallelism.
Figure 4: Shared data in distributed convolutional operations.
Figure 5: Long Short-Term Memory (LSTM) cell.
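Figure 5 shows an LSTM cell, and the outline lists gate-based partitioning for recurrent layers. The sketch below assumes that gate-based partitioning means placing each of the four gate computations (input, forget, candidate, output) on a separate device. Under that assumption, it checks the split against an unpartitioned step that uses one stacked weight matrix. All names, the parameter layout, and the weights are hypothetical, and the survey's actual schemes may differ.

```python
import math

GATES = ("i", "f", "g", "o")  # input, forget, candidate (cell input), output

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """W v + b for W given as a list of rows."""
    return [sum(w * u for w, u in zip(row, v)) + bi for row, bi in zip(W, b)]

def activate(name, z):
    """tanh for the candidate gate, sigmoid for the other three."""
    return [math.tanh(t) if name == "g" else sigmoid(t) for t in z]

def cell_update(i, f, g, o, c):
    """c' = f*c + i*g and h' = o*tanh(c'), element-wise."""
    c_new = [fv * cv + iv * gv for iv, fv, gv, cv in zip(i, f, g, c)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new

def lstm_step_gate_partitioned(params, x, h, c):
    """Each gate on its own simulated device: every device receives the
    full [x; h] vector but stores only its own gate's (W, b)."""
    xh = x + h                                   # list concatenation = [x; h]
    act = {name: activate(name, affine(*params[name], xh)) for name in GATES}
    # Gather: the element-wise update needs all four gate outputs.
    return cell_update(act["i"], act["f"], act["g"], act["o"], c)

def lstm_step_stacked(params, x, h, c):
    """Unpartitioned reference: one stacked 4H x (I+H) product, then slicing."""
    H = len(h)
    W_all = [row for name in GATES for row in params[name][0]]
    b_all = [bi for name in GATES for bi in params[name][1]]
    z = affine(W_all, b_all, x + h)
    i, f, g, o = (activate(name, z[k * H:(k + 1) * H])
                  for k, name in enumerate(GATES))
    return cell_update(i, f, g, o, c)

params = {   # hypothetical weights: input size 1, hidden size 2
    "i": ([[0.5, 0.1, -0.2], [0.3, 0.0, 0.4]], [0.0, 0.1]),
    "f": ([[0.2, -0.1, 0.3], [0.1, 0.2, 0.0]], [1.0, 1.0]),
    "g": ([[-0.4, 0.2, 0.1], [0.6, -0.3, 0.2]], [0.0, 0.0]),
    "o": ([[0.3, 0.3, 0.3], [-0.2, 0.1, 0.5]], [0.0, -0.1]),
}
x, h, c = [1.0], [0.0, 0.5], [0.1, -0.2]
hp, cp = lstm_step_gate_partitioned(params, x, h, c)
hs, cs = lstm_step_stacked(params, x, h, c)
print(all(math.isclose(a, b) for a, b in zip(hp + cp, hs + cs)))  # -> True
```

Under this split, each device stores only a quarter of the weights. Every device still needs the full [x; h] vector at each time step, and the gate outputs must be gathered before the cell update. A synchronization point therefore remains on every step of the recurrence.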
