
Embedded Distributed Inference of Deep Neural Networks: A Systematic Review

Federico Nicolás Peccia, Oliver Bringmann

TL;DR

This survey analyzes the state of the art in embedded distributed inference for DNNs, focusing on how to partition and allocate inference across heterogeneous devices. It introduces a taxonomy across runtime flexibility, partition granularity, metrics, cost models, and other categorizations, and synthesizes insights from over 100 papers. Key findings include a prevalence of static offline partitioning, rising interest in adaptive schemes driven by bandwidth and resource changes, and a dominance of horizontal partitioning with notable gaps in vertical partitioning and layer-fusion techniques. The work highlights challenges such as standardized evaluation metrics, device-cost modeling, privacy considerations, and reproducibility through open-source releases, offering directions for future research and practical deployment.
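To make the idea of a cost model concrete, here is a minimal sketch, not taken from any surveyed paper, of the simplest partitioning decision. A chain of layers is cut at one boundary. The first part runs on an embedded device and the rest on a server. The cut with the lowest estimated latency is chosen. The `Layer` fields, FLOP counts, activation sizes, throughputs and bandwidths are all hypothetical. The additive model (device compute + one transfer + server compute) deliberately ignores queuing, returning the result, and energy. Re-running the search whenever the measured bandwidth changes is a crude illustration of the adaptive schemes mentioned above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Layer:
    name: str
    flops: float    # hypothetical compute cost of the layer (FLOPs)
    out_bytes: int  # hypothetical size of the layer's output activation (bytes)


def split_latency(layers: List[Layer], k: int, input_bytes: int,
                  device_flops: float, server_flops: float,
                  bandwidth: float) -> float:
    """Estimated latency (s) if layers[:k] run on the device and layers[k:] on the server."""
    device_time = sum(l.flops for l in layers[:k]) / device_flops
    server_time = sum(l.flops for l in layers[k:]) / server_flops
    if k == len(layers):
        sent = 0                        # fully local: nothing is transmitted
    elif k == 0:
        sent = input_bytes              # fully offloaded: the raw input is transmitted
    else:
        sent = layers[k - 1].out_bytes  # intermediate activation at the cut
    return device_time + sent / bandwidth + server_time


def best_split(layers: List[Layer], input_bytes: int, device_flops: float,
               server_flops: float, bandwidth: float) -> Tuple[int, float]:
    """Exhaustively try every layer boundary and return (k, estimated latency)."""
    latency, k = min(
        (split_latency(layers, k, input_bytes, device_flops, server_flops, bandwidth), k)
        for k in range(len(layers) + 1)
    )
    return k, latency


# Made-up example network, not taken from the survey.
EXAMPLE_LAYERS = [
    Layer("conv1", 1e8, 800_000),
    Layer("conv2", 2e8, 400_000),
    Layer("conv3", 4e8, 20_000),
    Layer("fc", 5e8, 4_000),
]

if __name__ == "__main__":
    for bw in (1e7, 1e6, 1e5):  # bandwidth in bytes per second
        k, lat = best_split(EXAMPLE_LAYERS, 600_000, 1e9, 1e11, bw)
        print(f"{bw:.0e} B/s: first {k} layer(s) on device, ~{lat:.3f} s")
```

With these made-up numbers, the search offloads the whole network at 10 MB/s. At 100 kB/s it keeps the first three layers on the device instead, because the small conv3 activation becomes cheaper to send than the raw input.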

Abstract

Embedded distributed inference of Neural Networks has emerged as a promising approach for deploying machine-learning models on resource-constrained devices in an efficient and scalable manner. The inference task is distributed across a network of embedded devices, with each device contributing to the overall computation by performing a portion of the workload. In some cases, more powerful devices such as edge or cloud servers can be part of the system and be responsible for the most demanding layers of the network. As the demand for intelligent systems and the complexity of the deployed neural network models increase, this approach is becoming more relevant in a variety of applications such as robotics, autonomous vehicles, smart cities, Industry 4.0 and smart health. We present a systematic review of papers published during the last six years that describe techniques and methods to distribute Neural Networks across these kinds of systems. We provide an overview of the current state of the art by analysing more than 100 papers, present a new taxonomy to characterize them, and discuss trends and challenges in the field.
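The idea of each device computing a portion of the workload can be illustrated at the level of a single layer. Here is a minimal sketch under stated assumptions. It takes a fully connected layer y = W x + b and divides its output neurons among devices in proportion to hypothetical relative speeds. The devices are only simulated sequentially. `proportional_split`, `dense_rows` and `distributed_dense` are illustrative names, not an API from the surveyed works. Convolutions could be split similarly by output channel, or spatially with overlapping input regions, each with different communication costs.

```python
from typing import List, Sequence


def proportional_split(n_rows: int, speeds: Sequence[float]) -> List[range]:
    """Assign contiguous row ranges to devices, roughly proportional to their relative speed."""
    total = sum(speeds)
    bounds, acc = [0], 0.0
    for s in speeds[:-1]:
        acc += s
        bounds.append(round(n_rows * acc / total))
    bounds.append(n_rows)  # the last device takes whatever rows remain
    return [range(bounds[i], bounds[i + 1]) for i in range(len(speeds))]


def dense_rows(W, b, x, rows) -> List[float]:
    """Compute the slice y[rows] of y = W x + b; this is the share one device would run."""
    return [sum(w * xi for w, xi in zip(W[r], x)) + b[r] for r in rows]


def distributed_dense(W, b, x, speeds) -> List[float]:
    """Simulate devices each computing a block of output neurons, then gather the results."""
    y: List[float] = []
    for rows in proportional_split(len(W), speeds):
        # Every device needs the full input x; only the output rows are divided.
        y.extend(dense_rows(W, b, x, rows))  # concatenate partial outputs in device order
    return y


# Toy layer with 6 outputs and 3 inputs (hypothetical values).
W = [[r * 3 + c for c in range(3)] for r in range(6)]
b = [1] * 6
x = [1, 0, 2]
assert distributed_dense(W, b, x, [1.0, 2.0, 3.0]) == dense_rows(W, b, x, range(6))
```

Splitting by output neurons keeps each device's computation independent. The cost is broadcasting the full input to every device. Transferring x and gathering the partial outputs are not modeled in this sketch.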

Paper Structure

This paper contains 23 sections, 11 figures, and 12 tables.

Figures (11)

  • Figure 1: A schematic representation of the partition and allocation of a DNN across multiple hardware components, including embedded devices parallelizing the execution of the same layer, edge devices running more complex layers, and cloud servers executing the most computationally demanding layers.
  • Figure 2: Surveyed embedded distributed inference papers arranged per year
  • Figure 3: Different options to distribute the inference of a DNN across multiple devices. Green arrows represent configuration parameters, orange arrows the movement of data between devices, and blue arrows the allocation of layers to particular devices.
  • Figure 4: Categorization of embedded distributed DNN inference papers and its position in the review structure
  • Figure 5: Systematic review process