Split Computing and Early Exiting for Deep Learning Applications: Survey and Research Challenges

Yoshitomo Matsubara, Marco Levorato, Francesco Restuccia

TL;DR

This survey addresses the latency-energy trade-offs of running deep neural networks on mobile devices by examining Split Computing (SC) and Early Exiting (EE) as complementary strategies. It provides a structured taxonomy of local, edge, split, and early-exit models, and critically compares SC approaches with and without bottleneck injection, along with their training methodologies. The EE discussion covers both computer vision and NLP, detailing rationale, architectures, and training schemes such as joint vs. separate training and knowledge distillation. The authors highlight practical research challenges, including evaluation in realistic settings, deployment across diverse domains, and potential information-theoretic perspectives, aiming to guide future work toward robust, energy-efficient, real-time mobile-edge DL systems.

Abstract

Mobile devices such as smartphones and autonomous vehicles increasingly rely on deep neural networks (DNNs) to execute complex inference tasks such as image classification and speech recognition. However, continuously executing the entire DNN on a mobile device can quickly deplete its battery. Although offloading tasks to cloud/edge servers may decrease the mobile device's computational burden, erratic patterns in channel quality and in network and edge server load can significantly delay task execution. Recently, approaches based on split computing (SC) have been proposed, in which the DNN is split into a head model and a tail model, executed on the mobile device and on the edge server, respectively. Ultimately, this may reduce bandwidth usage as well as energy consumption. Another approach, called early exiting (EE), trains models to embed multiple "exits" earlier in the architecture, each providing increasingly higher target accuracy. The trade-off between accuracy and delay can therefore be tuned according to current conditions or application demands. In this paper, we provide a comprehensive survey of the state of the art in SC and EE strategies by presenting a comparison of the most relevant approaches. We conclude the paper by providing a set of compelling research challenges.
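The two ideas in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names (`split_model`, `run`, `early_exit_inference`), the toy layers, and the confidence-threshold exit rule are all assumptions chosen for illustration. In practice the head output would be sent over a wireless link to the edge server, which is omitted here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def split_model(layers: Sequence[Layer], split_index: int) -> Tuple[List[Layer], List[Layer]]:
    """Split a sequential model into a head (mobile device) and a tail (edge server)."""
    return list(layers[:split_index]), list(layers[split_index:])


def run(layers: Sequence[Layer], x: Vector) -> Vector:
    """Apply the layers in order."""
    for layer in layers:
        x = layer(x)
    return x


def early_exit_inference(
    blocks: Sequence[Layer],
    exits: Sequence[Layer],
    x: Vector,
    threshold: float,
) -> Tuple[int, int]:
    """Return (predicted class, index of the exit used).

    Assumed exit rule: stop at the first exit whose top softmax
    probability reaches `threshold`; the last exit always answers.
    """
    for i, (block, exit_head) in enumerate(zip(blocks, exits)):
        x = block(x)
        probs = softmax(exit_head(x))
        confidence = max(probs)
        if confidence >= threshold or i == len(blocks) - 1:
            return probs.index(confidence), i
    raise ValueError("blocks and exits must be non-empty")


# Toy split computing example: the head runs locally, the tail remotely.
toy_layers: List[Layer] = [
    lambda v: [2 * a for a in v],
    lambda v: [a + 1 for a in v],
    lambda v: [sum(v)],
]
head, tail = split_model(toy_layers, 1)
intermediate = run(head, [1.0, 2.0])  # tensor that would be transmitted
prediction = run(tail, intermediate)  # same result as running all layers
```

Raising `threshold` pushes more inputs to later exits (higher accuracy, more delay), while lowering it lets more inputs leave early, which is the tunable trade-off the abstract describes.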

Paper Structure

This paper contains 19 sections, 8 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Overview of (a) local, (b) edge, (c) split computing, and (d) early exiting: image classification as an example.
  • Figure 2: Two different approaches.
  • Figure 3: Cross entropy-based training for bottleneck-injected .
  • Figure 4: Knowledge distillation for bottleneck-injected (student), using a pretrained model as teacher.
  • Figure 5: Reconstruction-based training to compress intermediate output (here $\mathbf{z}_{2}$) in by (yellow).
  • ...and 3 more figures