Enhancing Split Computing and Early Exit Applications through Predefined Sparsity

Luigi Capogrosso; Enrico Fraccaroli; Giulio Petrozziello; Francesco Setti; Samarjit Chakraborty; Franco Fummi; Marco Cristani

TL;DR

The paper tackles the challenge of running large DNNs on edge devices by integrating predefined sparsity with Split Computing (SC) and Early Exit (EE). It formalizes structured predefined sparsity with per-junction out-degrees $d^{out}_{i}$ and in-degrees $d^{in}_{i}$, yielding edge counts $|W_i|=N_{i-1}d^{out}_{i}=N_i d^{in}_{i}$ and densities $\rho_i=|W_i|/(N_{i-1}N_i)$. Because both degrees must be integers, the densities are restricted to values of the form $\rho_i=k/\gcd(N_{i-1},N_i)$ for an integer $k$ between $1$ and $\gcd(N_{i-1},N_i)$. The sparsity pattern is applied before training and kept fixed throughout, enabling hardware-agnostic reductions in compute, storage, and energy. Experimental results on MNIST-like tasks show that sparse head/tail configurations achieve high accuracy with substantially fewer parameters and benefit from reduced communication via EE, with reductions of over $4\times$ in storage and computation while maintaining performance. This work paves the way for efficient, edge-amenable deployments of complex models across SC and EE frameworks.
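
As a minimal sketch of the degree and density relations above (not the authors' code), the following Python snippet enumerates the admissible configurations of a single junction. The function name `admissible_configs` and the dictionary keys are hypothetical; the junction size 800 to 180 is taken from the network configuration quoted in the caption of Figure 3.

```python
from fractions import Fraction
from math import gcd


def admissible_configs(n_prev, n_next):
    """List every structured sparsity setting for a junction N_{i-1} -> N_i.

    Hypothetical helper: for each integer k in 1..gcd(N_{i-1}, N_i) it
    returns the density k/gcd, the out-degree, the in-degree, and the
    resulting number of edges |W_i|.
    """
    g = gcd(n_prev, n_next)
    configs = []
    for k in range(1, g + 1):
        density = Fraction(k, g)
        d_out = n_next * k // g   # edges leaving each of the N_{i-1} nodes
        d_in = n_prev * k // g    # edges entering each of the N_i nodes
        edges = n_prev * d_out
        # Both ways of counting edges must agree: N_{i-1} d_out = N_i d_in.
        assert edges == n_next * d_in
        configs.append(
            {"density": density, "d_out": d_out, "d_in": d_in, "edges": edges}
        )
    return configs


if __name__ == "__main__":
    for cfg in admissible_configs(800, 180)[:3]:
        print(cfg)
```

For the 800 to 180 junction, $\gcd(800,180)=20$, so the sparsest admissible setting has density $1/20$ with $d^{out}=9$, $d^{in}=40$, and $7200$ edges instead of the $144000$ of the fully connected junction.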

Abstract

In the past decade, Deep Neural Networks (DNNs) have achieved state-of-the-art performance in a broad range of problems, spanning from object classification and action recognition to smart building and healthcare. The flexibility that makes DNNs such a pervasive technology comes at a price: their computational requirements preclude deployment on most of the resource-constrained edge devices available today for real-time, real-world tasks. This paper introduces a novel approach to address this challenge by combining the concept of predefined sparsity with Split Computing (SC) and Early Exit (EE). SC splits a DNN so that one part is deployed on an edge device and the rest on a remote server. EE, in contrast, allows the system to skip the remote server and rely solely on the edge device's computation if the answer is already good enough. How to apply predefined sparsity to an SC and EE paradigm has not been studied before. This paper studies this problem and shows how predefined sparsity significantly reduces the computational, storage, and energy burdens during the training and inference phases, regardless of the hardware platform. This makes it a valuable approach for enhancing the performance of SC and EE applications. Experimental results showcase reductions exceeding 4x in storage and computational complexity without compromising performance. The source code is available at https://github.com/intelligolabs/sparsity_sc_ee.
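
The interplay of SC and EE described in the abstract can be illustrated with a small Python sketch. It is an illustration under stated assumptions, not the implementation in the linked repository: the names `head`, `exit_branch`, `tail`, and `split_inference` are hypothetical, and the softmax-confidence threshold is just one common way to decide that an answer is "good enough"; the abstract does not specify the exit criterion.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=lambda i: values[i])


def split_inference(x, head, exit_branch, tail, threshold=0.9):
    """Run the head on the edge device and exit early when confident.

    head, exit_branch, and tail are plain callables standing in for the
    edge-side layers, the early-exit classifier, and the server-side
    layers. The threshold rule is an assumption made for illustration.
    """
    features = head(x)                       # computed on the edge device
    probs = softmax(exit_branch(features))   # cheap local prediction
    if max(probs) >= threshold:
        return argmax(probs), "edge"         # early exit: nothing is sent
    # Otherwise the intermediate features go to the remote server.
    return argmax(tail(features)), "server"


if __name__ == "__main__":
    head = lambda x: x
    confident_exit = lambda f: [10.0, 0.0, 0.0]
    unsure_exit = lambda f: [0.0, 0.0, 0.0]
    tail = lambda f: [0.0, 1.0, 0.0]
    print(split_inference([1.0], head, confident_exit, tail))
    print(split_inference([1.0], head, unsure_exit, tail))
```

In this sketch a sparse head would reduce the cost of the edge-side call, and a sparse tail the cost of the server-side call; the control flow itself is independent of how sparse either part is.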

Paper Structure (10 sections, 11 equations, 5 figures, 2 tables)

Introduction
Related Work
Deep Neural Networks (DNNs) sparsity
Distributed deep learning
Method
Experiments
Why predefined sparsity in SC and EE?
Results
Discussion
Conclusion

Figures (5)

Figure 1: Difference between our approach of predefined sparsity applied to SC and EE, against the state-of-the-art pruning and quantization.
Figure 2: Starting from a DNN $\mathcal{M}(\cdot)$, we first apply the predefined sparsity, and then we train the network. After the training stage, we split the network following the SC and EE paradigm. As a result, the final architecture is less computationally intensive, requires less storage, and consumes less energy, all without compromising the overall performance.
Figure 3: Histograms of weights in each junction resulting from training a deep network on the MNIST dataset. The network configuration used is $H=[H_{0},\dots,H_{n}]=[800, 180, 180, 10]$.
Figure 4: Accuracy versus number of parameters for deep, shallow, and sparse head networks.
Figure 5: Accuracy versus number of parameters for deep, shallow, and sparse tail networks.
