Early-Exit meets Model-Distributed Inference at Edge Networks

Marco Colocrese; Erdem Koyuncu; Hulya Seferoglu

TL;DR

This work addresses the high communication cost of data-distributed inference by proposing MDI-Exit, a decentralized framework that combines model-distributed inference with multiple exit points and adaptive data-admission control. MDI-Exit uses queue-length information and confidence-based early exits (with thresholds $T_e^k$ and confidences $C_k(d)$) to decide when to exit, where to offload, and how much data to admit, following a scheme inspired by Network Utility Maximization. The contributions are per-task policies for inference, early-exit, and offloading, plus two data-admission strategies guided by queue states: arrival-rate adaptation and exit-threshold adaptation. The approach is validated on real edge hardware with MobileNetV2 and ResNet-50, where it achieves higher data throughput at fixed accuracy and higher accuracy at a fixed data rate, aided by autoencoder-based compression of feature vectors. Overall, MDI-Exit targets low-latency inference across heterogeneous edge devices in bandwidth-constrained networks.
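The confidence test and the exit-threshold adaptation mentioned above can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes that $C_k(d)$ is the largest softmax probability at exit point $k$ and that data exits when $C_k(d) \ge T_e^k$. The function `adapt_threshold` and its parameters (`q_high`, `q_low`, `step`, `t_min`, `t_max`) are hypothetical names for one plausible queue-driven rule.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(logits):
    """C_k(d): assumed here to be the largest softmax probability."""
    return max(softmax(logits))


def should_exit(logits, threshold):
    """Exit at point k when C_k(d) >= T_e^k (assumed comparison)."""
    return confidence(logits) >= threshold


def adapt_threshold(threshold, queue_len, q_high, q_low,
                    step=0.05, t_min=0.5, t_max=0.99):
    """Hypothetical rule: a long queue lowers the threshold so more data
    exits early; a short queue raises it to favor accuracy."""
    if queue_len > q_high:
        threshold -= step
    elif queue_len < q_low:
        threshold += step
    # Keep the threshold inside the allowed range.
    return min(t_max, max(t_min, threshold))
```

Under this rule a congested worker trades some accuracy for throughput, while an idle worker does the opposite; the actual update used by MDI-Exit may differ.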

Abstract

Distributed inference techniques can be broadly classified into data-distributed and model-distributed schemes. In data-distributed inference (DDI), each worker carries the entire deep neural network (DNN) model but processes only a subset of the data. However, feeding the data to workers incurs high communication costs, especially when the data is large. An emerging paradigm is model-distributed inference (MDI), where each worker carries only a subset of the DNN layers. In MDI, a source device that holds the data processes a few layers of the DNN and sends the output to a neighboring device, i.e., it offloads the remaining layers. This process ends when all layers have been processed in a distributed manner. In this paper, we investigate the design and development of MDI with early-exit, which rests on the observation that some data does not need to pass through all layers of a model to reach the desired accuracy, i.e., we can exit the model early once the target accuracy is reached. We design a framework, MDI-Exit, that adaptively determines early-exit and offloading policies as well as data admission at the source. Experimental results on a real-life testbed of NVIDIA Nano edge devices show that MDI-Exit processes more data when accuracy is fixed and achieves higher accuracy when the data rate is fixed.
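The flow described in the abstract (process a few layers, exit if confident enough, otherwise offload the features to the next device) can be sketched as a single-process simulation. This is a toy illustration under stated assumptions, not the MDI-Exit implementation: `Stage`, `run_mdi`, and `toy_head` are hypothetical names, the workers are plain Python callables, and queueing, networking, and the offloading policy are left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Features = List[float]


@dataclass
class Stage:
    """One worker's share of the model: some layers plus an optional exit head."""
    layers: Callable[[Features], Features]
    # The exit head returns (predicted label, confidence).
    exit_head: Optional[Callable[[Features], Tuple[int, float]]] = None
    threshold: float = 1.0


def run_mdi(stages: List[Stage], x: Features) -> Tuple[int, int]:
    """Return (label, index of the stage where inference stopped)."""
    for i, stage in enumerate(stages):
        x = stage.layers(x)
        is_last = i == len(stages) - 1
        if stage.exit_head is not None:
            label, conf = stage.exit_head(x)
            # The final stage always returns; earlier stages need enough confidence.
            if is_last or conf >= stage.threshold:
                return label, i
        # Otherwise the features x are "offloaded" to the next stage.
    raise ValueError("the final stage must have an exit head")


def toy_head(x: Features) -> Tuple[int, float]:
    """Toy classifier: label is the argmax, confidence is its normalized value."""
    label = max(range(len(x)), key=lambda j: x[j])
    return label, x[label] / sum(x)


stages = [
    Stage(layers=lambda x: [v + 1.0 for v in x], exit_head=toy_head, threshold=0.8),
    Stage(layers=lambda x: [v * v for v in x], exit_head=toy_head),
]

easy = run_mdi(stages, [8.0, 0.0])  # confident at the first stage, exits early
hard = run_mdi(stages, [1.0, 2.0])  # not confident, offloaded to the second stage
```

In this sketch the "easy" input stops at the first worker and never consumes the second worker's compute, which is the saving that early exit is meant to provide.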

Paper Structure (8 sections, 2 equations, 6 figures, 4 algorithms)

Introduction
Related Work
System Model and Background
Model-Distributed Inference with Early-Exit
Inference, Early-Exit, and Offloading
Data Admission Policies
Experimental Results
Conclusion

Figures (6)

Figure 1: Model-distributed inference with early exit.
Figure 2: MobileNetV2 and ResNet50 architectures with early-exit points.
Figure 3: MobileNetV2. Early-exit confidence threshold (accuracy) is fixed.
Figure 4: ResNet50. Early-exit confidence threshold (accuracy) is fixed.
Figure 5: MobileNetV2. Poisson arrival with a fixed average arrival rate.
...and 1 more figure

Theorems & Definitions (1)

Example 1
