DistrEE: Distributed Early Exit of Deep Neural Network Inference on Edge Devices
Xian Peng, Xin Wu, Lianming Xu, Li Wang, Aiguo Fei
TL;DR
DistrEE tackles the challenge of running accurate DNN inference on resource-constrained edge clusters with variable device availability and input difficulty. It proposes an end-to-end framework that merges distributed multi-branch networks with early exits, consisting of offline joint training and online dynamic inference. The training objective combines knowledge-distillation and feature-transfer components, while the online policy relies on a feature-difference criterion rather than softmax confidence. Evaluation on CIFAR-10 with WideResNet demonstrates that DistrEE maintains high accuracy close to a full-execution baseline while substantially reducing computation and latency, enabling scalable, privacy-preserving edge inference.
Abstract
Distributed DNN inference is becoming increasingly important as the demand for intelligent services at the network edge grows. By pooling the compute of multiple devices, edge clusters can perform complex, resource-hungry inference tasks that were previously possible only on powerful servers, enabling new applications in areas such as autonomous vehicles, industrial automation, and smart homes. However, accurate and efficient distributed edge inference is challenging because the resources actually available on each device fluctuate and the processing difficulty of the input data varies. In this work, we propose DistrEE, a distributed DNN inference framework that can exit model inference early to meet specific quality-of-service requirements. In particular, the framework first integrates model early exit with distributed inference for multi-node collaborative inference scenarios. Furthermore, it includes an early exit policy that controls when model inference terminates. Extensive simulation results demonstrate that DistrEE realizes efficient collaborative inference, achieving an effective trade-off between inference latency and accuracy.
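
The summary above mentions an exit policy driven by a feature-difference criterion rather than softmax confidence, but it does not define that criterion. The sketch below is therefore only an illustration of the general idea, not DistrEE's actual policy: it assumes every block outputs a feature vector of the same length, measures the relative L2 change between consecutive blocks, and stops once that change falls below a threshold. The names `relative_change`, `run_with_early_exit`, `make_toy_block`, and the threshold value are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Block = Callable[[Sequence[float]], Vector]


def relative_change(prev: Sequence[float], curr: Sequence[float]) -> float:
    """Relative L2 distance between two equal-length feature vectors."""
    diff = math.sqrt(sum((c - p) ** 2 for p, c in zip(prev, curr)))
    norm = math.sqrt(sum(p * p for p in prev))
    return diff / (norm + 1e-12)  # small constant avoids division by zero


def run_with_early_exit(
    x: Sequence[float], blocks: Sequence[Block], threshold: float
) -> Tuple[Vector, int]:
    """Run blocks in order and stop once the features stop changing much.

    Assumption (not from the paper): exit when the relative change between
    the outputs of two consecutive blocks is below `threshold`.
    Returns the final features and the number of blocks executed.
    """
    feats = blocks[0](x)
    for executed, block in enumerate(blocks[1:], start=2):
        new_feats = block(feats)
        if relative_change(feats, new_feats) < threshold:
            return new_feats, executed
        feats = new_feats
    return feats, len(blocks)


def make_toy_block(target: Sequence[float]) -> Block:
    """Hypothetical stand-in for a network block: moves halfway to `target`."""
    def block(f: Sequence[float]) -> Vector:
        return [fi + 0.5 * (ti - fi) for fi, ti in zip(f, target)]
    return block


if __name__ == "__main__":
    toy_blocks = [make_toy_block([1.0, 1.0]) for _ in range(6)]
    feats, executed = run_with_early_exit([0.0, 0.0], toy_blocks, threshold=0.1)
    print(f"executed {executed} of {len(toy_blocks)} blocks")
```

In this toy setup the features converge toward a fixed point, so later blocks change them less and less and the loop can stop before the last block. A real system would have to pick the threshold empirically and handle blocks whose outputs differ in shape, neither of which is addressed here.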
