Distributed Inference on Mobile Edge and Cloud: An Early Exit based Clustering Approach

Divya Jyoti Bajpai, Manjesh Kumar Hanawal

TL;DR

DIMEE is a novel approach that uses Early Exit (EE) strategies, originally developed to minimize inference latency in DNNs, to improve accuracy while accounting for the cost of offloading from mobile to edge/cloud.

Abstract

Recent advances in Deep Neural Networks (DNNs) have demonstrated outstanding performance across various domains. However, their large size is a challenge for deployment on resource-constrained devices such as mobile, edge, and IoT platforms. To overcome this, a distributed inference setup can be used in which a small DNN (the initial few layers) is deployed on the mobile device, a larger version on the edge, and the full-fledged DNN on the cloud. A sample of low complexity (easy) can then be inferred on mobile, one of moderate complexity (medium) on the edge, and one of higher complexity (hard) on the cloud. As the complexity of each sample is not known beforehand, the following question arises in distributed inference: how to determine a sample's complexity so that it is processed by enough layers of the DNN. We develop a novel approach named DIMEE that utilizes Early Exit (EE) strategies, which were developed to minimize inference latency in DNNs. DIMEE aims to improve accuracy while taking into account the offloading cost from mobile to edge/cloud. Experimental validation on GLUE datasets, encompassing various NLP tasks, shows that our method significantly reduces the inference cost (> 43%) with only a minimal drop in accuracy (< 0.3%) compared to the case where all inference is performed in the cloud.
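
The routing idea in the abstract can be illustrated with a small sketch. This is not the DIMEE algorithm itself: it assumes, purely for illustration, that sample complexity is decided by a nearest-centroid rule on a 2-D embedding, and every name and number in it (`Tier`, `TIERS`, `PER_LAYER_COST`, the layer counts, the offloading costs, the centroids) is hypothetical rather than taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str            # where inference happens
    num_layers: int      # DNN layers run at this tier (hypothetical values)
    offload_cost: float  # cost of sending a sample from mobile to this tier


# Hypothetical mapping from complexity pool to device tier.
TIERS = {
    "easy": Tier("mobile", num_layers=4, offload_cost=0.0),
    "medium": Tier("edge", num_layers=8, offload_cost=2.0),
    "hard": Tier("cloud", num_layers=12, offload_cost=5.0),
}
PER_LAYER_COST = 1.0  # assumed uniform processing cost per layer


def nearest_pool(embedding, centroids):
    """Return the pool whose centroid is closest to the embedding."""
    return min(centroids, key=lambda pool: math.dist(embedding, centroids[pool]))


def route(embedding, centroids):
    """Pick a tier for the sample and return (tier name, total cost)."""
    tier = TIERS[nearest_pool(embedding, centroids)]
    cost = tier.num_layers * PER_LAYER_COST + tier.offload_cost
    return tier.name, cost


# Toy centroids and samples (made up for the example).
centroids = {"easy": (0.0, 0.0), "medium": (5.0, 5.0), "hard": (10.0, 10.0)}
samples = [(0.5, -0.2), (4.0, 6.0), (9.0, 11.0), (1.0, 1.0)]

routed = [route(s, centroids) for s in samples]
total_cost = sum(cost for _, cost in routed)

# Baseline: every sample is offloaded to the cloud and runs all layers.
cloud = TIERS["hard"]
cloud_only_cost = len(samples) * (cloud.num_layers * PER_LAYER_COST + cloud.offload_cost)

print(routed)
print(total_cost, cloud_only_cost)
```

With these toy numbers, two samples stay on mobile, one goes to the edge, and one goes to the cloud, so the total cost is lower than the cloud-only baseline. How large the saving is in practice, and what it costs in accuracy, depends on the real cost model and pool assignment, which this sketch does not attempt to reproduce.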

Paper Structure

This paper contains 19 sections, 3 equations, 3 figures, 2 tables, 2 algorithms.

Figures (3)

  • Figure 1: Three types of reviews are input to the mobile device. Each sample passes through the embedding layer on the mobile device, where its complexity is decided. The DNN is divided into three parts: 1) The first $m$ layers are deployed on the mobile device, and easy samples are inferred there. 2) The first $n$ layers are deployed on the edge device, and samples too complex to be inferred on the mobile device are inferred at the edge. 3) Finally, the full-fledged DNN is deployed on the cloud, and a sample is offloaded to it only if it falls in the hardest pool of samples, i.e., neither mobile nor edge can gain sufficient confidence to infer it.
  • Figure 2: The figure shows the accuracy and cost of the individual devices, i.e., mobile, edge, and cloud. Right: the t-SNE visualization of the word embeddings of the easy, moderate, and hard pools created for the SST-2 dataset.
  • Figure 3: The changes in accuracy and the percentage change in cost when one of the costs is varied while the others are held constant.