
Distributed Inference on Mobile Edge and Cloud: A Data-Cartography based Clustering Approach

Divya Jyoti Bajpai, Manjesh Kumar Hanawal

TL;DR

This work tackles the challenge of deploying large DNNs on resource-limited devices by proposing DIMEC-DC, a data-cartography-driven distributed inference framework that adaptively splits computation across mobile, edge, and cloud according to sample complexity. Samples are clustered into easy, medium, and hard pools derived from their training dynamics, and each inference task is assigned to the most cost-efficient device while backbone accuracy is maintained. Exit classifiers are trained at the mobile and edge exits, and a reward-based mechanism selects thresholds that maximize expected performance under processing and offloading costs. Experiments on GLUE NLP tasks show cost reductions of over 43% with an accuracy loss below 0.5%, and the method remains robust to varying device and network cost structures, which points to practical value for real-world mobile-edge-cloud deployments.
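As a rough illustration of the clustering step, the sketch below computes the two data-cartography statistics for each training sample: confidence, the mean probability the model assigns to the gold label across epochs, and variability, the standard deviation of that probability. The pool rule is an assumption: a simple threshold rule with hypothetical cutoffs (`conf_hi`, `conf_lo`, `var_hi`) stands in for the paper's clustering step, and the function names are illustrative only.

```python
import numpy as np

def cartography_stats(gold_probs):
    """gold_probs: shape (num_epochs, num_samples), the model's probability
    of the gold label for each training sample, recorded after each epoch."""
    confidence = gold_probs.mean(axis=0)   # mean gold-label probability over epochs
    variability = gold_probs.std(axis=0)   # how much that probability fluctuates
    return confidence, variability

def assign_pools(confidence, variability, conf_hi=0.8, conf_lo=0.4, var_hi=0.2):
    # Hypothetical threshold rule standing in for the clustering step:
    # confident and stable -> easy, low confidence -> hard, otherwise medium.
    pools = np.full(confidence.shape, "medium", dtype=object)
    pools[(confidence >= conf_hi) & (variability < var_hi)] = "easy"
    pools[confidence < conf_lo] = "hard"
    return pools

# Toy run: 3 epochs x 3 samples (made-up probabilities, not from the paper).
gold_probs = np.array([[0.90, 0.20, 0.50],
                       [0.95, 0.10, 0.90],
                       [0.92, 0.30, 0.30]])
conf, var = cartography_stats(gold_probs)
print(assign_pools(conf, var))
```

A clustering algorithm over the (confidence, variability) plane could replace the fixed cutoffs without changing the statistics themselves.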

Abstract

The large size of DNNs poses a significant challenge for deployment on devices with limited resources, such as mobile, edge, and IoT platforms. A distributed inference framework can address this issue. In this framework, a small-scale DNN (the initial layers) is deployed on mobile devices, a larger version on edge devices, and the full DNN on the cloud. Samples with low complexity (easy) can then be processed on the mobile device, those with moderate complexity (medium) on the edge device, and those with high complexity (hard) on the cloud. Since the complexity of a sample is not known in advance, the central question in distributed inference is how to determine it so that each sample is processed by the appropriate DNN. We introduce a novel method named DIMEC-DC that assesses sample complexity using Data Cartography, an approach originally proposed to improve DNN generalization. DIMEC-DC aims to improve accuracy while accounting for the cost of offloading from mobile to edge or cloud. Experiments on GLUE datasets, covering a variety of NLP tasks, show that our approach lowers inference costs by more than 43% with an accuracy drop of less than 0.5% compared to performing all inference on the cloud. The source code is available at https://anonymous.4open.science/r/DIMEC-1B04.
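The trade-off between accuracy and offloading cost described above can be pictured with the following sketch of threshold selection. It is a hypothetical formulation, not the paper's exact objective: a sample leaves at the mobile exit if its confidence there is at least `alpha_m`, otherwise at the edge exit if its confidence there is at least `alpha_e`, and otherwise it goes to the cloud. Each outcome earns 1 for a correct prediction minus a per-tier cost that lumps together processing and offloading, and a grid search picks the threshold pair with the highest average reward on validation data. All names, cost values, and the grid are assumptions.

```python
import itertools
import numpy as np

def expected_reward(alpha_m, alpha_e, conf_m, conf_e, correct, costs):
    """Average reward on validation data for exit thresholds (alpha_m, alpha_e).

    conf_m, conf_e: confidence of the mobile and edge exit classifiers per sample.
    correct: {"mobile"|"edge"|"cloud": 0/1 float array of correct predictions}.
    costs: hypothetical per-sample cost of finishing on each tier
           (processing plus any offloading), in the same units as the reward.
    """
    exit_m = conf_m >= alpha_m                 # confident enough to stop on mobile
    exit_e = ~exit_m & (conf_e >= alpha_e)     # otherwise stop at the edge if confident
    to_cloud = ~exit_m & ~exit_e               # everything else is offloaded to the cloud
    reward = (np.where(exit_m, correct["mobile"] - costs["mobile"], 0.0)
              + np.where(exit_e, correct["edge"] - costs["edge"], 0.0)
              + np.where(to_cloud, correct["cloud"] - costs["cloud"], 0.0))
    return float(reward.mean())

def select_thresholds(conf_m, conf_e, correct, costs, grid=np.linspace(0.5, 1.0, 11)):
    # Exhaustive search over threshold pairs; returns the best (alpha_m, alpha_e).
    return max(itertools.product(grid, grid),
               key=lambda a: expected_reward(a[0], a[1], conf_m, conf_e, correct, costs))

# Toy validation set with two samples (illustrative numbers only).
conf_m = np.array([0.95, 0.60])
conf_e = np.array([0.99, 0.90])
correct = {"mobile": np.array([1.0, 0.0]), "edge": np.array([1.0, 1.0]),
           "cloud": np.array([1.0, 1.0])}
costs = {"mobile": 0.0, "edge": 0.2, "cloud": 0.5}
best = select_thresholds(conf_m, conf_e, correct, costs)
print(best, expected_reward(*best, conf_m, conf_e, correct, costs))
```

Raising the cloud cost pushes the chosen thresholds down, so more samples finish on mobile or edge. Lowering it has the opposite effect.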

Paper Structure

This paper contains 23 sections, 6 equations, 5 figures, 2 tables, and 2 algorithms.

Figures (5)

  • Figure 1: The figure shows the clustering of samples based on the confidence and variance obtained across training epochs; green samples are easy, blue samples are moderate, and red samples are hard.
  • Figure 2: In this figure, three types of reviews are input to the mobile device. Each passes through the embedding layer on the mobile device, where the complexity of the sample is decided. The DNN is divided into three parts: 1) the first $m$ layers are deployed on the mobile device, and easy samples are inferred there; 2) the first $n$ layers are deployed on the edge device, and samples too complex to be inferred on the mobile device are inferred at the edge; 3) the full DNN is deployed on the cloud, and a sample is offloaded there only if it falls in the hardest pool, i.e., neither the mobile nor the edge device has enough layers of the DNN to predict it correctly.
  • Figure 3: The figure shows the clustering of samples based on confidence and variance on the validation split of each dataset, along with the initial proportion of samples in each cluster.
  • Figure 4: Figure on the left: the accuracy of the individual devices, i.e., mobile, edge, and cloud. Figures in the centre and on the right: t-SNE visualizations of the word embeddings of the easy, moderate, and hard pools created for the QNLI and SST-2 datasets.
  • Figure 5: Changes in accuracy and percentage change in cost when one of the costs is varied while the others are held constant.
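To make the routing in Figure 2 concrete, here is a minimal cost sketch under stated assumptions. The pool of each sample is already known on the mobile device. An easy sample runs the first m layers on mobile, a medium sample is offloaded to the edge and runs n layers there, and a hard sample is offloaded to the cloud and runs all L layers. The per-layer and offloading costs, the `Tier` class, and the function names are hypothetical. The sketch only shows how the cost of such a split compares with sending every sample to the cloud.

```python
from dataclasses import dataclass

@dataclass
class Tier:
    layers: int            # DNN layers deployed on this device (m, n, or L)
    cost_per_layer: float  # hypothetical processing cost per layer
    offload_cost: float    # hypothetical cost of sending a sample from mobile here

# Assumed mapping from complexity pool to the device that performs inference.
DEVICE_FOR_POOL = {"easy": "mobile", "medium": "edge", "hard": "cloud"}

def sample_cost(pool, tiers):
    # Offloading (zero on mobile) plus processing all layers deployed on that device.
    tier = tiers[DEVICE_FOR_POOL[pool]]
    return tier.offload_cost + tier.layers * tier.cost_per_layer

def total_cost(pools, tiers):
    return sum(sample_cost(p, tiers) for p in pools)

# Illustrative setup: m = 4, n = 8, L = 12 layers; all cost values are made up.
tiers = {"mobile": Tier(layers=4, cost_per_layer=1.0, offload_cost=0.0),
         "edge": Tier(layers=8, cost_per_layer=0.5, offload_cost=2.0),
         "cloud": Tier(layers=12, cost_per_layer=0.2, offload_cost=5.0)}
pools = ["easy", "easy", "medium", "hard"]
split = total_cost(pools, tiers)
# Baseline: route every sample to the cloud.
all_cloud = total_cost(["hard"] * len(pools), tiers)
print(f"split cost {split:.1f} vs all-cloud {all_cloud:.1f}")
```

Varying one entry of `tiers` while holding the others fixed mirrors the kind of cost sensitivity study summarized in Figure 5. The numbers here are toy values, not the paper's.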