Distributed Inference on Mobile Edge and Cloud: A Data-Cartography based Clustering Approach
Divya Jyoti Bajpai, Manjesh Kumar Hanawal
TL;DR
This work tackles the challenge of deploying large DNNs on resource-limited devices by proposing DIMEC-DC, a distributed inference framework driven by data cartography that adaptively splits computation across mobile, edge, and cloud according to sample complexity. Samples are clustered into easy, medium, and hard pools derived from their training dynamics, and each inference task is assigned to the most cost-efficient device while accuracy stays close to that of the full backbone. Exit classifiers are trained at the mobile and edge exits, and a reward-based mechanism selects their thresholds to maximize expected performance under processing and offloading costs. Experiments on GLUE NLP tasks show cost reductions of over 43% with an accuracy loss below 0.5%, and the method remains robust to varying device and network cost structures, which points to practical value for real-world mobile-edge-cloud deployments.
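The threshold-selection step can be illustrated with a minimal Python sketch. It assumes a three-tier confidence cascade (mobile exit, edge exit, cloud) and a reward of "1 if correct, minus λ times the cost paid", maximized by grid search over the two exit thresholds. The reward form, the cost values, the toy samples, and all function names (`route`, `average_reward`, `select_thresholds`) are hypothetical and are not taken from the DIMEC-DC code.

```python
from itertools import product

# Each toy sample records what every tier would output for it (assumed format):
# ((mobile_confidence, mobile_correct), (edge_confidence, edge_correct), cloud_correct)

def route(sample, t_mobile, t_edge, costs):
    """Return (correct, cost) when the sample stops at the first tier whose
    exit confidence clears its threshold; the cloud always answers."""
    (conf_m, ok_m), (conf_e, ok_e), ok_c = sample
    # Assumed cumulative cost (processing + offloading) of finishing at each tier.
    cost_mobile, cost_edge, cost_cloud = costs
    if conf_m >= t_mobile:
        return ok_m, cost_mobile
    if conf_e >= t_edge:
        return ok_e, cost_edge
    return ok_c, cost_cloud

def average_reward(samples, t_mobile, t_edge, costs, lam):
    """Assumed reward: 1 for a correct prediction, minus lam times the cost paid."""
    total = 0.0
    for sample in samples:
        correct, cost = route(sample, t_mobile, t_edge, costs)
        total += float(correct) - lam * cost
    return total / len(samples)

def select_thresholds(samples, grid, costs, lam):
    """Grid-search the (mobile, edge) threshold pair with the highest average
    reward; ties go to the first pair in grid order."""
    return max(product(grid, grid),
               key=lambda pair: average_reward(samples, pair[0], pair[1], costs, lam))

samples = [
    ((0.95, True), (0.97, True), True),    # easy: confident and correct at the mobile exit
    ((0.60, False), (0.92, True), True),   # medium: resolved at the edge exit
    ((0.55, False), (0.60, False), True),  # hard: only the full model is correct
]
best = select_thresholds(samples, grid=[0.5, 0.7, 0.9], costs=(1.0, 2.0, 4.0), lam=0.1)
print(best)  # prints (0.7, 0.7)
```

In this toy run the easy sample exits on mobile, the medium sample stops at the edge, and the hard sample is offloaded to the cloud. A larger `lam` weights cost more heavily and tends to favor earlier exits.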
Abstract
The large size of DNNs poses a significant challenge for deployment on devices with limited resources, such as mobile, edge, and IoT platforms. A distributed inference framework can address this issue. In this framework, a small-scale DNN (the initial layers) is deployed on mobile devices, a larger version on edge devices, and the full DNN on the cloud. Samples with low complexity (easy) can then be processed on mobile, those with moderate complexity (medium) on edge devices, and those with high complexity (hard) on the cloud. Since the complexity of a sample is not known in advance, the crucial question in distributed inference is how to determine it so that each sample is processed by the appropriate DNN. We introduce a novel method named DIMEC-DC, which uses data cartography, an approach originally proposed to improve DNN generalization, to assess sample complexity. DIMEC-DC aims to improve accuracy while accounting for the cost of offloading from mobile to edge/cloud. Our experimental results on GLUE datasets, covering a variety of NLP tasks, show that our approach lowers inference costs by more than 43% with an accuracy drop of less than 0.5% compared to performing all inferences on the cloud. The source code is available at https://anonymous.4open.science/r/DIMEC-1B04.
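In data cartography, each training sample is characterized by its confidence (the mean probability the model assigns to the gold label across training epochs) and its variability (the standard deviation of that probability across the same epochs). The minimal sketch below computes both statistics and buckets samples into the easy, medium, and hard pools that the abstract maps to mobile, edge, and cloud. The cutoffs (`easy_conf`, `hard_conf`, `max_var`), the toy probabilities, and the function names are hypothetical; the actual clustering rule in DIMEC-DC may differ. Because these statistics require gold labels and training dynamics, they describe training samples rather than unseen test inputs.

```python
from statistics import mean, pstdev

def cartography(gold_probs):
    """Map each sample's per-epoch gold-label probabilities to (confidence, variability).

    gold_probs[i] holds p(gold label | x_i), recorded once per training epoch.
    """
    return [(mean(p), pstdev(p)) for p in gold_probs]

def complexity_pool(confidence, variability, easy_conf=0.8, hard_conf=0.4, max_var=0.2):
    """Assign a pool using hypothetical cutoffs (not the paper's exact rule)."""
    if confidence >= easy_conf and variability <= max_var:
        return "easy"    # consistently learned with high confidence
    if confidence < hard_conf and variability <= max_var:
        return "hard"    # consistently low confidence
    return "medium"      # mid-range confidence or fluctuating across epochs

# Tier each pool is meant to be processed on, following the abstract.
TIER = {"easy": "mobile", "medium": "edge", "hard": "cloud"}

gold_probs = [
    [0.90, 0.95, 0.97],  # stable and high
    [0.10, 0.15, 0.10],  # stable and low
    [0.20, 0.80, 0.50],  # fluctuating
]
for conf, var in cartography(gold_probs):
    pool = complexity_pool(conf, var)
    print(f"conf={conf:.2f} var={var:.2f} -> {pool} ({TIER[pool]})")
```

Population standard deviation (`pstdev`) is used because variability is computed over all recorded epochs rather than estimated from a sample of them.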
