ECCENTRIC: Edge-Cloud Collaboration Framework for Distributed Inference Using Knowledge Adaptation

Mohammad Mahdi Kamani, Zhongwei Cheng, Lin Chen

TL;DR

This paper introduces Eccentric, an edge-cloud collaboration framework that learns Pareto-optimal models for distributed inference by transferring knowledge from edge to cloud. It presents three architectures (Independent ECC, Adaptive ECC, and Dynamic ECC) along with training strategies including knowledge distillation, knowledge adaptation, and recall-rate boosting, all aimed at reducing cloud offload while preserving performance. New evaluation criteria are defined to quantify communication, computation, and performance trade-offs, and the approach is validated on CIFAR-10 classification and COCO/YOLOv5-based object detection, demonstrating near cloud-level performance with substantial resource savings. The work offers a flexible, compression-like mechanism for edge-cloud inference and points to potential extensions to broader tasks and generative adaptation methods.

Abstract

The rapid growth of edge AI has made machine learning applications ubiquitous across domains. Although edge systems are efficient in computation and communication, the limited computational resources of edge devices make reliance on more powerful cloud-side systems inevitable in most cases. Cloud inference systems can achieve the best performance, but their computation and communication costs rise dramatically as the number of edge devices relying on them grows. Hence, there is a trade-off between the computation, communication, and performance of these systems. In this paper, we propose a novel framework, dubbed Eccentric, that learns models with different levels of trade-off between these conflicting objectives. This framework, based on adapting knowledge from the edge model to the cloud one, reduces the computation and communication costs of the system during inference while achieving the best performance possible. The Eccentric framework can be considered a new form of compression method suited to edge-cloud inference systems, reducing both computation and communication costs. Empirical studies on classification and object detection tasks corroborate the efficacy of this framework.

Paper Structure

This paper contains 34 sections, 2 theorems, 17 equations, 7 figures, 5 tables.

Key Result

Proposition 1

Consider a multi-objective problem with $p$ objectives $\bm{\mathrm{h}}\left(\bm{w}\right) = \left[\mathrm{h}_1\left(\bm{w}\right), \mathrm{h}_2\left(\bm{w}\right), \ldots, \mathrm{h}_p\left(\bm{w}\right)\right]$ that ought to be minimized. Using the solution of the following quadratic optimization ... where $\bm{\mathrm{g}}_i\left(\bm{w}\right) = \nabla_{\bm{w}}\mathrm{h}_i\left(\bm{w}\right), i\in [p]$
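
The statement above is cut off before its quadratic program, so the exact formulation used in the paper is not recoverable here. As a hedged illustration only, the sketch below assumes the standard min-norm problem from multiple-gradient descent, $\min_{\alpha \in [0,1]} \lVert \alpha\,\bm{\mathrm{g}}_1 + (1-\alpha)\,\bm{\mathrm{g}}_2 \rVert^2$, for the special case $p = 2$, where a closed-form solution exists. The function name `min_norm_direction` and the helper `dot` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def min_norm_direction(
    g1: Sequence[float], g2: Sequence[float]
) -> Tuple[float, List[float]]:
    """Solve min over alpha in [0, 1] of ||alpha*g1 + (1-alpha)*g2||^2.

    ASSUMPTION: this is the textbook two-objective min-norm problem,
    not necessarily the quadratic program stated in the paper.
    Returns (alpha, d) with d = alpha*g1 + (1-alpha)*g2.
    """
    diff = [a - b for a, b in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        # Identical gradients: every alpha gives the same vector.
        alpha = 0.5
    else:
        # Unconstrained minimizer, then clipped to the interval [0, 1].
        alpha = dot([b - a for a, b in zip(g1, g2)], g2) / denom
        alpha = min(1.0, max(0.0, alpha))
    d = [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]
    return alpha, d


# Two conflicting gradients: the min-norm combination weighs them equally.
alpha, d = min_norm_direction([1.0, 0.0], [0.0, 1.0])
# d has a non-negative inner product with both gradients, so stepping
# along -d does not increase either objective to first order.
common_descent = dot(d, [1.0, 0.0]) >= 0.0 and dot(d, [0.0, 1.0]) >= 0.0
```

Under this assumption, the returned `d` satisfies $\bm{\mathrm{g}}_i^\top d \geq \lVert d \rVert^2$ for both objectives, which is why its negative serves as a common descent direction. When `d` is the zero vector, no such direction exists and the point is Pareto stationary.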

Figures (7)

  • Figure 1: The Eccentric framework seeks to fill the gap between edge and cloud inference systems by learning Pareto-optimal models, drawn from the Pareto frontier surface, with different levels of trade-off between the computation, communication, and performance of the inference system. Arrows show the direction of increase.
  • Figure 2: The proposed Eccentric frameworks for distributed inference using edge-cloud collaboration with knowledge adaptation.
  • Figure 3: Trade-off between computation and performance of different ECC models. The $\texttt{ECC}_\text{A}$ models adapt the first residual layer of the edge model to one of the last three layers of the cloud model (2, 3, 4), using adaptation modules with different numbers of layers $\{\mathrm{r}_1,\mathrm{r}_2,\mathrm{r}_3\}$.
  • Figure 4: Trade-off between computation and performance of different ECC models for object detection. The $\texttt{ECC}_\text{D}$ models achieve the same level of performance as the cloud with greater than $40\%$ reduction in computation and communication.
  • Figure 5: The Pareto frontier curve of Performance vs. Computation and Performance vs. Communication extracted from the ECC models learned for the defined object detection task.
  • ...and 2 more figures

Theorems & Definitions (3)

  • Proposition 1: Pareto Descent Direction
  • Proposition 2: Pareto Descent Direction
  • proof