Combining Cloud and Mobile Computing for Machine Learning

Ruiqi Xu; Tianchi Zhang

TL;DR

This work investigates edge-cloud collaborative inference through model segmentation, with the goal of reducing cloud workload while meeting latency SLAs. It introduces a latency-aware scheduler that considers device capability, network quality, and job requirements, and validates the approach on RegNet and Stable Diffusion. The results show that Stable Diffusion can benefit from offloading when combined with intelligent batching and preloading, whereas RegNet may not, because of its data transfer costs; the scheduler achieves substantial reductions in cloud GPU usage. The study outlines a practical path toward fog-like computing, highlights memory and security considerations, and proposes future refinements for memory management and adaptive SLA policies.
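The TL;DR credits intelligent batching with making Stable Diffusion offloading worthwhile, but this page does not describe the batching policy. As a rough illustration only, the sketch below assumes an earliest-deadline-first policy with a linear batch-cost model (a fixed per-batch overhead `base_ms` plus `per_item_ms` per request). `Request` and `form_batch` are hypothetical names, not the authors' code. The batch starts from the most urgent request and grows until adding one more request would make that request miss its deadline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    # Hypothetical request record: an id and an absolute deadline in milliseconds.
    req_id: str
    deadline_ms: float


def form_batch(queue: List[Request], now_ms: float, max_batch: int,
               base_ms: float, per_item_ms: float) -> List[Request]:
    """Pick the next batch for one cloud GPU worker (illustrative policy only).

    Assumes batch execution time = base_ms + per_item_ms * batch_size, so a
    larger batch amortizes the fixed overhead but finishes later.
    """
    if not queue or max_batch < 1:
        return []
    # Earliest deadline first: the head of `ordered` is the most urgent request.
    ordered = sorted(queue, key=lambda r: r.deadline_ms)
    earliest_deadline = ordered[0].deadline_ms
    # Always serve the most urgent request, even if it is already late.
    batch = [ordered[0]]
    for req in ordered[1:max_batch]:
        finish_ms = now_ms + base_ms + per_item_ms * (len(batch) + 1)
        # All requests in a batch finish together, so the binding
        # constraint is the earliest deadline among them.
        if finish_ms > earliest_deadline:
            break
        batch.append(req)
    return batch


if __name__ == "__main__":
    queue = [Request("a", 1000), Request("b", 400),
             Request("c", 700), Request("d", 2000)]
    picked = form_batch(queue, now_ms=0, max_batch=4, base_ms=100, per_item_ms=80)
    print([r.req_id for r in picked])
    # prints ['b', 'c', 'a']: adding "d" would finish at 420 ms, after b's 400 ms deadline
```

Requests that are not picked stay in the queue for the next round. A production policy would likely also weigh requests expected to arrive soon; this sketch only looks at the current queue.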

Abstract

Although the computing power of mobile devices is increasing, machine learning models are also growing in size. This trend creates problems for mobile devices because of limitations such as memory capacity and battery life. While many services, like ChatGPT and Midjourney, run all inference in the cloud, we believe a flexible, fine-grained task distribution is more desirable. In this work, we consider model segmentation as a way to improve the user experience: the computation is divided between mobile devices and the cloud so that the compute-heavy portion of the model is offloaded while the required data transfer is minimized. We show that this division not only reduces the wait time for users but can also be tuned to optimize the cloud's workload. To achieve this, we design a scheduler that collects information about network quality, client device capability, and job requirements, and makes decisions that achieve consistent performance across a range of devices while reducing the work the cloud needs to perform.
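The abstract describes the scheduler only at a high level. Below is a minimal sketch of one way such a decision could be made, assuming each candidate split point has been profiled offline. It estimates end-to-end latency as on-device prefix time (scaled by a device-speed factor), plus upload and download time at the measured bandwidth, plus one round-trip time, plus cloud suffix time. Among split points that meet the job's SLA, it picks the one with the least cloud GPU time. `SplitPoint`, `Job`, `estimate_latency_ms`, `choose_split`, and all profile numbers are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SplitPoint:
    # Hypothetical offline profile of one cut: layers [0, index) run on the device.
    index: int
    device_ms: float      # prefix time on a reference device
    cloud_ms: float       # suffix time on the cloud GPU (0 means fully local)
    tensor_bytes: int     # intermediate tensor (or raw input) uploaded to the cloud


@dataclass
class Job:
    sla_ms: float         # end-to-end latency budget for this request
    result_bytes: int     # size of the result downloaded back to the device


def estimate_latency_ms(sp: SplitPoint, job: Job, bandwidth_bps: float,
                        rtt_ms: float, device_speed: float) -> float:
    """Predicted end-to-end latency; device_speed=2.0 means twice the reference speed."""
    device = sp.device_ms / device_speed
    if sp.cloud_ms == 0:
        return device  # fully on-device: no network transfer at all
    transfer = (sp.tensor_bytes + job.result_bytes) * 8 / bandwidth_bps * 1000
    return device + transfer + rtt_ms + sp.cloud_ms


def choose_split(splits: Sequence[SplitPoint], job: Job, bandwidth_bps: float,
                 rtt_ms: float, device_speed: float) -> SplitPoint:
    """Meet the SLA with the least cloud GPU time; otherwise pick the fastest split."""
    def latency(sp: SplitPoint) -> float:
        return estimate_latency_ms(sp, job, bandwidth_bps, rtt_ms, device_speed)

    feasible = [sp for sp in splits if latency(sp) <= job.sla_ms]
    if not feasible:
        # No split meets the SLA: fall back to best effort (lowest latency).
        return min(splits, key=latency)
    # Minimize cloud work first, then break ties by lower latency.
    return min(feasible, key=lambda sp: (sp.cloud_ms, latency(sp)))


if __name__ == "__main__":
    splits = [
        SplitPoint(index=4, device_ms=900, cloud_ms=0, tensor_bytes=0),        # fully local
        SplitPoint(index=2, device_ms=300, cloud_ms=60, tensor_bytes=500_000),  # split
        SplitPoint(index=0, device_ms=0, cloud_ms=100, tensor_bytes=200_000),   # fully cloud
    ]
    job = Job(sla_ms=500, result_bytes=1_000)
    for bw, speed in [(20e6, 1.0), (100e6, 1.0), (20e6, 2.0)]:
        chosen = choose_split(splits, job, bw, rtt_ms=40, device_speed=speed)
        print(f"{bw / 1e6:.0f} Mbps, speed x{speed}: split index {chosen.index}")
    # prints split indices 0, 2, and 4 for the three scenarios
```

In this toy profile, a slow link makes the 500 KB intermediate tensor too expensive, so the whole job goes to the cloud; a faster link makes the split feasible and saves cloud time; a faster phone can run everything locally. Ordering by `cloud_ms` first encodes the goal of reducing cloud workload; ordering by latency alone would instead minimize user wait time.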

Paper Structure (34 sections, 7 equations, 20 figures, 4 tables)

Motivation
Introduction
Background
Large Machine Learning Models
Machine Learning Acceleration
Low-Latency Inference Services
Machine Learning on the Cloud
Implementation
Model Splitting
RegNet
Stable Diffusion
Data Transmission
Enforcing End-to-end Latency
Intelligent Batching
Other Features
...and 19 more sections

Figures (20)

Figure 1: Battery Life Reduction of Running Large Machine Learning on Edge
Figure 2: Model architecture of RegNet
Figure 3: Model architecture of Stable Diffusion
Figure 4: Transmission Cost vs. Tensor Size
Figure 5: Deserialization Cost vs. Tensor Size
...and 15 more figures
