CrowdHMTware: A Cross-level Co-adaptation Middleware for Context-aware Mobile DL Deployment

Sicong Liu, Bin Guo, Shiyan Luo, Yuzhan Wang, Hao Luo, Cheng Fang, Yuan Xu, Ke Ma, Yao Li, Zhiwen Yu

TL;DR

This work tackles the challenge of deploying deep learning on resource-constrained mobile devices under dynamic contexts. It introduces CrowdHMTware, a dynamic context-adaptive middleware that enables cross-level co-adaptation among elastic inference, scalable offloading, and a model-adaptive compilation engine, guided by an automated adaptation loop. The approach employs a retraining-free, multi-variant front-end, hierarchical model partitioning with cross-framework transformation, and a backend engine that optimizes computation graphs, memory, and data reuse. Empirical results across 15 devices, 4 mobile applications, and multiple models show substantial improvements in latency (up to 10.3×), memory (up to ~4×), and energy efficiency, with accuracy gains in dynamic settings, demonstrating practical viability for robust mobile DL deployment. Overall, CrowdHMTware reduces developer effort while delivering scalable, context-aware DL inference on heterogeneous mobile platforms.

Abstract

Many deep learning (DL) powered mobile and wearable applications today continuously and unobtrusively sense the ambient surroundings to enhance all aspects of human lives. To enable robust and private mobile sensing, DL models are often deployed locally on resource-constrained mobile devices using techniques such as model compression or offloading. However, existing methods, whether at the front-end algorithm level (i.e., DL model compression/partitioning) or at the back-end scheduling level (i.e., operator/resource scheduling), cannot run locally online: they either require offline retraining to ensure accuracy or rely on manually pre-defined strategies, and therefore struggle with dynamic adaptability. The primary challenge lies in feeding back runtime performance from the back-end level to front-end optimization decisions. Moreover, adaptive mobile DL model porting middleware with cross-level co-adaptation is less explored, particularly in diverse and dynamic mobile environments. In response, we introduce CrowdHMTware, a dynamic context-adaptive DL model deployment middleware for heterogeneous mobile devices. It establishes an automated adaptation loop between cross-level functional components, i.e., elastic inference, scalable offloading, and a model-adaptive engine, enhancing scalability and adaptability. Experiments with four typical tasks across 15 platforms and a real-world case study demonstrate that CrowdHMTware can effectively scale DL model, offloading, and engine actions across diverse platforms and tasks. It hides run-time system issues from developers, reducing the required developer expertise.
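
The feedback idea in the abstract, where runtime performance observed at the back end drives the front-end choice of model, can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the CrowdHMTware implementation: the `Variant` type, the variant names, the accuracy and latency numbers, the linear load model in `measured_latency`, and the 50 ms budget are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical pre-generated model variant (no retraining at runtime)."""
    name: str
    accuracy: float         # assumed offline-profiled accuracy (made-up value)
    base_latency_ms: float  # assumed latency on an idle device (made-up value)


def measured_latency(variant: Variant, load: float) -> float:
    """Stand-in for back-end runtime feedback.

    Assumption: latency grows linearly with a device load factor.
    A real system would measure this instead of modelling it.
    """
    return variant.base_latency_ms * (1.0 + load)


def select_variant(variants, load: float, budget_ms: float) -> Variant:
    """Front-end decision: most accurate variant that fits the latency budget.

    Falls back to the fastest variant when nothing fits.
    """
    feasible = [v for v in variants if measured_latency(v, load) <= budget_ms]
    if feasible:
        return max(feasible, key=lambda v: v.accuracy)
    return min(variants, key=lambda v: v.base_latency_ms)


# Illustrative variants only; the numbers are not taken from the paper.
VARIANTS = [
    Variant("full", 0.92, 40.0),
    Variant("medium", 0.89, 25.0),
    Variant("small", 0.84, 12.0),
]


def adaptation_loop(load_trace, budget_ms: float = 50.0):
    """Re-select a variant each time the observed device load changes."""
    return [select_variant(VARIANTS, load, budget_ms).name for load in load_trace]


if __name__ == "__main__":
    # Device load rises over time, so the loop steps down to lighter variants.
    print(adaptation_loop([0.0, 0.5, 1.5, 4.0]))
```

In this sketch the selection is re-run whenever the load changes, so the deployed variant follows the device context without any retraining step. Offloading and engine-level actions, which the abstract names as the other two components of the loop, are left out.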

Paper Structure

This paper contains 36 sections, 3 equations, 13 figures, 5 tables.

Figures (13)

  • Figure 1: Illustration of an example mobile application, i.e., a user interacting with a voice assistant model.
  • Figure 2: Illustration of the CrowdHMTware architecture.
  • Figure 3: Model pre-partitioning with hierarchical granularity and adaptive offloading through pre-partition combination.
  • Figure 4: Integrating operator optimization into cross-framework transformation process, e.g. from PyTorch to PP.
  • Figure 5: Illustration of the dynamic model-adaptive engine.
  • ...and 8 more figures