AdaScale: Dynamic Context-aware DNN Scaling via Automated Adaptation Loop on Mobile Devices

Yuzhan Wang, Sicong Liu, Bin Guo, Boqi Zhang, Ke Ma, Yasan Ding, Hao Luo, Yao Li, Zhiwen Yu

TL;DR

AdaScale addresses the deployment of DNNs on mobile devices whose available resources change at runtime. It is an elastic inference framework that automates on-device adaptation by combining a multi-branch self-evolutionary network, pretrained on the server with an ensemble of compression operators, with an on-device resource-awareness module and a runtime performance profiler that together guide adaptation in real time. Reported gains include an accuracy improvement of $+5.09\%$, $66.89\%$ lower training overhead, $1.51$–$6.2\times$ faster inference, and $4.69\times$ lower energy cost, with accuracy loss kept under $4\%$ in constrained contexts. The result is accurate, low-latency DNN inference on heterogeneous mobile hardware without heavy dependence on the cloud.

Abstract

Deep learning is reshaping mobile applications, with a growing trend of deploying deep neural networks (DNNs) directly to mobile and embedded devices to address real-time performance and privacy. To accommodate local resource limitations, techniques like weight compression, convolution decomposition, and specialized layer architectures have been developed. However, the dynamic and diverse deployment contexts of mobile devices pose significant challenges. Adapting deep models to meet varied device-specific requirements for latency, accuracy, memory, and energy is labor-intensive. Additionally, changing processor states, fluctuating memory availability, and competing processes frequently necessitate model re-compression to preserve user experience. To address these issues, we introduce AdaScale, an elastic inference framework that automates the adaptation of deep models to dynamic contexts. AdaScale leverages a self-evolutionary model to streamline network creation, employs diverse compression operator combinations to reduce the search space and improve outcomes, and integrates a resource availability awareness block and performance profilers to establish an automated adaptation loop. Our experiments demonstrate that AdaScale improves accuracy by 5.09%, reduces training overhead by 66.89%, speeds up inference by 1.51 to 6.2 times, and lowers energy cost by 4.69 times.
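
The adaptation loop described in the abstract can be illustrated with a minimal sketch. This is not AdaScale's implementation: the names `Variant`, `Context`, `select_variant`, and `adaptation_loop`, the selection rule (pick the most accurate variant that fits the current latency and memory budget), and all numbers in the usage example are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Variant:
    """A hypothetical compressed model variant with profiled costs."""
    name: str
    accuracy: float      # profiled top-1 accuracy, in [0, 1]
    latency_ms: float    # profiled inference latency
    memory_mb: float     # profiled peak memory use


@dataclass(frozen=True)
class Context:
    """A hypothetical snapshot of the device's available resources."""
    latency_budget_ms: float
    free_memory_mb: float


def select_variant(variants: List[Variant], ctx: Context) -> Optional[Variant]:
    """Return the most accurate variant that fits the context, or None."""
    feasible = [
        v for v in variants
        if v.latency_ms <= ctx.latency_budget_ms
        and v.memory_mb <= ctx.free_memory_mb
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda v: v.accuracy)


def adaptation_loop(variants: List[Variant],
                    contexts: List[Context]) -> List[Optional[str]]:
    """Re-select a variant each time a new resource snapshot arrives."""
    chosen = []
    for ctx in contexts:
        best = select_variant(variants, ctx)
        chosen.append(best.name if best is not None else None)
    return chosen


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    variants = [
        Variant("full", 0.92, 80.0, 200.0),
        Variant("medium", 0.89, 40.0, 100.0),
        Variant("small", 0.85, 15.0, 40.0),
    ]
    contexts = [Context(100.0, 256.0), Context(50.0, 50.0)]
    print(adaptation_loop(variants, contexts))
```

A real system would replace the static `Context` list with live readings from a resource monitor and the fixed per-variant costs with profiler measurements; the sketch only shows the shape of the sense, select, and switch cycle.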

Paper Structure

This paper contains 25 sections, 5 equations, 9 figures, 4 tables, 1 algorithm.

Figures (9)

  • Figure 1: Comparison of different model adaptive deployment approaches. Left: pre-deployment on-server model generation. Middle: post-deployment on-device model adaptation. Right: real-time post-deployment on-device model adaptation (ours).
  • Figure 2: The framework of AdaScale includes two main components: the pretraining of a multi-branch self-evolutionary network with a compression operator ensemble on the server, and the awareness of resource availability with runtime elastic model adjustment on the IoT device.
  • Figure 3: Illustration of multi-branch self-evolutionary network.
  • Figure 4: Left: the workflow for calculating memory access and cache rate. Right: pseudocode in Python.
  • Figure 5: AdaScale improves DNN efficiency by optimizing search space, training time, and search overhead.
  • ...and 4 more figures