AdaScale: Dynamic Context-aware DNN Scaling via Automated Adaptation Loop on Mobile Devices
Yuzhan Wang, Sicong Liu, Bin Guo, Boqi Zhang, Ke Ma, Yasan Ding, Hao Luo, Yao Li, Zhiwen Yu
TL;DR
AdaScale tackles the challenge of deploying DNNs on mobile devices whose resource contexts change at runtime. It proposes an elastic inference framework that automates on-device model adaptation. The framework pairs a server-side, multi-branch self-evolutionary network built from ensembles of compression operators with an on-device resource-awareness module and a runtime performance profiler, and together these drive adaptation in real time. Reported gains include a $5.09\%$ accuracy improvement, $66.89\%$ lower training overhead, $1.51$–$6.2\times$ faster inference, and $4.69\times$ lower energy cost. Accuracy loss stays under $4\%$ in resource-constrained contexts. The result is accurate, low-latency DNN inference on heterogeneous mobile hardware without heavy reliance on the cloud.
Abstract
Deep learning is reshaping mobile applications, and deep neural networks (DNNs) are increasingly deployed directly on mobile and embedded devices to meet real-time performance and privacy requirements. To fit within local resource limits, researchers have developed techniques such as weight compression, convolution decomposition, and specialized layer architectures. However, the dynamic and diverse deployment contexts of mobile devices pose significant challenges. Manually adapting deep models to each device's requirements for latency, accuracy, memory, and energy is labor-intensive. Moreover, changing processor states, fluctuating memory availability, and competing processes frequently force the model to be re-compressed to preserve the user experience. To address these issues, we introduce AdaScale, an elastic inference framework that automates the adaptation of deep models to dynamic contexts. AdaScale uses a self-evolutionary model to streamline network creation and employs diverse combinations of compression operators to shrink the search space and improve outcomes. It also integrates a resource-availability awareness block with performance profilers to form an automated adaptation loop. Our experiments show that AdaScale improves accuracy by 5.09%, reduces training overhead by 66.89%, speeds up inference by 1.51 to 6.2 times, and lowers energy cost by 4.69 times.
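The abstract describes the automated adaptation loop only at a high level. The following is a minimal sketch, under our own assumptions, of how profiled branch costs and a resource snapshot might drive branch selection at runtime. The names `Variant`, `Context`, `update_profile`, `select_variant`, and `adaptation_step` are hypothetical. The greedy "most accurate branch that fits every budget" rule and the exponential-moving-average (EMA) latency update are also assumptions, not AdaScale's published algorithm. All numbers are illustrative.

```python
from dataclasses import dataclass, replace
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Variant:
    """One deployable branch (hypothetical), e.g. a compressed sub-network."""
    name: str
    accuracy: float    # offline validation accuracy in [0, 1]
    latency_ms: float  # latency estimate maintained by the profiler
    memory_mb: float   # peak memory footprint
    energy_mj: float   # energy per inference


@dataclass(frozen=True)
class Context:
    """Snapshot of resource availability (hypothetical awareness-block output)."""
    free_memory_mb: float
    latency_budget_ms: float
    energy_budget_mj: float


def update_profile(v: Variant, measured_ms: float, alpha: float = 0.5) -> Variant:
    """Blend a fresh on-device latency measurement into the estimate (EMA)."""
    return replace(v, latency_ms=(1 - alpha) * v.latency_ms + alpha * measured_ms)


def select_variant(variants: Sequence[Variant], ctx: Context) -> Optional[Variant]:
    """Return the most accurate variant satisfying every budget, or None."""
    feasible = [
        v for v in variants
        if v.memory_mb <= ctx.free_memory_mb
        and v.latency_ms <= ctx.latency_budget_ms
        and v.energy_mj <= ctx.energy_budget_mj
    ]
    if not feasible:
        return None
    # Highest accuracy first; ties broken by lower latency.
    return max(feasible, key=lambda v: (v.accuracy, -v.latency_ms))


def adaptation_step(variants: Sequence[Variant], ctx: Context) -> Tuple[Variant, bool]:
    """One loop iteration: (chosen variant, whether it met all budgets).

    Falls back to the smallest-memory variant when nothing fits.
    """
    chosen = select_variant(variants, ctx)
    if chosen is None:
        return min(variants, key=lambda v: v.memory_mb), False
    return chosen, True


# Illustrative numbers only; not measurements from the paper.
variants = [
    Variant("full", 0.76, 48.0, 90.0, 120.0),
    Variant("pruned", 0.73, 21.0, 45.0, 55.0),
    Variant("pruned+lowrank", 0.70, 12.0, 30.0, 31.0),
]

ctx = Context(free_memory_mb=100.0, latency_budget_ms=50.0, energy_budget_mj=200.0)
print(adaptation_step(variants, ctx)[0].name)  # full

# The profiler observes contention: "full" now takes 80 ms per inference.
variants[0] = update_profile(variants[0], measured_ms=80.0)  # EMA -> 64.0 ms
print(adaptation_step(variants, ctx)[0].name)  # pruned
```

Greedy selection over a small, pre-built set of branches is about the simplest policy consistent with the description. It is cheap enough to rerun whenever memory availability or profiled latency changes. The actual framework may weigh the objectives differently.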
