AutoTailor: Automatic and Efficient Adaptive Model Deployment for Diverse Edge Devices

Mengyang Liu, Chenyu Lu, Haodong Tian, Fang Dong, Ruiting Zhou, Wei Wang, Dian Shen, Guangtong Li, Ye Wan, Li Li

TL;DR

AutoTailor automates SuperNet-based adaptive deployment on edge devices. It introduces TailorIR, a computation-graph abstraction, together with two learning-free predictors: a LUT-based latency predictor and a modification-sensitivity-based accuracy predictor. The framework automatically converts static DNNs into TailorIR-based SuperNets, fine-tunes them, and efficiently explores architectures under device-specific constraints. According to the reported evaluation, it reduces the lines of code for SuperNet construction by 11-27× and hardware-aware profiling costs by at least 11×, while achieving up to 15.60% absolute accuracy improvement and 60.03% latency reduction over state-of-the-art approaches across diverse models and hardware. The results indicate practical applicability to real-world edge scenarios and suggest possible extensions to tasks beyond image classification.

Abstract

On-device machine learning (ML) has become a fundamental component of emerging mobile applications. Adaptive model deployment delivers efficient inference for heterogeneous device capabilities and performance requirements through customizing neural architectures. SuperNet-based approaches offer a promising solution by generating a large number of model variants from a pre-trained ML model. However, applying SuperNet in existing frameworks suffers from tedious model-aware development and time-consuming hardware-aware profiling, which limits their practical adoption. We present AutoTailor, the first framework to enable automated, end-to-end SuperNet-based adaptive model deployment for edge devices. Unlike manual SuperNet construction, AutoTailor employs a computation graph-guided compilation approach to automatically transform user-provided ML models into SuperNets. To support efficient specialization, AutoTailor incorporates learning-free latency and accuracy predictors, enabling low-cost yet accurate performance prediction. Our extended evaluations demonstrate that AutoTailor reduces the lines of code for SuperNet construction by 11-27×, decreases hardware-aware profiling costs by at least 11×, and achieves up to 15.60% absolute accuracy improvement and 60.03% latency reduction compared to state-of-the-art approaches across diverse models and devices.
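
To make the idea of a learning-free, LUT-based latency predictor (named in the TL;DR) more concrete, the following is a minimal sketch of the general lookup-table technique, not AutoTailor's actual implementation. It assumes that a model variant's latency can be approximated by summing per-operator latencies profiled once on the target device. The operator key layout, the function names `predict_latency` and `within_budget`, and all latency values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical operator key: (operator type, input shape, operator config).
OpKey = Tuple[str, Tuple[int, ...], Tuple[int, ...]]


def predict_latency(ops: List[OpKey], lut: Dict[OpKey, float]) -> float:
    """Estimate a variant's latency (ms) by summing profiled per-operator entries."""
    missing = [op for op in ops if op not in lut]
    if missing:
        # An unprofiled operator means the table must be extended first.
        raise KeyError(f"operators not profiled: {missing}")
    return sum(lut[op] for op in ops)


def within_budget(
    variants: Dict[str, List[OpKey]], lut: Dict[OpKey, float], budget_ms: float
) -> List[str]:
    """Return the names of variants whose predicted latency fits the budget."""
    return [
        name
        for name, ops in variants.items()
        if predict_latency(ops, lut) <= budget_ms
    ]


# Illustrative (made-up) table entries, as if profiled once on one device.
conv_a: OpKey = ("conv", (1, 3, 224, 224), (64, 3))
conv_b: OpKey = ("conv", (1, 64, 112, 112), (128, 3))
fc: OpKey = ("linear", (1, 512), (1000,))
example_lut: Dict[OpKey, float] = {conv_a: 1.5, conv_b: 2.25, fc: 0.25}

# Two hypothetical variants sampled from a SuperNet.
example_variants: Dict[str, List[OpKey]] = {
    "small": [conv_a, fc],
    "large": [conv_a, conv_b, fc],
}

feasible = within_budget(example_variants, example_lut, budget_ms=2.0)
```

Because the table is filled by profiling each distinct operator once, evaluating a new variant costs only dictionary lookups. This is the general reason a lookup-table predictor needs no training and can keep profiling overhead low; how AutoTailor builds and indexes its table is not specified in this summary.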

Paper Structure

This paper contains 28 sections, 3 equations, 21 figures, 3 tables.

Figures (21)

  • Figure 1: Latency-accuracy tradeoff of adaptive ResNet-50 on the big cores of the Exynos 1380 SoC with the ImageNet1k dataset across different methods. SuperNet consistently achieves state-of-the-art performance. The Static baseline uses ResNet-18 and ResNet-34 as smaller model variants.
  • Figure 2: An illustration of adaptive model deployment and different model variants generation methods.
  • Figure 3: Comparison between block scaling and SuperNet-based model variants generation.
  • Figure 4: An illustration of SuperNet workflow.
  • Figure 5: Illustration of graph structural consistency property and comparison of SuperNet development methods.
  • ...and 16 more figures