Accelerate Intermittent Deep Inference

Ziliang Zhang

TL;DR

The paper tackles the challenge of running DNN inference on battery-less, intermittently powered edge devices with extreme SRAM constraints. It introduces Accelerated Intermittent Deep Inference, which combines real-time schedulability analysis with an accelerated intermittent NAS to produce higher-accuracy models that fit within small SRAM budgets and meet per-cycle energy and latency constraints. Key contributions include knowledge-distillation-based model shrinking, hyper-tile batching for throughput, advanced NAS controllers, and on-device C runtime generation. The approach is validated on an MSP430 with the CIFAR-10 and Tiny ImageNet datasets, achieving up to 19–60% gains in practical settings while enabling substantially larger networks under 2 KB SRAM. The work demonstrates the feasibility and impact of end-to-end intermittently powered inference, offering a path toward practical on-device AI on ultra-low-power hardware and guiding future hardware and software co-design for intermittent edge computing.

Abstract

Emerging research on edge devices and microcontroller units (MCUs) enables on-device computation of deep learning training and inference tasks. More recently, work has focused on making Deep Neural Network (DNN) models runnable on battery-less, intermittently powered devices. One approach shrinks DNN models through weight sharing and pruning, and conducts Neural Architecture Search (NAS) over a search space optimized for specific edge devices \cite{Cai2019OnceFA} \cite{Lin2020MCUNetTD} \cite{Lin2021MCUNetV2MP} \cite{Lin2022OnDeviceTU}. Another approach analyzes intermittent execution and designs the corresponding system by performing NAS that is aware of intermittent execution cycles and resource constraints \cite{iNAS} \cite{HW-NAS} \cite{iLearn}. However, the optimized NAS considers only continuous execution with no power loss, while intermittent execution designs focus only on balancing data reuse against the costs of intermittent inference, often at low accuracy. We propose Accelerated Intermittent Deep Inference to harness optimized DNN inference models, specifically targeting SRAM under 256KB, and to make them schedulable and runnable under intermittent power. Our main contributions are: (1) scheduling the tasks performed by on-device inference into intermittent execution cycles and optimizing for latency; (2) developing a system that satisfies the end-to-end latency requirement while achieving much higher accuracy than the baselines \cite{iNAS} \cite{HW-NAS}.
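
To make the first contribution concrete, the following is a minimal sketch of packing inference tasks into intermittent power cycles under a per-cycle energy budget. It is not the paper's scheduler: the `Task` fields, the greedy in-order packing, the fixed checkpoint cost, the constant recharge time, and the assumption that the device starts fully charged are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """One unit of inference work, e.g. a layer tile (hypothetical model)."""
    name: str
    energy_uj: float  # energy needed to run the task once
    time_ms: float    # execution time of the task


def schedule_into_cycles(tasks: List[Task], budget_uj: float,
                         checkpoint_uj: float = 0.0) -> List[List[Task]]:
    """Greedily pack tasks, in order, into power cycles.

    Each cycle reserves `checkpoint_uj` for saving progress before power
    is lost (an assumed fixed cost). A new cycle starts when the next
    task would exceed the remaining budget.
    """
    cycles: List[List[Task]] = [[]]
    used = checkpoint_uj
    for task in tasks:
        if task.energy_uj + checkpoint_uj > budget_uj:
            raise ValueError(f"{task.name} cannot fit in a single cycle")
        if used + task.energy_uj > budget_uj:
            cycles.append([])      # wait for recharge, start a new cycle
            used = checkpoint_uj
        cycles[-1].append(task)
        used += task.energy_uj
    return cycles


def end_to_end_latency_ms(cycles: List[List[Task]], recharge_ms: float) -> float:
    """Compute time plus one recharge period before every cycle but the first."""
    compute = sum(t.time_ms for cycle in cycles for t in cycle)
    return compute + recharge_ms * (len(cycles) - 1)


tasks = [Task("conv1", 40, 1), Task("conv2", 40, 2),
         Task("conv3", 30, 3), Task("fc", 50, 4)]
cycles = schedule_into_cycles(tasks, budget_uj=100, checkpoint_uj=10)
latency = end_to_end_latency_ms(cycles, recharge_ms=100)
```

Under these assumptions, a schedulability check reduces to comparing `latency` against the end-to-end deadline. Smaller tasks pack more tightly into each cycle but may raise checkpoint overhead, which is the kind of trade-off an intermittent-aware search would need to weigh.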

Paper Structure (21 sections, 14 figures)

Figures (14)

  • Figure 1: Intermittent DNN Execution Pattern
  • Figure 2: State-of-the-art baseline approaches: OFA and iNAS
  • Figure 3: Scheduling Optimization
  • Figure 4: More efficient Tiled DNN: Increase granularity due to a larger budget
  • Figure 5: Weight Sharing Model after Knowledge Distillation
  • ...and 9 more figures