MicroNAS: Memory and Latency Constrained Hardware-Aware Neural Architecture Search for Time Series Classification on Microcontrollers

Tobias King; Yexu Zhou; Tobias Röddiger; Michael Beigl

TL;DR

MicroNAS is a hardware-aware differentiable NAS framework for time-series classification on memory- and latency-constrained microcontrollers. It combines a search space built from two cell types (Time-Reduce and Sensor-Fusion) with lookup-table-based latency estimation and a multi-objective DNAS loss, so that the architectures it finds satisfy the user-defined limits Lat_t and Mem_t. Found architectures are retrained with quantization-aware training and deployed as tf-lite micro models, achieving competitive accuracy within strict MCU budgets. This enables private, real-time on-device inference for time-series data in wearables, sensors, and IoT devices, removing the reliance on cloud offloading and reducing energy costs.

Abstract

Designing domain-specific neural networks is a time-consuming, error-prone, and expensive task. Neural Architecture Search (NAS) exists to simplify domain-specific model development, but there is a gap in the literature for time-series classification on microcontrollers. Therefore, we adapt the concept of differentiable neural architecture search (DNAS) to solve the time-series classification problem on resource-constrained microcontrollers (MCUs). We introduce MicroNAS, a domain-specific HW-NAS system that integrates DNAS, latency lookup tables, dynamic convolutions, and a novel search space specifically designed for time-series classification on MCUs. The resulting system is hardware-aware and can generate neural network architectures that satisfy user-defined limits on execution latency and peak memory consumption. Our extensive studies on different MCUs and standard benchmark datasets demonstrate that MicroNAS finds MCU-tailored architectures whose performance (F1-score) is close to that of state-of-the-art desktop models. We also show that our approach adheres to memory and latency constraints better than domain-independent NAS baselines such as DARTS.
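To make the idea of lookup-table-based, constraint-aware search more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each decision group holds one architecture logit per candidate operator, that a lookup table stores a measured latency per operator, and that a limit such as Lat_t enters the loss as a simple hinge penalty. The function names, the penalty form, and all numeric values are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of architecture logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_latency(alphas, lookup_ms):
    """Expected latency in ms: for every decision group, weight each
    operator's lookup-table latency by its softmax probability and sum."""
    total = 0.0
    for logits, latencies in zip(alphas, lookup_ms):
        probs = softmax(logits)
        total += sum(p * lat for p, lat in zip(probs, latencies))
    return total

def constraint_penalty(estimate, target):
    """Hinge penalty (assumed form): zero while the estimate is within the
    target, growing linearly with the relative overshoot otherwise."""
    return max(0.0, estimate / target - 1.0)

# Hypothetical example: two decision groups with uniform logits.
alphas = [[0.0, 0.0], [0.0, 0.0, 0.0]]
lookup_ms = [[2.0, 4.0], [1.0, 2.0, 3.0]]  # made-up per-operator latencies

lat = expected_latency(alphas, lookup_ms)
penalty = constraint_penalty(lat, target=4.0)
```

Because the estimate is a smooth function of the logits, a gradient-based search could in principle trade accuracy against the penalty; the same pattern would apply to a peak-memory estimate and Mem_t.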

Paper Structure (26 sections, 8 equations, 10 figures, 2 tables, 1 algorithm)

Introduction
Background and Related Work
Time Series Classification
Neural Architecture Search
System Overview
Latency & Peak Memory Estimation
Latency Characterization
Search Space
Decision Groups
Dynamic Convolutions
Cells
Time-Reduce Cell
Sensor-Fusion Cell
Output cell
Search Algorithm
...and 11 more sections

Figures (10)

Figure 1: MicroNAS requires the dataset to be split into three different sets that are used at different stages in the pipeline. The user specifies the dataset to be used, the target MCU ($MCU_t$) and the maximum allowed hardware utilization in terms of execution latency ($Lat_t$) and peak memory consumption ($Mem_t$). The output of the system is a corresponding neural network in the tf-lite format.
Figure 2: Execution latency of whole architectures from our search space. Left: our lookup-table latency approach (MAE: 1.59 ms, $R^2$: 99.97%). Right: FLOPs-based estimate (MAE: 15.57 ms, $R^2$: 96.78%).
Figure 3: High-level overview of the search space. The raw, windowed time series $x$ with shape $(ts_l, ts_s)$ is propagated through $N_{TR}$ Time-Reduce cells and $N_{SF}$ Sensor-Fusion cells. The resulting time series then has shape $(\hat{ts}_l, \hat{ts}_s)$. Class probabilities $y$ and hardware metrics are output by the Output cell at the end of the network.
Figure 4: Dynamic convolution with three different options (e.g. $f_{max} = 24$, $g = 3$) for the number of filters. The binary masks ($m_i$) zero out certain filters in the output of the convolution. Grey areas are ones and white areas are zeros.
Figure 5: Time-Reduce cell. The cell contains two decision groups: $\alpha_1$ chooses a convolution and $\alpha_{ytr}$ searches for the number of filters. $F$ is the filter size and $S$ is the stride configuration.
...and 5 more figures
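The masking idea in the Figure 4 caption can be illustrated with a short sketch. It assumes, as a guess not confirmed by the caption, that the $g$ options keep the first $f_{max}/g$, $2 f_{max}/g$, ..., $f_{max}$ filters, and that during search the masks are blended with softmax-style weights. The helper names and the example weights are hypothetical.

```python
def build_masks(f_max, g):
    """Return g binary masks over f_max filters; mask i keeps the first
    (i + 1) * f_max // g filters and zeroes the rest (assumed spacing)."""
    step = f_max // g
    return [
        [1.0 if c < step * (i + 1) else 0.0 for c in range(f_max)]
        for i in range(g)
    ]

def mix_masks(masks, weights):
    """Blend the binary masks into one soft mask using per-option weights."""
    n_filters = len(masks[0])
    return [
        sum(w * m[c] for w, m in zip(weights, masks))
        for c in range(n_filters)
    ]

def apply_mask(filter_outputs, mask):
    """Scale each filter's output value by its mask entry."""
    return [v * m for v, m in zip(filter_outputs, mask)]

# Example from the caption: f_max = 24 filters, g = 3 options.
masks = build_masks(24, 3)
kept = [int(sum(m)) for m in masks]           # filters kept per option
mixed = mix_masks(masks, [0.2, 0.3, 0.5])      # made-up option weights
masked = apply_mask([1.0] * 24, mixed)         # dummy per-filter outputs
```

Under these assumptions a single convolution with $f_{max}$ filters serves all width options, and choosing the option with the largest weight after the search fixes the final number of filters.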
