Tiny Machine Learning: Progress and Futures

Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Song Han

TL;DR

This paper addresses the challenge of running deep learning on ultra-low-memory microcontrollers by advocating a system-algorithm co-design approach for TinyML. It introduces MCUNet, a joint framework combining TinyNAS for automated tiny-model design and TinyEngine for memory-efficient inference, further extending towards on-device training via Quantization-Aware Scaling, sparse updates, and the Tiny Training Engine. Key contributions include automated search-space optimization, Once-For-All NAS specialization, code-generation-based inference, patch-based scheduling, and a complete training stack that enables on-device learning within tiny SRAM budgets, delivering state-of-the-art ImageNet results on MCUs and practical on-device adaptation capabilities. The work demonstrates substantial gains in memory efficiency and latency, enabling high-accuracy vision tasks and continuous on-device learning, with broad implications for privacy-preserving, low-power AI at the edge.

Abstract

Tiny Machine Learning (TinyML) is a new frontier of machine learning. By squeezing deep learning models into billions of IoT devices and microcontrollers (MCUs), we expand the scope of AI applications and enable ubiquitous intelligence. However, TinyML is challenging due to hardware constraints: the tiny memory resource makes it difficult to hold deep learning models designed for cloud and mobile platforms. There is also limited compiler and inference engine support for bare-metal devices. Therefore, we need to co-design the algorithm and system stack to enable TinyML. In this review, we will first discuss the definition, challenges, and applications of TinyML. We then survey the recent progress in TinyML and deep learning on MCUs. Next, we will introduce MCUNet, showing how we can achieve ImageNet-scale AI applications on IoT devices with system-algorithm co-design. We will further extend the solution from inference to training and introduce tiny on-device training techniques. Finally, we present future directions in this area. Today's large model might be tomorrow's tiny model. The scope of TinyML should evolve and adapt over time.

Paper Structure

This paper contains 32 sections, 5 equations, 21 figures, and 7 tables.

Figures (21)

  • Figure 1: Efficiency is critical for CloudML, EdgeML, and TinyML. CloudML targets high-throughput accelerators like GPUs, while EdgeML focuses on portable devices like mobile phones. TinyML further pushes the efficiency boundary, enabling powerful ML models to run on ultra-low-power devices such as microcontrollers.
  • Figure 2: We can't directly scale mobile ML or cloud ML models for TinyML. MobileNetV2 (Sandler et al., 2018) with a width of 1.4 was used for the experiments. The batch size was set to 1 for inference and 8 for training. While MobileNetV2 reduces the number of parameters by 4.2$\times$ compared to ResNet, the peak memory usage increases by 2.3$\times$ for inference and only improves by 1.1$\times$ for training. Additionally, the total required training memory is 6.9$\times$ larger than the memory needed for inference. These results demonstrate that the significant memory bottleneck for TinyML is the activation memory, not the number of parameters.
  • Figure 3: Techniques specifically designed for tiny devices. To fully leverage the limited available resources, we must carefully consider both the system and the algorithm. The co-design approach not only enables practical AI applications on a wide range of IoT platforms (inference), but also allows AI to continuously learn over time, adapting to a fast-changing world (training).
  • Figure 4: MCUNet jointly designs the neural architecture and the inference scheduling to fit the tight memory resource on microcontrollers. TinyEngine makes full use of the limited resources on the MCU, allowing a larger design space for architecture search. With a larger degree of design freedom, TinyNAS is more likely to find a high-accuracy model than it would be with existing frameworks.
  • Figure 5: (a) TinyNAS is a two-stage neural architecture search method. It first specifies a sub-space according to the constraints, and then performs model specialization. (b) TinyNAS selects the best search space by analyzing the FLOPs CDF of different search spaces. Each curve represents a design space. Our insight is that the design space that is more likely to produce high-FLOPs models under the memory constraint gives higher model capacity, and is thus more likely to achieve high accuracy.
  • ...and 16 more figures
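
The search-space selection described in the Figure 5(b) caption can be illustrated with a minimal sketch. It assumes each candidate space has already been sampled into (FLOPs, peak memory) pairs, and it summarizes each space's FLOPs CDF by the mean FLOPs of the models that fit the memory budget. That summary statistic, the function name `select_search_space`, the space names, and all numbers are hypothetical choices made for illustration, not details taken from the paper.

```python
from statistics import mean


def select_search_space(spaces, memory_limit_kb):
    """Return (name, score) of the space with the highest mean feasible FLOPs.

    `spaces` maps a hypothetical space name to a list of
    (flops_in_millions, peak_memory_kb) tuples for sampled models.
    Using the mean as a stand-in for comparing FLOPs CDFs is an assumption.
    """
    best_name, best_score = None, float("-inf")
    for name, samples in spaces.items():
        # Keep only sampled models that satisfy the memory constraint.
        feasible = [flops for flops, mem in samples if mem <= memory_limit_kb]
        if not feasible:
            continue  # no sampled model in this space fits the budget
        score = mean(feasible)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Made-up samples, for illustration only.
spaces = {
    "w0.5-r144": [(30.0, 200), (42.0, 310), (38.0, 290)],
    "w0.3-r160": [(25.0, 180), (28.0, 240), (55.0, 400)],
}
best, score = select_search_space(spaces, memory_limit_kb=320)
print(best, round(score, 2))
```

In this toy input the second space contains the single largest model, but that model exceeds the 320 KB budget, so only its smaller models count and the first space is selected.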