microYOLO: Towards Single-Shot Object Detection on Microcontrollers
Mark Deutel, Christopher Mutschler, Jürgen Teich
TL;DR
This work investigates the feasibility of single-shot object detection on microcontroller hardware using a compact YOLO-based model called microYOLO. The approach downsamples the input to 128x128, uses depthwise separable convolutions, and predicts on a small SxS grid with a limited number of bounding boxes per cell. Together these choices allow deployment on Cortex-M7 devices with less than 800 KB of Flash and less than 350 KB of RAM, reaching about 3.5 FPS on the OpenMV H7 R2. The model is trained with pruning and post-training 8-bit quantization and evaluated on three tasks (fridge groceries, humans, and vehicles). mAP varies across the tasks and is highest on the fridge task, and a detailed error analysis is provided. Deployment results and a dedicated C-code pipeline demonstrate that edge AI on microcontrollers is practical, while exposing the trade-offs between FPS, memory, and detection accuracy that future work needs to address.
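To make the effect of these design choices concrete, the following minimal sketch counts the weights of a standard convolution against a depthwise separable one and computes the size of a YOLO-style output tensor. The layer shape (3x3 kernel, 32 to 64 channels) and the values S=4, B=2, C=3 are hypothetical and chosen only for illustration; they are not taken from the microYOLO architecture.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise convolution followed by a
    1 x 1 pointwise convolution (bias ignored)."""
    return k * k * c_in + c_in * c_out


def yolo_output_size(s, b, c):
    """Values in a YOLO-style output tensor: each of the s x s cells
    predicts b boxes (x, y, w, h, confidence) plus c class scores."""
    return s * s * (b * 5 + c)


if __name__ == "__main__":
    # Hypothetical layer shape, not the paper's.
    k, c_in, c_out = 3, 32, 64
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(f"standard: {std} weights, depthwise separable: {sep} weights")
    print(f"reduction factor: {std / sep:.1f}x")

    # Hypothetical grid size, boxes per cell, and class count.
    print(f"output values: {yolo_output_size(4, 2, 3)}")
```

The saving grows with the number of output channels, and a small grid with few boxes per cell keeps the final layer and the post-processing cheap, which is why both choices suit memory-constrained targets.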
Abstract
This work-in-progress paper presents results on the feasibility of single-shot object detection on microcontrollers using YOLO. Single-shot object detectors like YOLO are widely used, but because of their complexity they run mainly on larger GPU-based platforms. We present microYOLO, which can be used on Cortex-M based microcontrollers such as the OpenMV H7 R2, achieving about 3.5 FPS when classifying 128x128 RGB images while using less than 800 KB of Flash and less than 350 KB of RAM. Furthermore, we share experimental results for three different object detection tasks and analyze the accuracy of microYOLO on each of them.
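As a rough sanity check on these figures, the sketch below computes the size of one 128x128 RGB input frame and the time available per frame at 3.5 FPS. It assumes one byte per channel value (consistent with 8-bit data) and reads "KB" as 1024 bytes; both are assumptions made here, not statements from the paper.

```python
def input_buffer_bytes(width, height, channels, bytes_per_value=1):
    """Bytes needed to hold one input frame.
    bytes_per_value=1 assumes 8-bit values per channel."""
    return width * height * channels * bytes_per_value


def frame_time_ms(fps):
    """Average time available per frame, in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    buf = input_buffer_bytes(128, 128, 3)
    print(f"input frame: {buf} bytes ({buf / 1024:.0f} KiB)")

    # Assumption: the 350 KB RAM bound is read as 350 * 1024 bytes.
    ram_budget = 350 * 1024
    print(f"share of RAM bound: {100 * buf / ram_budget:.1f}%")

    print(f"time per frame at 3.5 FPS: {frame_time_ms(3.5):.1f} ms")
```

Under these assumptions the input frame alone takes 48 KiB, so most of the RAM bound remains for intermediate activations, and each inference has a little under 300 ms to complete.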
