Ultra-Efficient On-Device Object Detection on AI-Integrated Smart Glasses with TinyissimoYOLO
Julian Moosmann, Pietro Bonazzi, Yawei Li, Sizhen Bian, Philipp Mayer, Luca Benini, Michele Magno
TL;DR
The paper demonstrates an energy-efficient, on-device object-detection pipeline for smart glasses by integrating GAP9 hardware with a family of sub-million-parameter TinyissimoYOLO networks. It achieves an end-to-end latency of about 56 ms (≈18 FPS) at a total power of around 62.9 mW, supporting up to 9.3 hours of continuous operation on a 154 mAh battery, with image capture, inference, and post-processing all running on-device. The TinyissimoYOLO variants are trained on 256×256 inputs, quantized to 8-bit, and deployed on GAP9’s NE16 accelerator to balance accuracy and resource use, delivering up to 80-class detection with sub-MB models. The work compares favorably to MCUNet and similar edge approaches, demonstrates comprehensive hardware/software integration, and provides open-source code to foster reproducibility and further development in ultra-low-power wearable AI.
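The link between "sub-million-parameter", "quantized to 8-bit", and "sub-MB" can be checked with a short calculation. The sketch below is illustrative only: the function name and the parameter count of 999,999 are hypothetical (chosen as an upper bound for "sub-million"), and it counts weight storage alone, ignoring biases, quantization scales, and activation buffers.

```python
def weight_storage_bytes(num_params: int, bits_per_weight: int) -> int:
    """Bytes needed to store the weights alone, rounded up to whole bytes."""
    return (num_params * bits_per_weight + 7) // 8


# Hypothetical upper bound for a "sub-million-parameter" network.
NUM_PARAMS = 999_999

int8_bytes = weight_storage_bytes(NUM_PARAMS, 8)    # one byte per weight
fp32_bytes = weight_storage_bytes(NUM_PARAMS, 32)   # four bytes per weight

print(f"8-bit weights:  {int8_bytes / 1e6:.2f} MB")
print(f"32-bit weights: {fp32_bytes / 1e6:.2f} MB")
```

Under these assumptions, any network with fewer than one million 8-bit weights needs less than 1 MB for its weights, while the same network in 32-bit floating point would need up to four times as much.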
Abstract
Smart glasses are rapidly gaining advanced functions thanks to cutting-edge computing technologies, especially accelerated hardware architectures, and tiny Artificial Intelligence (AI) algorithms. However, integrating AI into smart glasses with a small form factor and limited battery capacity while still delivering a satisfactory user experience remains challenging. To this end, this paper proposes the design of a smart glasses platform for always-on, on-device object detection with an all-day battery lifetime. The proposed platform is based on GAP9, a novel multi-core RISC-V processor from GreenWaves Technologies. Additionally, a family of sub-million-parameter TinyissimoYOLO networks is proposed. The networks are benchmarked on established datasets and can differentiate up to 80 classes on MS-COCO. Evaluations on the smart glasses prototype demonstrate an inference latency of only 17 ms and an energy consumption of 1.59 mJ per inference for TinyissimoYOLO. An end-to-end latency of 56 ms is achieved, which is equivalent to 18 frames per second (FPS), with a total power consumption of 62.9 mW. This ensures a continuous system runtime of up to 9.3 hours on a 154 mAh battery. These results outperform MCUNet (TinyNAS+TinyEngine), which runs a simpler task (image classification) at just 7.3 FPS, while the 18 FPS achieved in this paper include image capture, network inference, and detection post-processing. The algorithm's code is released as open source with this paper and can be found here: https://github.com/ETH-PBL/TinyissimoYOLO
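The headline figures in the abstract are related by simple arithmetic, which the following sketch reproduces. The function names are illustrative, and the battery voltage of 3.8 V is an assumption: the abstract does not state it, and this value is chosen only because it is consistent with the reported 9.3 hours (a 3.7 V nominal cell would give roughly 9.1 hours).

```python
def frames_per_second(latency_ms: float) -> float:
    """Frame rate when frames are processed back to back."""
    return 1000.0 / latency_ms


def average_power_mw(energy_mj: float, duration_ms: float) -> float:
    """Average power over an interval: mJ / ms gives W, so scale to mW."""
    return energy_mj / duration_ms * 1000.0


def runtime_hours(capacity_mah: float, voltage_v: float, power_mw: float) -> float:
    """Runtime from battery energy (mAh * V = mWh) divided by the power draw."""
    return capacity_mah * voltage_v / power_mw


# Figures reported in the abstract.
END_TO_END_LATENCY_MS = 56.0
INFERENCE_LATENCY_MS = 17.0
INFERENCE_ENERGY_MJ = 1.59
TOTAL_POWER_MW = 62.9
BATTERY_CAPACITY_MAH = 154.0

# ASSUMPTION: not given in the abstract; picked to match the reported runtime.
ASSUMED_BATTERY_VOLTAGE_V = 3.8

fps = frames_per_second(END_TO_END_LATENCY_MS)
inference_power = average_power_mw(INFERENCE_ENERGY_MJ, INFERENCE_LATENCY_MS)
runtime = runtime_hours(BATTERY_CAPACITY_MAH, ASSUMED_BATTERY_VOLTAGE_V, TOTAL_POWER_MW)

print(f"End-to-end frame rate:   {fps:.1f} FPS")
print(f"Average inference power: {inference_power:.1f} mW")
print(f"Estimated runtime:       {runtime:.1f} h")
```

The average inference power computed here covers only the 17 ms inference window, so it is not directly comparable to the 62.9 mW total, which is averaged over the whole 56 ms capture, inference, and post-processing cycle.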
