Ultra-Efficient On-Device Object Detection on AI-Integrated Smart Glasses with TinyissimoYOLO

Julian Moosmann, Pietro Bonazzi, Yawei Li, Sizhen Bian, Philipp Mayer, Luca Benini, Michele Magno

TL;DR

The paper demonstrates an energy-efficient, on-device object-detection pipeline for smart glasses by combining GAP9 hardware with a family of sub-million-parameter TinyissimoYOLO networks. It achieves an end-to-end latency of about 56 ms (≈18 FPS) at a total power of about 62.9 mW, supporting up to 9.3 hours of continuous operation on a 154 mAh battery, with image capture, inference, and post-processing all performed on-device. The TinyissimoYOLO variants are trained on 256×256 inputs, quantized to 8 bits, and deployed on GAP9's NE16 accelerator to balance accuracy and resource use, delivering up to 80-class detection with sub-MB models. The work outperforms MCUNet, which runs the simpler task of image classification at 7.3 FPS, demonstrates full hardware/software integration, and provides open-source code to support reproducibility and further development in ultra-low-power wearable AI.

Abstract

Smart glasses are rapidly gaining advanced functions thanks to cutting-edge computing technologies, especially accelerated hardware architectures and tiny Artificial Intelligence (AI) algorithms. However, integrating AI into smart glasses with a small form factor and limited battery capacity, while still delivering a satisfactory user experience, remains challenging. To this end, this paper proposes the design of a smart glasses platform for always-on on-device object detection with an all-day battery lifetime. The proposed platform is based on GAP9, a novel multi-core RISC-V processor from GreenWaves Technologies. Additionally, a family of sub-million-parameter TinyissimoYOLO networks is proposed. They are benchmarked on established datasets and can differentiate up to 80 classes on MS-COCO. Evaluations on the smart glasses prototype demonstrate an inference latency of only 17 ms and an energy consumption of 1.59 mJ per inference for TinyissimoYOLO. An end-to-end latency of 56 ms is achieved, equivalent to 18 frames per second (FPS), with a total power consumption of 62.9 mW. This ensures a continuous system runtime of up to 9.3 hours on a 154 mAh battery. These results outperform MCUNet (TinyNAS+TinyEngine), which runs a simpler task (image classification) at just 7.3 FPS, whereas the 18 FPS achieved in this paper includes image capture, network inference, and detection post-processing. The code is released as open source with this paper and can be found here: https://github.com/ETH-PBL/TinyissimoYOLO
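
The headline figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is not taken from the paper: the function names are illustrative, and the nominal battery voltage of 3.8 V is an assumption, since the abstract gives only the capacity in mAh.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# Values marked "from the abstract" are copied from the text; the battery
# voltage is an ASSUMPTION (the abstract states only the 154 mAh capacity).

E2E_LATENCY_S = 56e-3          # end-to-end latency, from the abstract
INFERENCE_LATENCY_S = 17e-3    # network inference latency, from the abstract
INFERENCE_ENERGY_J = 1.59e-3   # energy per inference, from the abstract
SYSTEM_POWER_W = 62.9e-3       # total power consumption, from the abstract
BATTERY_CAPACITY_AH = 154e-3   # battery capacity, from the abstract
ASSUMED_BATTERY_VOLTAGE_V = 3.8  # ASSUMPTION: nominal Li-Po cell voltage


def frames_per_second(latency_s: float) -> float:
    """Throughput when frames are processed back to back."""
    return 1.0 / latency_s


def average_power_w(energy_j: float, duration_s: float) -> float:
    """Average power drawn while spending energy_j over duration_s."""
    return energy_j / duration_s


def runtime_hours(capacity_ah: float, voltage_v: float, power_w: float) -> float:
    """Battery runtime in hours: stored energy (Wh) divided by load (W)."""
    return capacity_ah * voltage_v / power_w


fps = frames_per_second(E2E_LATENCY_S)
inference_power_mw = 1e3 * average_power_w(INFERENCE_ENERGY_J, INFERENCE_LATENCY_S)
energy_per_frame_mj = 1e3 * SYSTEM_POWER_W * E2E_LATENCY_S
runtime_h = runtime_hours(
    BATTERY_CAPACITY_AH, ASSUMED_BATTERY_VOLTAGE_V, SYSTEM_POWER_W
)

print(f"End-to-end throughput:      {fps:.1f} FPS")
print(f"Power during inference:     {inference_power_mw:.1f} mW")
print(f"System energy per frame:    {energy_per_frame_mj:.2f} mJ")
print(f"Runtime at assumed voltage: {runtime_h:.1f} h")
```

Under these assumptions, the 56 ms latency corresponds to roughly 17.9 FPS, which rounds to the reported 18 FPS, and the runtime comes out at about 9.3 hours. The runtime depends on the assumed voltage: a 3.7 V nominal cell would give about 9.1 hours instead. The power drawn during inference (about 93.5 mW) exceeds the 62.9 mW system average, which is plausible given that inference occupies only 17 ms of each 56 ms frame.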

Paper Structure (17 sections, 5 figures, 3 tables)

This paper contains 17 sections, 5 figures, 3 tables.

Figures (5)

  • Figure 1: The designed smart glasses hardware, which retrofits commercial temples of smart glasses.
  • Figure 2: Development Board: a) The proposed hardware system consists of two boards. The development board (left) features additional power circuitry, multiple camera interfaces, Wi-Fi, and debug capability. The smart glasses board (zoomed in) features two […], several sensors, and a power management system for stand-alone operation. b) Hardware block diagrams of the development board and the smart glasses board.
  • Figure 3: Full System Power Measurement: The lines described in the legend indicate whether each system process is running. The measurement shows GAP9 executing on-device image capture, demosaicing, network inference, and post-processing at 18 FPS.
  • Figure 4: Evaluation of the Deployed TinyissimoYOLO Versions on GAP9
  • Figure 5: End-to-End System Overview: Flow chart of the demonstrator firmware, including the execution latency of each task. Box sizes are proportional to execution time.