Helios: An extremely low power event-based gesture recognition for always-on smart eyewear

Prarthana Bhattacharyya, Joshua Mitton, Ryan Page, Owen Morgan, Ben Menzies, Gabriel Homewood, Kemi Jacobs, Paolo Baesso, David Trickett, Chris Mair, Taru Muhonen, Rory Clark, Louis Berridge, Richard Vigars, Iain Wallace

TL;DR

Helios is an ultra-low-power, on-device, event-based hand gesture recognition system for always-on smart eyewear. It pairs a compact 3mm×4mm/20mW event camera with a CNN running on an NXP Nano UltraLite to achieve <350mW total power, 60 ms latency, and 91% accuracy across seven gestures. The system converts sparse event streams into time-surface representations and uses a two-stage CNN to first locate the hand and then classify microgestures. It is trained on a synthetic ESIM-generated dataset and validated with real user tests (n=20). Key innovations include time-surface representations with exponential weighting, a dual-loss training scheme combining bounding-box and gesture losses, and a low-latency inference pipeline suited to eyewear form factors. The results demonstrate robust performance under ego-motion, enabling ergonomic, private, and responsive interaction for AR glasses, with scalable paths to additional gestures and further power optimizations.
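
To make the time-surface idea concrete, here is a minimal sketch of an exponentially weighted time surface. This summary does not give the paper's exact formulation, so the function name `time_surface`, the event tuple layout `(x, y, t, polarity)`, and the decay constant `tau` are illustrative assumptions rather than the authors' implementation.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical event layout (an assumption, not from the paper):
# (x, y, timestamp in seconds, polarity). Polarity is ignored in this sketch.
Event = Tuple[int, int, float, int]


def time_surface(events: List[Event], width: int, height: int,
                 t_ref: float, tau: float = 0.05) -> List[List[float]]:
    """Build an exponentially decayed time surface.

    Each pixel holds exp(-(t_ref - t_last) / tau), where t_last is the most
    recent event time at that pixel no later than t_ref. Pixels that never
    fired stay at 0.0. The value of tau is an assumed decay constant.
    """
    last: List[List[Optional[float]]] = [[None] * width for _ in range(height)]
    for x, y, t, _polarity in events:
        in_bounds = 0 <= x < width and 0 <= y < height
        if in_bounds and t <= t_ref:
            prev = last[y][x]
            if prev is None or t > prev:
                last[y][x] = t  # keep only the newest timestamp per pixel
    return [
        [0.0 if t is None else math.exp(-(t_ref - t) / tau) for t in row]
        for row in last
    ]


# Usage: three events on a 2x2 sensor, evaluated at t_ref = 0.10 s.
demo_events = [(0, 0, 0.10, 1), (1, 0, 0.05, 0), (0, 0, 0.02, 1)]
surface = time_surface(demo_events, width=2, height=2, t_ref=0.10, tau=0.05)
```

Recent events map to values near 1 and older ones fade toward 0, so a dense 2D array of this kind can be fed to a CNN while still encoding how recently each pixel fired.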

Abstract

This paper introduces Helios, the first extremely low-power, real-time, event-based hand gesture recognition system designed for all-day use on smart eyewear. As augmented reality (AR) evolves, current smart glasses like the Meta Ray-Bans prioritize visual and wearable comfort at the expense of functionality. Existing human-machine interfaces (HMIs) in these devices, such as capacitive touch and voice controls, present limitations in ergonomics, privacy and power consumption. Helios addresses these challenges by leveraging natural hand interactions for a more intuitive and comfortable user experience. Our system utilizes an extremely low-power and compact 3mm×4mm/20mW event camera to perform natural hand-based gesture recognition for always-on smart eyewear. The camera's output is processed by a convolutional neural network (CNN) running on an NXP Nano UltraLite compute platform, consuming less than 350mW. Helios can recognize seven classes of gestures, including subtle microgestures like swipes and pinches, with 91% accuracy. We also demonstrate real-time performance across 20 users at a remarkably low latency of 60ms. Our user testing results align with the positive feedback we received during our recent successful demo at AWE-USA-2024.

Paper Structure (34 sections, 1 equation, 10 figures, 2 tables)

This paper contains 34 sections, 1 equation, 10 figures, 2 tables.

Figures (10)

  • Figure 1: The setup in use: a user wearing smart glasses with event sensors controls Spotify on a mobile device via natural hand gestures, instead of capacitive touch or voice.
  • Figure 2: Our hardware setup, consisting of a Prophesee GenX320 event camera fitted onto Meta Ray-Ban smart glasses and connected to an NXP iMX 8M Nano UltraLite for processing events.
  • Figure 3: The camera module in its housing, attached to the Meta Ray-Ban glasses. The camera points down with a small tilt away from the body of the user.
  • Figure 4: Our chosen microgestures for natural hand interactions with smart eyewear: thumb swipes and pinch.
  • Figure 5: High level block diagram of the gesture detection model architecture
  • ...and 5 more figures