Helios: An extremely low power event-based gesture recognition for always-on smart eyewear
Prarthana Bhattacharyya, Joshua Mitton, Ryan Page, Owen Morgan, Ben Menzies, Gabriel Homewood, Kemi Jacobs, Paolo Baesso, David Trickett, Chris Mair, Taru Muhonen, Rory Clark, Louis Berridge, Richard Vigars, Iain Wallace
TL;DR
Helios delivers an ultra-low-power, on-device, event-based hand gesture recognition system for always-on smart eyewear. It pairs a compact 3 mm × 4 mm, 20 mW event camera with a CNN running on an NXP Nano UltraLite, achieving under 350 mW total power, 60 ms latency, and 91% accuracy across seven gesture classes. The system converts sparse event streams into time-surface representations and uses a two-stage CNN that first locates the hand and then classifies microgestures. It is trained on a synthetic dataset generated with the ESIM event simulator and validated in real user tests (n = 20). Key innovations include time surfaces with exponential weighting, a dual-loss training scheme that combines bounding-box and gesture losses, and a low-latency inference pipeline suited to the eyewear form factor. The results show robust performance under ego-motion, enabling ergonomic, private, and responsive interaction for AR glasses, with scalable paths to additional gestures and further power optimizations.
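The time-surface representation mentioned above can be illustrated with a short sketch. Assuming events arrive as (t, x, y, p) tuples with timestamps in seconds and binary polarity, each pixel keeps the timestamp of its most recent event per polarity, and its value decays as exp(-(t_ref - t_last) / tau). The decay constant `tau`, the two-channel polarity layout, and the function name `time_surface` are illustrative assumptions, not details confirmed by the paper.

```python
import numpy as np


def time_surface(events, height, width, t_ref, tau=0.05):
    """Build an exponentially weighted time surface (hypothetical helper).

    events: sequence of (t, x, y, p) with t in seconds and p in {0, 1}.
    Returns a float32 array of shape (2, height, width) with values in [0, 1]:
    1.0 for an event exactly at t_ref, decaying toward 0 for older events,
    and 0.0 for pixels with no event at or before t_ref.
    """
    ev = np.asarray(events, dtype=np.float64).reshape(-1, 4)
    ev = ev[ev[:, 0] <= t_ref]  # ignore events after the reference time

    # Most recent timestamp per (polarity, y, x); -inf marks "no event yet".
    last_t = np.full((2, height, width), -np.inf)
    p = ev[:, 3].astype(np.intp)
    x = ev[:, 1].astype(np.intp)
    y = ev[:, 2].astype(np.intp)
    # np.maximum.at handles repeated pixels and unsorted timestamps.
    np.maximum.at(last_t, (p, y, x), ev[:, 0])

    # Exponential weighting: exp(-inf) evaluates to 0 for untouched pixels.
    surface = np.exp(-(t_ref - last_t) / tau)
    return surface.astype(np.float32)


# Example: four events on a 3x4 sensor; the last one arrives after t_ref.
events = [(0.00, 1, 1, 1), (0.05, 2, 0, 0), (0.10, 1, 1, 1), (0.20, 3, 2, 0)]
ts = time_surface(events, height=3, width=4, t_ref=0.10, tau=0.05)
```

Sampling such surfaces at a fixed rate would give a CNN a dense, image-like input that still encodes how recently each pixel fired. Whether Helios uses this exact decay form or splits channels by polarity is not stated in this excerpt.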
Abstract
This paper introduces Helios, the first extremely low-power, real-time, event-based hand gesture recognition system designed for all-day use on smart eyewear. As augmented reality (AR) evolves, current smart glasses such as the Meta Ray-Bans prioritize visual and wearable comfort at the expense of functionality. Existing human-machine interfaces (HMIs) in these devices, such as capacitive touch and voice control, have limitations in ergonomics, privacy, and power consumption. Helios addresses these challenges by using natural hand interactions for a more intuitive and comfortable user experience. Our system uses an extremely low-power, compact 3 mm × 4 mm, 20 mW event camera to perform natural hand-based gesture recognition for always-on smart eyewear. The camera's output is processed by a convolutional neural network (CNN) running on an NXP Nano UltraLite compute platform, with the system consuming less than 350 mW in total. Helios recognizes seven classes of gestures, including subtle microgestures such as swipes and pinches, with 91% accuracy. We also demonstrate real-time performance across 20 users at a low latency of 60 ms. Our user-testing results align with the positive feedback we received during our recent successful demo at AWE-USA-2024.
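The TL;DR describes a dual-loss training scheme that combines a hand-localization (bounding-box) term with a gesture-classification term. A minimal NumPy sketch of such an objective follows, assuming softmax cross-entropy over the seven gesture classes, a smooth-L1 regression loss on four box coordinates in any consistent format, and a scalar weight `box_weight`. These loss choices and all function names are hypothetical; the excerpt does not specify the exact loss forms or their weighting.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy over a batch; logits (N, C), labels (N,) ints."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def smooth_l1(pred, target, beta=1.0):
    """Smooth-L1 loss summed over box coordinates, averaged over the batch."""
    diff = np.abs(pred - target)
    per_coord = np.where(diff < beta, 0.5 * diff**2 / beta, diff - 0.5 * beta)
    return per_coord.sum(axis=1).mean()


def dual_loss(gesture_logits, gesture_labels, box_pred, box_target, box_weight=1.0):
    """Hypothetical combined objective: gesture loss + weighted box loss."""
    return (softmax_cross_entropy(gesture_logits, gesture_labels)
            + box_weight * smooth_l1(box_pred, box_target))


# Example: uniform logits over 7 classes and boxes off by 0.5 in one coordinate.
logits = np.zeros((2, 7))
labels = np.array([0, 3])
box_pred = np.array([[0.5, 0.0, 1.0, 1.0], [0.5, 0.0, 1.0, 1.0]])
box_true = np.array([[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]])
loss = dual_loss(logits, labels, box_pred, box_true, box_weight=2.0)
```

Weighting the box term lets hand localization and gesture classification be trained jointly from one scalar objective; in a sketch like this, `box_weight` would be a hyperparameter tuned on validation data.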
