Non-verbal Hands-free Control for Smart Glasses using Teeth Clicks

Payal Mohapatra, Ali Aroudi, Anurag Kumar, Morteza Khaleghimeybodi

TL;DR

This paper addresses discreet, hands-free control for smart glasses by detecting two teeth-click patterns from nose-pad accelerometer signals. It introduces STEALTHsense, a lightweight temporal-broadcasting neural network with approximately $88\mathrm{K}$ parameters and $7.14$ million multiply-accumulate operations (MMAC) per inference, trained on data from 21 participants to achieve a cross-subject balanced accuracy of $0.93$. The approach combines tailored data augmentation, a 41-feature time-frequency representation, and an on-device inference pipeline suited to real-time use. Field tests report positive user adoption and perceived accuracy, including in noisy conditions, highlighting the practical potential for unobtrusive interaction in AR glasses. Overall, the work delivers a compact, noise-robust solution for non-verbal human-computer interaction with smart glasses and outlines paths toward personalization and broader usability.

Abstract

Smart glasses are emerging as a popular wearable computing platform that could revolutionize the next generation of human-computer interaction. The widespread adoption of smart glasses has created a pressing need for discreet and hands-free control methods. Traditional input techniques, such as voice commands or tactile gestures, can be intrusive and indiscreet. Additionally, voice-based control may not function well in noisy acoustic conditions. We propose a novel, discreet, non-verbal, and non-tactile approach to controlling smart glasses through subtle vibrations on the skin induced by teeth clicking. We demonstrate that these vibrations can be sensed by accelerometers embedded in the glasses and recognized with a low-footprint predictive model. Our proposed method, called STEALTHsense, uses a temporal broadcasting-based neural network architecture with just 88K trainable parameters and 7.14 million multiply-and-accumulate operations (MMAC) per inference unit. We benchmark STEALTHsense against state-of-the-art deep learning approaches and traditional low-footprint machine learning approaches. We conducted a study across 21 participants to collect representative samples of two distinct teeth-clicking patterns and many non-patterns for robust training of STEALTHsense, achieving an average cross-person accuracy of 0.93. Field testing confirmed its effectiveness, even in noisy conditions, underscoring STEALTHsense's potential as a promising real-world solution for smart glasses interaction.

Paper Structure (19 sections, 4 equations, 11 figures, 2 tables)

Figures (11)

  • Figure 1: STEALTHsense leverages the accelerometers embedded on the nose pads of the smart glasses to pick up non-vocal discreet teeth-clicking gestures for a seamless control interface using a lightweight real-time pattern recognition pipeline.
  • Figure 2: Illustration of (a) an ideal template for a single teeth click (pattern 1), (b) an ideal template for a double click (pattern 2), and the corresponding (c) non-ideal pattern instance for a single teeth click and (d) non-ideal pattern instance for a double teeth click due to variation in dental anatomy.
  • Figure 3: Motivating example to illustrate the simultaneous response for a single teeth click captured by an on-device nose-pad accelerometer and an acoustic microphone. Such nuanced discreet dental gestures can be picked up only through a nose-pad accelerometer modality.
  • Figure 4: Overall system architecture illustrating I) segmentation of patterns using an annotator model, II) data augmentation using gain, time shifting, and additive noise drawn from a pool of common noise, using properties from the SNR study, III) feature engineering to combine spectral and temporal properties of the signal, and IV) the predictor network architecture for detecting the event.
  • Figure 5: Pilot study on SNR characterization showing (a) the original waveform captured in a sound room with a signal-plus-noise segment (blue highlight) and a noise segment (red highlight), (b) the trimmed segment corresponding to the signal-plus-noise segment, and (c) the noise segment. The signal-plus-noise and noise segments are kept equal in duration. (d) SNR for the two teeth-click patterns (single click and double click) for two participants.
  • ...and 6 more figures
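
The captions for Figures 4 and 5 mention an SNR estimate taken from equal-length signal-plus-noise and noise segments, and an augmentation step that applies gain, time shifting, and additive noise. The sketch below shows one way these two steps could look. It is not the authors' implementation: the function names, the power-subtraction SNR estimate (which assumes signal and noise are uncorrelated), and the circular time shift are all assumptions made for illustration.

```python
import math


def power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def estimate_snr_db(signal_plus_noise, noise):
    """Estimate SNR in dB from two equal-length segments.

    Assumption: signal and noise are uncorrelated, so the signal power is
    approximately P(signal + noise) - P(noise).
    """
    if len(signal_plus_noise) != len(noise):
        raise ValueError("segments must have equal duration")
    p_noise = power(noise)
    # Floor the signal power to avoid log of a non-positive number.
    p_signal = max(power(signal_plus_noise) - p_noise, 1e-12)
    return 10.0 * math.log10(p_signal / p_noise)


def augment(clean, noise, snr_db, gain=1.0, shift=0):
    """Apply gain, a circular time shift, and additive noise at snr_db.

    Assumption: `clean` and `noise` are lists of floats, `noise` is at
    least as long as `clean`, and `noise` has non-zero power.
    """
    n = len(clean)
    shift %= n
    # Circular shift to the right by `shift` samples (hypothetical choice).
    shifted = clean[-shift:] + clean[:-shift] if shift else list(clean)
    scaled = [gain * v for v in shifted]
    # Scale the noise so that P(scaled) / P(k * noise) == 10^(snr_db / 10).
    target_p_noise = power(scaled) / (10.0 ** (snr_db / 10.0))
    k = math.sqrt(target_p_noise / power(noise))
    return [s + k * w for s, w in zip(scaled, noise)]
```

In a pipeline like the one Figure 4 outlines, the SNR values measured in a pilot study would presumably inform the range of `snr_db` values sampled during augmentation; the exact ranges are not given in this excerpt.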