WristSonic: Enabling Fine-grained Hand-Face Interactions on Smartwatches Using Active Acoustic Sensing

Saif Mahmud, Kian Mahmoodi, Chi-Jung Lee, Francois Guimbretiere, Cheng Zhang

TL;DR

WristSonic is a wrist-worn active acoustic sensing system that uses speakers and microphones to capture ultrasonic reflections from hand, arm, and face movements, enabling fine-grained detection of hand-face interactions with minimal intrusion.

Abstract

Hand-face interactions play a key role in many everyday tasks, providing insights into user habits, behaviors, intentions, and expressions. However, existing wearable sensing systems often struggle to track these interactions in daily settings due to their reliance on multiple sensors or privacy-sensitive, vision-based approaches. To address these challenges, we propose WristSonic, a wrist-worn active acoustic sensing system that uses speakers and microphones to capture ultrasonic reflections from hand, arm, and face movements, enabling fine-grained detection of hand-face interactions with minimal intrusion. By transmitting and analyzing ultrasonic waves, WristSonic distinguishes a wide range of gestures, such as tapping the temple, brushing teeth, and nodding, using a Transformer-based neural network architecture. This approach achieves robust recognition of 21 distinct actions with a single, low-power, privacy-conscious wearable. Through two user studies with 15 participants in controlled and semi-in-the-wild settings, WristSonic demonstrates high efficacy, achieving macro F1-scores of 93.08% and 82.65%, respectively.
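
The macro F1-scores reported above average the per-class F1 over all classes, so each of the 21 actions counts equally regardless of how often it occurs. The following is a minimal sketch of that metric only, not the authors' evaluation code; the function name `macro_f1` and the toy labels are hypothetical and chosen for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with made-up labels (not data from the paper)
y_true = ["tap", "tap", "nod", "brush"]
y_pred = ["tap", "nod", "nod", "brush"]
print(round(macro_f1(y_true, y_pred), 4))
```

Because every class carries the same weight, a rarely performed gesture that is often misclassified lowers the macro score as much as a common one would.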

Paper Structure

This paper contains 35 sections, 9 figures, 1 table.

Figures (9)

  • Figure 2: Echo profiles of selected activities over 2.0 seconds of tracking. The top image shows the activity, while the bottom displays the generated echo profile. Blue regions represent areas tracked by the outward-facing sensor (relative to the arm), while orange regions represent those tracked by the inward-facing sensor. The top echo profile corresponds to the outward-facing sensor and the bottom profile to the inward-facing sensor.
  • Figure 3: Hardware of WristSonic. (a) and (b) show top-down and side views of WristSonic's form factor, with the outward-facing sensor in blue and the inward-facing sensor in orange. (c) Knuckle-up view, highlighting the outward-facing sensor. (d) Palm-up view, highlighting the inward-facing sensor.
  • Figure 4: Deep learning model architecture for WristSonic.
  • Figure 5: Set of gestures, activities, and head motions tracked by WristSonic.
  • Figure 6: Normalized confusion matrix of gestures in leave-one-participant-out user evaluation of the in-lab study.
  • ...and 4 more figures