EchoWrist: Continuous Hand Pose Tracking and Hand-Object Interaction Recognition Using Low-Power Active Acoustic Sensing On a Wristband

Chi-Jung Lee, Ruidong Zhang, Devansh Agarwal, Tianhong Catherine Yu, Vipin Gunda, Oliver Lopez, James Kim, Sicheng Yin, Boao Dong, Ke Li, Mose Sakashita, Francois Guimbretiere, Cheng Zhang

TL;DR

EchoWrist delivers the first wristband capable of continuous 3D hand pose tracking and hand-object interaction recognition using active acoustic sensing. The system uses two speaker–microphone pairs mounted on the wrist, FMCW echo profiling, and a CNN-based inference pipeline to reconstruct 20 hand joints and classify 12 interactions while drawing 57.9 mW. In two user studies with 12 participants each, EchoWrist achieves a mean joint Euclidean distance error (MJEDE) of 4.81 mm for 3D pose and an interaction recognition accuracy of 97.6%, while supporting real-time inference on a smartphone. The work demonstrates a minimally obtrusive, privacy-conscious wearable approach whose power draw is low enough for full-day use on standard smartwatches, and it opens pathways for integrated, ambient HCI on wearable devices.
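
To make "FMCW echo profiling" concrete, here is a minimal sketch of the general technique: transmit a linear frequency sweep, then cross-correlate the microphone signal with it so that each output index corresponds to one propagation delay. The function names, the 18 to 21 kHz sweep, the 50 kHz sample rate, and the 12 ms frame are all assumptions chosen for illustration; they are not parameters taken from the paper, and this is not the authors' pipeline.

```python
import numpy as np


def make_chirp(f_start, f_end, duration, sample_rate):
    """Linear FMCW chirp sweeping f_start -> f_end (Hz) over `duration` seconds."""
    n = int(round(duration * sample_rate))
    t = np.arange(n) / sample_rate
    slope = (f_end - f_start) / duration  # sweep rate in Hz per second
    phase = 2.0 * np.pi * (f_start * t + 0.5 * slope * t ** 2)
    return np.cos(phase)


def echo_profile(received, chirp):
    """Cross-correlate a received frame with the transmitted chirp.

    Index k of the result is the correlation at a delay of k samples,
    so a reflector shows up as a peak at its round-trip delay.
    """
    corr = np.correlate(received, chirp, mode="full")
    # In "full" mode, zero delay sits at index len(chirp) - 1.
    return corr[len(chirp) - 1:]


def simulate_single_echo(chirp, delay_samples, frame_len, gain=0.5):
    """Hypothetical test signal: one attenuated, delayed copy of the chirp."""
    rx = np.zeros(frame_len)
    rx[delay_samples:delay_samples + len(chirp)] += gain * chirp
    return rx


if __name__ == "__main__":
    fs = 50_000  # assumed sample rate (Hz)
    chirp = make_chirp(18_000.0, 21_000.0, 0.012, fs)  # assumed sweep
    rx = simulate_single_echo(chirp, delay_samples=25, frame_len=1200)
    profile = echo_profile(rx, chirp)
    delay = int(np.argmax(profile))
    # Round-trip path length for that delay, assuming 343 m/s in air.
    print(delay, 343.0 * delay / fs)
```

Stacking successive profiles over time gives a 2D image of how echoes change, which is the kind of input a CNN can consume; the exact preprocessing used by EchoWrist is described in the paper itself.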

Abstract

Our hands serve as a fundamental means of interaction with the world around us. Therefore, understanding hand poses and interaction context is critical for human-computer interaction. We present EchoWrist, a low-power wristband that continuously estimates 3D hand pose and recognizes hand-object interactions using active acoustic sensing. EchoWrist is equipped with two speakers emitting inaudible sound waves toward the hand. These sound waves interact with the hand and its surroundings through reflection and diffraction, carrying rich information about the hand's shape and the objects it interacts with. The information captured by the two microphones goes through a deep learning inference system that recovers hand poses and identifies various everyday hand activities. Results from two 12-participant user studies show that EchoWrist is effective and efficient at tracking 3D hand poses and recognizing hand-object interactions. Operating at 57.9 mW, EchoWrist continuously reconstructs 20 3D hand joints with a mean joint Euclidean distance error (MJEDE) of 4.81 mm and recognizes 12 naturalistic hand-object interactions with 97.6% accuracy.
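
The headline pose metric, MJEDE, can be illustrated with a short sketch. It assumes the usual reading of the name: the Euclidean distance between each predicted joint and its ground-truth position, averaged over all joints and frames. The function name `mjede` and the `(frames, joints, 3)` array layout are assumptions for illustration, not the authors' evaluation code.

```python
import numpy as np


def mjede(pred, truth):
    """Mean joint Euclidean distance error.

    pred, truth: arrays of shape (..., joints, 3) in the same unit (e.g. mm).
    Returns the per-joint Euclidean distance averaged over every joint
    and every leading dimension (frames, sessions, ...).
    """
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    if pred.shape != truth.shape or pred.shape[-1] != 3:
        raise ValueError("pred and truth must share a shape ending in 3")
    per_joint = np.linalg.norm(pred - truth, axis=-1)  # distance per joint
    return float(per_joint.mean())


if __name__ == "__main__":
    truth = np.zeros((2, 20, 3))            # 2 frames, 20 joints
    pred = truth + np.array([3.0, 4.0, 0.0])  # every joint is off by 5 units
    print(mjede(pred, truth))
```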

Paper Structure

This paper contains 61 sections, 14 figures, 3 tables.

Figures (14)

  • Figure 1: Pilot studies were conducted with (a) an experimental prototype. (b) An ultrasonic transducer was used for the initial design, while (c) a commodity speaker was chosen for the later prototype.
  • Figure 2: We explored (a) the reconstruction mean joint Euclidean distance error (MJEDE) of each finger when the speaker was placed at different positions. The numbers on the x-axis represent positions evenly distributed around the wrist: the center of the palm side is 0, the position to its right is 1, and so forth. We also examined the system MJEDE while adjusting (b) the height of the sensors, (c) the signal transmitted by the speakers, and (d) the combination of speakers and microphones. For the numbering and positions of the speakers and microphones, see panel (a) of the hardware design figure (fig:design-hardware).
  • Figure 3: Hardware of EchoWrist. (a), (b) EchoWrist worn on the wrist. All components are mounted on a silicone wristband. (c) Customized PCBs for the microcontroller module and the sensing module. (1): US quarter-dollar coin. (2): 3.7 V 70 mAh LiPo battery. (3): Customized PCB with SGW1110 module. (4) & (5): Sensor module with speaker and microphone.
  • Figure 4: 3D hand pose ground truth annotation. (a) 21 joints detected by MediaPipe. (b) Important vectors used during ground truth normalization. $\vec{v_9}$ is the wrist-to-palm vector used to represent the orientation of the hand. The plane defined by $\vec{v_5}$ and $\vec{v_{17}}$ is used to align the detected hand with the reference hand. The actual length of $\vec{v_{17}}$ is measured in the image against the size of the sensing module to normalize the hand size in each frame.
  • Figure 5: The model architecture of EchoWrist.
  • ...and 9 more figures
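
The ground-truth normalization outlined in the Figure 4 caption can be sketched roughly as follows. This is one plausible reading, not the authors' procedure. It assumes that $\vec{v_k}$ is the vector from the wrist (MediaPipe landmark 0) to landmark $k$, that the hand is rotated so the plane spanned by $\vec{v_5}$ and $\vec{v_{17}}$ becomes the x-y plane with the in-plane part of $\vec{v_9}$ along +y, and that size is fixed by rescaling $\vec{v_{17}}$ to a measured length. The function `normalize_hand` and its parameter `v17_length_mm` are hypothetical names.

```python
import numpy as np

# MediaPipe hand landmark indices: wrist and the index, middle, pinky MCP joints.
WRIST, INDEX_MCP, MIDDLE_MCP, PINKY_MCP = 0, 5, 9, 17


def _unit(v):
    """Return v scaled to unit length, rejecting degenerate geometry."""
    n = np.linalg.norm(v)
    if n < 1e-9:
        raise ValueError("degenerate hand geometry")
    return v / n


def normalize_hand(landmarks, v17_length_mm):
    """Translate, scale, and rotate 21 hand landmarks into a canonical frame.

    landmarks: array of shape (21, 3) in any consistent unit.
    v17_length_mm: measured wrist-to-pinky-MCP length used to fix hand size
                   (assumed to be supplied by a separate measurement step).
    """
    pts = np.asarray(landmarks, dtype=float)
    if pts.shape != (21, 3):
        raise ValueError("expected landmarks of shape (21, 3)")

    pts = pts - pts[WRIST]                        # wrist becomes the origin
    v17_len = np.linalg.norm(pts[PINKY_MCP])
    if v17_len < 1e-9:
        raise ValueError("degenerate hand geometry")
    pts = pts * (v17_length_mm / v17_len)         # uniform hand size

    v5, v9, v17 = pts[INDEX_MCP], pts[MIDDLE_MCP], pts[PINKY_MCP]
    z = _unit(np.cross(v5, v17))                  # normal of the palm plane
    y = _unit(v9 - np.dot(v9, z) * z)             # in-plane part of v9
    x = np.cross(y, z)                            # completes a right-handed frame
    rotation = np.stack([x, y, z])                # rows are the new axes
    return pts @ rotation.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hand = rng.normal(size=(21, 3)) * 30.0        # synthetic stand-in for a detection
    canonical = normalize_hand(hand, v17_length_mm=80.0)
    print(canonical[[WRIST, INDEX_MCP, PINKY_MCP]].round(3))
```

Under these assumptions the output does not change when the input hand is translated, uniformly scaled, or rotated, which is the property that makes poses comparable across frames.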