WatchHand: Enabling Continuous Hand Pose Tracking On Off-the-Shelf Smartwatches

Jiwan Kim; Chi-Jung Lee; Hohurn Jung; Tianhong Catherine Yu; Ruidong Zhang; Ian Oakley; Cheng Zhang

TL;DR

WatchHand is the first continuous 3D hand pose tracking system implemented on off-the-shelf smartwatches using only their built-in speaker and microphone. By eliminating additional hardware, it lowers the barrier to smartwatch-based hand tracking and enables robust, always-available interactions on millions of existing devices.

Abstract

Tracking hand poses on wrist-wearables enables rich, expressive interactions, yet remains unavailable on commercial smartwatches, as prior implementations rely on external sensors or custom hardware, limiting their real-world applicability. To address this, we present WatchHand, the first continuous 3D hand pose tracking system implemented on off-the-shelf smartwatches using only their built-in speaker and microphone. WatchHand emits inaudible frequency-modulated continuous waves and captures their reflections from the hand. These acoustic signals are processed by a deep-learning model that estimates 3D hand poses for 20 finger joints. We evaluate WatchHand across diverse real-world conditions (multiple smartwatch models, wearing hands, body postures, noise conditions, and pose-variation protocols) and achieve a mean per-joint position error of 7.87 mm in cross-session tests with device remounting. Although performance drops for unseen users or gestures, the model adapts effectively with lightweight fine-tuning on small amounts of data. Overall, WatchHand lowers the barrier to smartwatch-based hand tracking by eliminating additional hardware while enabling robust, always-available interactions on millions of existing devices.
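The sensing principle in the abstract (emit a frequency-modulated chirp, record its reflections, and turn them into a per-frame echo profile) can be illustrated with a minimal sketch. The sample rate, sweep band, frame length, and all function names below are assumptions chosen for illustration; they are not taken from the paper, and the paper's own pipeline may differ.

```python
import numpy as np

# All constants are illustrative assumptions, not values from the paper.
FS = 48_000               # sample rate in Hz
F0, F1 = 18_000, 21_000   # chirp sweep band in Hz (near-inaudible)
N = 600                   # samples per chirp frame (12.5 ms at 48 kHz)
C = 343.0                 # approximate speed of sound in air, m/s


def make_chirp(fs=FS, f0=F0, f1=F1, n=N):
    """Return one linear up-chirp sweeping f0 -> f1 over n samples."""
    t = np.arange(n) / fs
    k = (f1 - f0) / (n / fs)  # sweep rate in Hz per second
    return np.sin(2 * np.pi * (f0 * t + 0.5 * k * t**2))


def echo_profile(rx, chirp):
    """Circularly cross-correlate each received frame with the chirp.

    Returns an array of shape (num_frames, n) holding the correlation
    magnitude at every lag; peaks mark round-trip delays of reflections.
    """
    n = len(chirp)
    frames = rx[: len(rx) // n * n].reshape(-1, n)
    spec = np.fft.fft(frames, axis=1) * np.conj(np.fft.fft(chirp))
    return np.abs(np.fft.ifft(spec, axis=1))


def lag_to_distance(lag, fs=FS, c=C):
    """Convert a round-trip delay in samples to a one-way distance in metres."""
    return lag / fs * c / 2


# Simulated example: a single reflector delaying the signal by 20 samples.
chirp = make_chirp()
rx = 0.3 * np.roll(np.tile(chirp, 4), 20)
profile = echo_profile(rx, chirp)
peak_lags = profile.argmax(axis=1)
```

In this simulation every frame peaks at lag 20, which `lag_to_distance` maps to roughly 7 cm. Real recordings would also contain the direct speaker-to-microphone path and many overlapping reflections, so the profile is typically passed to a learned model rather than read off by peak picking.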

Paper Structure (63 sections, 1 equation, 12 figures, 1 table)

Introduction
Related Work
Active Acoustic Sensing
Hand Pose Input on Commercial Smartwatches
Continuous Hand Pose Tracking on Wrist Wearables
System Design and Implementation
Design Consideration and Principle
Data Collection System
Acoustic Data Preprocessing
C-FMCW Based Echo Profiles
Differential Echo Profiles
Echo Profile Calibration for Different Watches
IMU Motion Data Preprocessing
Preliminary Viability Testing on COTS Watches
Impact of hardware and watch-wearing hand
...and 48 more sections

Figures (12)

Figure 1: (A) Three different COTS smartwatches we evaluated and the physical locations of their built-in speaker and microphone, and (B) system evaluation setup: A smartwatch is worn on a prop hand while a stepper motor–driven linear stage moves a flat plate back and forth within typical finger movement ranges (10–15 cm). To evaluate different angles, the prop hand was rotated to various orientations (e.g., 0°, ±30°, ±60°, ±90°, and perpendicular) relative to the direction of plate motion.
Figure 2: Visual examples of echo profile calibration across three commercial smartwatches. Sliding-window cross-correlation peak correction removes peak misalignment (red triangles), while periodic drift calibration mitigates repeating noise artifacts (red dashed boxes). Hand pose transition timings are marked (yellow triangles). Together, these calibration steps produce cleaner and more temporally stable echo profiles during dynamic hand pose transitions.
Figure 3: Captured original and differential echo profiles (A) across different COTS smartwatches (Galaxy, Xiaomi, and Pixel) and (B) at varying angles between the plate's motion and the line toward the hand, during repetitive back-and-forth movements toward and away from the smartwatch.
Figure 4: Processed original and differential echo profiles (see Section "Acoustic Data Preprocessing") and bandpass-filtered IMU sensor data in the 32–100 Hz range (see Section "IMU Motion Data Preprocessing") captured from a single user using a COTS smartwatch (Galaxy Watch 7) during different hand postures.
Figure 5: (A) MediaPipe-based 3D hand joint annotation used as ground truth, showing 20 labeled landmarks from the root to each fingertip. (B) Study setup in which the laptop’s front camera faces the participant’s palm to capture ground truth without optical occlusion during hand pose demonstrations.
...and 7 more figures
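
The captions for Figures 2 and 3 refer to peak-misalignment correction and to differential echo profiles. As a rough illustration of those two ideas, the sketch below aligns each frame on its strongest peak and then takes frame-to-frame differences so that static reflections cancel. This is a simplified stand-in, not the paper's sliding-window cross-correlation method; the function names and the `ref_lag` parameter are hypothetical.

```python
import numpy as np


def align_frames(profile, ref_lag=0):
    """Circularly shift each frame so its strongest peak lands on ref_lag.

    Simplified assumption: the strongest peak is a stable reference
    (for example the direct speaker-to-microphone path).
    """
    peaks = np.argmax(profile, axis=1)
    return np.stack([np.roll(f, ref_lag - p) for f, p in zip(profile, peaks)])


def differential_profile(profile):
    """Subtract consecutive frames; unchanged reflections cancel to zero."""
    return np.diff(profile, axis=0)


# Toy echo profile: a strong static peak at lag 5 and a weak moving reflector.
base = np.zeros((3, 16))
base[:, 5] = 10.0
base[0, 8] = base[1, 9] = base[2, 10] = 1.0

# Simulate per-frame timing jitter of 0, 1, and 2 samples.
jittered = np.stack([np.roll(f, s) for f, s in zip(base, [0, 1, 2])])

aligned = align_frames(jittered, ref_lag=5)
diff = differential_profile(aligned)
```

After alignment the static peak sits at the same lag in every frame, so the differential profile is zero there and non-zero only where the weak reflector moved.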
