Anytime, Anywhere: Human Arm Pose from Smartwatch Data for Ubiquitous Robot Control and Teleoperation
Fabian C Weigend, Shubham Sonawani, Michael Drolet, Heni Ben Amor
TL;DR
This paper addresses arm pose estimation from a single smartwatch. The model predicts a distribution over wrist and elbow positions, which yields an uncertainty estimate and exposes multimodal pose solutions. It introduces a two-step calibration procedure and compares several rotation and position representations across two neural architectures (feedforward and recurrent), using dropout-based posterior sampling to capture multimodality. The approach reduces prediction error by about 40% relative to prior work, with median wrist and elbow errors of about 2.33 cm and 1.61 cm, respectively, and runs in real time at tens of Hz. Combined with speech recognition, the smartwatch becomes a ubiquitous robot control interface suited to intervening in robot behavior and to training policies by imitation, demonstrated in two real-world use cases and supported by a limited set of usability metrics.
Abstract
This work devises an optimized machine learning approach for human arm pose estimation from a single smartwatch. Our approach results in a distribution of possible wrist and elbow positions, which allows for a measure of uncertainty and the detection of multiple possible arm posture solutions, i.e., multimodal pose distributions. Combining estimated arm postures with speech recognition, we turn the smartwatch into a ubiquitous, low-cost and versatile robot control interface. We demonstrate in two use cases that this intuitive control interface enables users to swiftly intervene in robot behavior, to temporarily adjust their goal, or to train completely new control policies by imitation. Extensive experiments show that the approach results in a 40% reduction in prediction error over the current state of the art and achieves a mean error of 2.56 cm for wrist and elbow positions. The code is available at https://github.com/wearable-motion-capture.
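
To illustrate how a distribution of wrist and elbow positions can be obtained by sampling, the following is a minimal sketch of dropout-based sampling in Python with NumPy. It is not the authors' implementation: the function name `mc_dropout_predict`, the two-layer network, its random weights, the 8-dimensional input, and the 6-dimensional output layout (wrist xyz followed by elbow xyz) are all assumptions made for illustration. The idea shown is only that dropout stays active at prediction time, so repeated forward passes on the same input give a set of samples whose spread can serve as an uncertainty measure.

```python
import numpy as np


def mc_dropout_predict(x, weights, n_samples=50, p_drop=0.2, rng=None):
    """Run n_samples stochastic forward passes with dropout left on.

    Hypothetical helper for illustration; not taken from the paper's code.
    Returns an array of shape (n_samples, n_outputs).
    """
    rng = np.random.default_rng() if rng is None else rng
    (w1, b1), (w2, b2) = weights
    samples = np.empty((n_samples, w2.shape[1]))
    for i in range(n_samples):
        h = np.maximum(x @ w1 + b1, 0.0)       # ReLU hidden layer
        mask = rng.random(h.shape) >= p_drop   # keep each unit with prob. 1 - p_drop
        h = h * mask / (1.0 - p_drop)          # inverted-dropout rescaling
        samples[i] = h @ w2 + b2               # one sampled pose prediction
    return samples


# Assumed toy setup: random weights stand in for a trained network.
rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 8, 32, 6  # 6 outputs = wrist xyz + elbow xyz (assumed layout)
weights = [
    (rng.normal(0.0, 0.3, (n_in, n_hidden)), np.zeros(n_hidden)),
    (rng.normal(0.0, 0.3, (n_hidden, n_out)), np.zeros(n_out)),
]
x = rng.normal(size=n_in)  # placeholder for one frame of smartwatch features

samples = mc_dropout_predict(x, weights, n_samples=100, rng=rng)
mean_pose = samples.mean(axis=0)  # point estimate over the sampled poses
spread = samples.std(axis=0)      # per-coordinate spread as a simple uncertainty measure
wrist_mean, elbow_mean = mean_pose[:3], mean_pose[3:]
```

A mean and standard deviation summarize a single-mode distribution only. Detecting several distinct posture solutions would require inspecting the samples themselves, for example by clustering them, which this sketch does not attempt.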
