Anytime, Anywhere: Human Arm Pose from Smartwatch Data for Ubiquitous Robot Control and Teleoperation

Fabian C Weigend, Shubham Sonawani, Michael Drolet, Heni Ben Amor

TL;DR

This paper addresses arm pose estimation from a single smartwatch by learning to predict multimodal distributions over wrist and elbow positions and quantifying the uncertainty of these predictions. It introduces a two-step calibration procedure and explores multiple rotation and position representations within two neural architectures (feedforward and recurrent), combined with dropout-based posterior sampling to capture multimodality. The approach reduces prediction error by about 40% relative to prior work, with median wrist and elbow errors of about 2.33 cm and 1.61 cm, respectively, and runs in real time at tens of Hz. Combined with speech recognition, the smartwatch becomes a ubiquitous robot control interface for intervening in robot behavior and for training new policies by imitation. This is demonstrated in two real-world use cases and complemented by a limited set of usability metrics.
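
The TL;DR mentions predicting positions either in polar (spherical) or Cartesian coordinates. As a minimal sketch, assuming a hypothetical convention in which a wrist or elbow position is expressed relative to the shoulder as a radius r, a polar angle theta measured from the +z axis, and an azimuth phi measured from +x in the x-y plane, the conversion between the two target representations could look as follows. The axis convention, units, and function names are illustrative assumptions, not the paper's definitions.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def spherical_to_cartesian(r: float, theta: float, phi: float) -> Vec3:
    """(radius, polar angle from +z, azimuth from +x in the x-y plane) -> (x, y, z)."""
    sin_t = math.sin(theta)
    return (r * sin_t * math.cos(phi), r * sin_t * math.sin(phi), r * math.cos(theta))


def cartesian_to_spherical(x: float, y: float, z: float) -> Vec3:
    """(x, y, z) -> (radius, polar angle in [0, pi], azimuth in (-pi, pi])."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return (0.0, 0.0, 0.0)  # direction is undefined at the origin
    theta = math.acos(max(-1.0, min(1.0, z / r)))  # clamp guards against rounding
    phi = math.atan2(y, x)
    return (r, theta, phi)


# Hypothetical wrist position relative to the shoulder, in metres.
wrist_xyz = (0.20, -0.10, -0.15)
wrist_sph = cartesian_to_spherical(*wrist_xyz)
roundtrip = spherical_to_cartesian(*wrist_sph)
```

One plausible reason to compare both: in the spherical form the radius is bounded by arm length while the two angles encode direction, whereas Cartesian targets treat all three axes alike. Which representation predicts better is an empirical question that Figure 4 addresses.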

Abstract

This work devises an optimized machine learning approach for human arm pose estimation from a single smartwatch. Our approach results in a distribution of possible wrist and elbow positions, which allows for a measure of uncertainty and the detection of multiple possible arm posture solutions, i.e., multimodal pose distributions. Combining estimated arm postures with speech recognition, we turn the smartwatch into a ubiquitous, low-cost and versatile robot control interface. We demonstrate in two use cases that this intuitive control interface enables users to swiftly intervene in robot behavior, to temporarily adjust their goal, or to train completely new control policies by imitation. Extensive experiments show that the approach results in a 40% reduction in prediction error over the current state-of-the-art and achieves a mean error of 2.56 cm for wrist and elbow positions. The code is available at https://github.com/wearable-motion-capture.
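
The abstract describes a distribution of possible wrist and elbow positions rather than a single estimate, and the TL;DR attributes this to dropout-based posterior sampling. The general technique, Monte Carlo dropout, keeps dropout active at inference time and runs repeated forward passes on the same input, so that each pass yields a slightly different prediction. The stdlib-only sketch below illustrates that idea with a toy, randomly initialized network standing in for a trained model; the layer sizes, dropout rate, feature vector, and names such as `mc_dropout_samples` are hypothetical and not taken from the authors' code.

```python
import math
import random
from statistics import mean, pstdev
from typing import List

Matrix = List[List[float]]


def make_layer(n_in: int, n_out: int, rng: random.Random) -> Matrix:
    # Hypothetical stand-in for trained weights: small Gaussian random values.
    return [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def linear(weights: Matrix, x: List[float]) -> List[float]:
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def forward_with_dropout(x, w1, w2, p, rng):
    hidden = [max(0.0, v) for v in linear(w1, x)]  # ReLU hidden layer
    # Dropout stays ON at inference: drop each unit with probability p and
    # rescale the survivors by 1 / (1 - p) (inverted dropout).
    hidden = [0.0 if rng.random() < p else v / (1.0 - p) for v in hidden]
    return linear(w2, hidden)  # 3 outputs, read as an (x, y, z) position


def mc_dropout_samples(x, w1, w2, p=0.2, n_samples=100, seed=0):
    rng = random.Random(seed)
    return [forward_with_dropout(x, w1, w2, p, rng) for _ in range(n_samples)]


def summarize(samples):
    axes = list(zip(*samples))
    mu = [mean(a) for a in axes]  # point estimate: mean of the samples
    sigma = [pstdev(a) for a in axes]  # per-axis spread as an uncertainty measure
    dists = [math.dist(s, mu) for s in samples]  # each sample's distance to the mean
    return mu, sigma, dists


init = random.Random(42)
w1 = make_layer(6, 32, init)
w2 = make_layer(32, 3, init)
features = [0.1, -0.3, 0.9, 0.05, 0.2, -0.7]  # hypothetical sensor feature vector
samples = mc_dropout_samples(features, w1, w2)
mu, sigma, dists = summarize(samples)
```

In this sketch, `mu` plays the role of a point estimate (compare the green sphere in Figure 1), `dists` could drive a proximity-based coloring of individual samples, and `sigma` is one simple uncertainty measure. Detecting distinct modes in the sample set would need an additional step, such as clustering, which is not shown here.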

Paper Structure (16 sections, 3 equations, 10 figures, 1 table)

Figures (10)

  • Figure 1: Top: The avatar shows predicted elbow and wrist positions from smartwatch sensor data. Our approach results in a distribution of solutions. The mean of a distribution is depicted as a green sphere. All individual predictions of a distribution are depicted as small cubes, colored according to their proximity to the mean. Bottom: We also stream microphone data to utilize speech recognition. This combination offers a versatile interface to interact with and to control robots anytime and anywhere.
  • Figure 2: Left: We collected ground truth data with an optical motion capture system and a 25-marker upper body suit. Right: Our two-step calibration process. First, the user holds the watch at chest height to estimate relative atmospheric pressure. Then, the user stretches the arm forward for an estimate of body orientation.
  • Figure 3: Two examples of data before and after calibration. Each plot contains all 381,535 of our data points.
  • Figure 4: A comparison of prediction accuracy for combined wrist and elbow positions on test data. Both network architectures are trained to predict wrist and elbow positions in polar coordinates or Cartesian coordinates (XYZ) as well as upper and lower arm rotations as quaternions or .
  • Figure 5: Error histograms of wrist position predictions of three distinct combinations of network architecture and prediction targets.
  • ...and 5 more figures
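
Figure 2 describes the two calibration steps only at a high level: a reference pressure reading with the watch at chest height, then a forward arm stretch to estimate body orientation. As a minimal sketch, assuming the watch reports barometric pressure in hPa and an orientation quaternion (w, x, y, z) in a world frame whose z-axis points up, with the watch's local x-axis running along the forearm, the two steps could be implemented as below. The class and function names, the axis conventions, and the optional hydrostatic height conversion are illustrative assumptions. The summary does not say whether relative pressure is ever converted to height.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z) unit quaternion (assumed convention)

AIR_DENSITY = 1.225  # kg/m^3, standard sea-level value (assumption)
GRAVITY = 9.80665    # m/s^2, standard gravity


def forearm_heading(q: Quat) -> float:
    """Heading (rad) of the watch's local x-axis projected onto the horizontal
    plane, assuming world z points up and local x runs along the forearm."""
    w, x, y, z = q
    # The rotated local x-axis is the first column of q's rotation matrix.
    fx = 1.0 - 2.0 * (y * y + z * z)
    fy = 2.0 * (x * y + w * z)
    return math.atan2(fy, fx)


def wrap_angle(a: float) -> float:
    """Wrap an angle to [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


@dataclass
class Calibration:
    ref_pressure_hpa: float  # step 1: watch held at chest height
    body_heading: float      # step 2: arm stretched forward

    def relative_pressure(self, pressure_hpa: float) -> float:
        return pressure_hpa - self.ref_pressure_hpa

    def relative_heading(self, q: Quat) -> float:
        return wrap_angle(forearm_heading(q) - self.body_heading)


def calibrate(chest_pressures: Sequence[float], forward_quats: Sequence[Quat]) -> Calibration:
    ref = sum(chest_pressures) / len(chest_pressures)
    # Circular mean of the headings, so readings near +/-pi do not cancel out.
    s = sum(math.sin(forearm_heading(q)) for q in forward_quats)
    c = sum(math.cos(forearm_heading(q)) for q in forward_quats)
    return Calibration(ref, math.atan2(s, c))


def approx_height_change_m(rel_pressure_hpa: float) -> float:
    """Hydrostatic approximation dh = -dp / (rho * g); 1 hPa is roughly 8.3 m."""
    return -rel_pressure_hpa * 100.0 / (AIR_DENSITY * GRAVITY)


def yaw_quat(yaw: float) -> Quat:
    """Unit quaternion for a rotation of `yaw` radians about the world z-axis."""
    return (math.cos(yaw / 2.0), 0.0, 0.0, math.sin(yaw / 2.0))


# Hypothetical readings: three pressure samples at chest height and two
# orientation samples while the arm points forward.
cal = calibrate([1013.20, 1013.22, 1013.18], [yaw_quat(0.50), yaw_quat(0.52)])
```

After calibration, later readings are expressed relative to the user: `cal.relative_pressure(p)` gives pressure relative to chest height, and `cal.relative_heading(q)` gives the forearm heading relative to the body's forward direction. Barometric readings are noisy at arm-scale height differences, so how the real system uses this signal is left to the paper.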