
Fusing uncalibrated IMUs and handheld smartphone video to reconstruct knee kinematics

J. D. Peiffer, Kunal Shah, Shawana Anarwala, Kayan Abdou, R. James Cotton

TL;DR

This work employs an implicit function to combine handheld smartphone video and uncalibrated IMU data at their full temporal resolution, and validates the method in a diverse group that includes individuals with no gait impairments, lower limb prosthesis users, and individuals with a history of stroke.

Abstract

Video and wearable sensor data provide complementary information about human movement. Video provides a holistic understanding of the entire body in the world while wearable sensors provide high-resolution measurements of specific body segments. A robust method to fuse these modalities and obtain biomechanically accurate kinematics would have substantial utility for clinical assessment and monitoring. While multiple video-sensor fusion methods exist, most assume that a time-intensive, and often brittle, sensor-body calibration process has already been performed. In this work, we present a method to combine handheld smartphone video and uncalibrated wearable sensor data at their full temporal resolution. Our monocular, video-only, biomechanical reconstruction already performs well, with only several degrees of error at the knee during walking compared to markerless motion capture. Reconstructing from a fusion of video and wearable sensor data further reduces this error. We validate this in a mixture of people with no gait impairments, lower limb prosthesis users, and individuals with a history of stroke. We also show that sensor data allows tracking through periods of visual occlusion.

Paper Structure (25 sections, 11 equations, 4 figures, 2 tables)


Figures (4)

  • Figure 1: Using video from a handheld moving camera (A), our method disentangles camera and body orientation changes to accurately track a timed up and go test in global space (B,C). We validated the joint kinematics against a multi-camera system (D), finding close agreement between the two methods.
  • Figure 2: Our method jointly optimizes an implicit function that learns the trajectory of an individual recording and $\vec{\beta}$, the scaling parameters of a biomechanical model. The implicit function takes time as an input and outputs $\vec{\theta}(t)$, the joint pose parameters, and $\hat{R}_{nc}(t)$, the orientation of the camera in a global frame. The pose and scaling parameters produce joint locations $\hat{p}_n(t)$ and joint orientations $\hat{R}_{nb}(t)$ that are compared to detected keypoints and sensor readings. Note that the sensor calibrations are included in the optimization. Changing the keypoint reference frame using $\hat{R}_{nc}$, as in Eqs. \ref{eq:keypoint_loss} and \ref{eq:reprojection_loss}, is not explicitly shown.
  • Figure 3: Example lower limb kinematics during walking. (A) With no occlusions, both the video and fusion solutions track closely with multi-camera estimates. In this example, the fusion solution tracks the knee better during heel strike. (B) The fusion solution continues to track knee kinematics during occlusion while the video solution does not.
  • Figure 4: Distribution of knee mean-adjusted MAE (MAE-MA). Adding sensors generally decreases MAE. Artificial occlusion of the sensorized leg increased MAE in video-only fits, but fusion fits significantly reduced this occlusion-induced error. Fusion fits during occlusion were not significantly different from video fits.
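
The Figure 2 caption describes an implicit function that takes time as input and returns the joint pose parameters and the camera orientation. The following is a minimal sketch of that idea only, not the authors' implementation: the network size, the sinusoidal time features, the 6D rotation parameterization, the number of pose parameters, and the 30 Hz and 100 Hz sampling rates are all assumptions made for illustration. It shows why such a function can be compared against video and IMU data at each stream's own timestamps: it can be queried at any time value.

```python
import math
import random

N_POSE = 8    # hypothetical number of joint pose parameters
N_FREQS = 4   # hypothetical number of sinusoidal time frequencies
HIDDEN = 16   # hypothetical hidden layer width

def encode_time(t):
    """Map a scalar time (seconds) to sinusoidal features (an assumed encoding)."""
    feats = [t]
    for k in range(N_FREQS):
        feats += [math.sin(2 ** k * math.pi * t), math.cos(2 ** k * math.pi * t)]
    return feats

def make_layer(n_in, n_out, rng, gain=1.0, bias=None):
    """Randomly initialised dense layer stored as (weights, biases)."""
    scale = gain / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = list(bias) if bias is not None else [0.0] * n_out
    return weights, biases

def dense(layer, x, act=None):
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [act(v) for v in out] if act else out

def rotation_from_6d(v):
    """Gram-Schmidt on two 3-vectors gives an orthonormal, right-handed 3x3 matrix."""
    a1, a2 = v[:3], v[3:]
    n1 = math.sqrt(sum(c * c for c in a1))
    b1 = [c / n1 for c in a1]
    dot = sum(x * y for x, y in zip(b1, a2))
    u2 = [y - dot * x for x, y in zip(b1, a2)]
    n2 = math.sqrt(sum(c * c for c in u2))
    b2 = [c / n2 for c in u2]
    b3 = [b1[1] * b2[2] - b1[2] * b2[1],
          b1[2] * b2[0] - b1[0] * b2[2],
          b1[0] * b2[1] - b1[1] * b2[0]]
    return [b1, b2, b3]

def make_model(seed=0):
    rng = random.Random(seed)
    n_in = 1 + 2 * N_FREQS
    return {
        "trunk": make_layer(n_in, HIDDEN, rng),
        "pose": make_layer(HIDDEN, N_POSE, rng),
        # Small gain and an identity-like bias keep the initial camera rotation well defined.
        "camera": make_layer(HIDDEN, 6, rng, gain=0.1, bias=[1, 0, 0, 0, 1, 0]),
    }

def query(model, t):
    """Return (theta(t), R_nc(t)) for any time t, independent of any sampling grid."""
    h = dense(model["trunk"], encode_time(t), math.tanh)
    theta = dense(model["pose"], h)
    r_nc = rotation_from_6d(dense(model["camera"], h))
    return theta, r_nc

model = make_model()
video_times = [i / 30.0 for i in range(30)]    # assumed 30 Hz video frames
imu_times = [i / 100.0 for i in range(100)]    # assumed 100 Hz IMU samples
video_outputs = [query(model, t) for t in video_times]
imu_outputs = [query(model, t) for t in imu_times]
```

In a full pipeline the outputs at `video_times` would feed a keypoint loss and those at `imu_times` a sensor loss, with the weights optimized jointly with the body scaling and sensor calibration parameters; none of that optimization is shown here.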