Physics Informed Human Posture Estimation Based on 3D Landmarks from Monocular RGB-Videos

Tobias Leuthold, Michele Xiloyannis, Yves Zimmermann

TL;DR

The paper targets accurate, physiotherapy-relevant posture estimation from monocular video by introducing a physics-informed backend that fuses BlazePose's 2D and 3D estimates. A Kalman-filtered bone-length model and biomechanical priors enforce anatomical plausibility, while a multi-component cost function drives a frame-by-frame optimization that runs on backend servers. On the Physio2.2M dataset, the method reduces 3D MPJPE by 10.2%, XY MPJPE by 12.1%, body-angle errors by 16.6%, and bone-length variance by 94.3%. Because the backend receives only anonymized data and raw video is never exposed, the approach is suited to automated coaching, remote physiotherapy, and sports coaching on consumer hardware.

Abstract

Applications that provide automated coaching for physical training, for example in physical therapy, are increasing in popularity. These applications rely on accurate and robust pose estimation from monocular video streams. State-of-the-art models like BlazePose excel at real-time pose tracking, but their lack of anatomical constraints leaves room for improvement through the inclusion of physical knowledge. We present a real-time post-processing algorithm that fuses the strengths of BlazePose 3D and 2D estimations using a weighted optimization, penalizing deviations from expected bone lengths and biomechanical models. Bone length estimations are refined to the individual anatomy using a Kalman filter with adaptive measurement trust. Evaluation on the Physio2.2M dataset shows a 10.2 percent reduction in 3D MPJPE and a 16.6 percent decrease in errors of angles between body segments compared to the BlazePose 3D estimation. Our method provides a robust, anatomically consistent pose estimation based on a computationally efficient video-to-3D pose estimation, suitable for automated physiotherapy, healthcare, and sports coaching on consumer-level laptops and mobile devices. The refinement runs on the backend with anonymized data only.
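
The bone-length refinement mentioned in the abstract can be illustrated with a scalar Kalman filter. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the class name `BoneLengthFilter`, the constant-length state model, the noise values, and the use of a per-frame `confidence` score to scale measurement noise are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


def bone_length(joint_a, joint_b):
    """Euclidean distance between two 3D landmarks (same units as the input)."""
    return math.dist(joint_a, joint_b)


@dataclass
class BoneLengthFilter:
    """Scalar Kalman filter for one bone length (assumed constant over time)."""
    length: float                 # current length estimate
    variance: float               # uncertainty of that estimate
    process_var: float = 1e-6     # assumption: the true length barely drifts
    base_meas_var: float = 1e-2   # assumption: noise at full confidence

    def update(self, measured: float, confidence: float) -> float:
        # Predict step: the state is unchanged, uncertainty grows slightly.
        prior_var = self.variance + self.process_var
        # Adaptive measurement trust (hypothetical rule): a low confidence
        # inflates the measurement noise, so the measurement counts less.
        conf = min(max(confidence, 1e-3), 1.0)
        meas_var = self.base_meas_var / conf
        # Standard scalar Kalman update.
        gain = prior_var / (prior_var + meas_var)
        self.length += gain * (measured - self.length)
        self.variance = (1.0 - gain) * prior_var
        return self.length


# Usage: feed one per-frame length measurement of a single bone.
shoulder, elbow = (0.0, 0.0, 0.0), (0.0, 0.3, 0.1)
upper_arm = BoneLengthFilter(length=0.30, variance=1.0)
upper_arm.update(bone_length(shoulder, elbow), confidence=0.9)
```

In this sketch a measurement with low confidence moves the estimate less than one with high confidence, which is one simple way to realise a measurement trust that adapts per frame.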

Paper Structure

This paper contains 35 sections, 10 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Pipeline Overview. The arrows represent information flow, while the boxes represent modules in the pipeline responsible for processing inputs and generating outputs. Blue is used for the parts that run on the user device, while green represents the parts that run on our backend. All video data stays on the user's device.
  • Figure 2: The left plot shows the average MPJPE in 3D and the XY plane (accuracy). The right plot displays the standard deviation of MPJPE (precision).
  • Figure 3: Comparison of average angle error and standard deviation for all body angles.
  • Figure 4: Average angle error by exercise.
  • Figure 5: Variance Comparison of Optimized and BlazePose World Coordinates for all bones.
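
The accuracy and precision measures named in the Figure 2 caption can be sketched as follows. This is a hedged illustration rather than the paper's evaluation code: the function names are hypothetical, and pooling the per-joint errors over all joints and frames before taking the mean and standard deviation is an assumption, since the caption does not state how the errors are aggregated.

```python
import math
from statistics import mean, pstdev


def per_joint_errors(pred, truth, dims=3):
    """Euclidean error per joint; dims=2 restricts the error to the XY plane."""
    return [math.dist(p[:dims], t[:dims]) for p, t in zip(pred, truth)]


def mpjpe_stats(pred_frames, truth_frames, dims=3):
    """Return (mean, standard deviation) of per-joint position errors.

    Assumption: errors are pooled over all joints and all frames.
    """
    errors = [
        err
        for pred, truth in zip(pred_frames, truth_frames)
        for err in per_joint_errors(pred, truth, dims)
    ]
    return mean(errors), pstdev(errors)


# Usage: one frame with two joints; only the depth (Z) of joint 0 is wrong.
pred = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
truth = [[(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]]
mean_3d, std_3d = mpjpe_stats(pred, truth, dims=3)
mean_xy, std_xy = mpjpe_stats(pred, truth, dims=2)
```

Comparing the 3D and XY values separates depth errors from in-plane errors: in the usage example the pure depth error shows up in the 3D statistic but not in the XY one.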