
OpenCap Monocular: 3D Human Kinematics and Musculoskeletal Dynamics from a Single Smartphone Video

Selim Gilon, Emily Y. Miller, Scott D. Uhlrich

Abstract

Quantifying human movement (kinematics) and musculoskeletal forces (kinetics) at scale, such as estimating quadriceps force during a sit-to-stand movement, could transform prediction, treatment, and monitoring of mobility-related conditions. However, quantifying kinematics and kinetics traditionally requires costly, time-intensive analysis in specialized laboratories, limiting clinical translation. Scalable, accurate tools for biomechanical assessment are needed. We introduce OpenCap Monocular, an algorithm that estimates 3D skeletal kinematics and kinetics from a single smartphone video. The method refines 3D human pose estimates from a monocular pose estimation model (WHAM) via optimization, computes kinematics of a biomechanically constrained skeletal model, and estimates kinetics via physics-based simulation and machine learning. We validated OpenCap Monocular against marker-based motion capture and force plate data for walking, squatting, and sit-to-stand tasks. OpenCap Monocular achieved low kinematic error (4.8° mean absolute error for rotational degrees of freedom; 3.4 cm for pelvis translations), outperforming a regression-only computer vision baseline by 48% in rotational accuracy (p = 0.036) and 69% in translational accuracy (p < 0.001). OpenCap Monocular also estimated ground reaction forces during walking with accuracy comparable to, or better than, our prior two-camera OpenCap system. We demonstrate that the algorithm estimates important kinetic outcomes with clinically meaningful accuracy in applications related to frailty and knee osteoarthritis, including estimating knee extension moment during sit-to-stand transitions and knee adduction moment during walking. OpenCap Monocular is deployed via a smartphone app, web app, and secure cloud computing (https://opencap.ai), enabling free, accessible single-smartphone biomechanical assessments.

Paper Structure

This paper contains 24 sections, 4 equations, and 7 figures.

Figures (7)

  • Figure 1: OpenCap Monocular Enables Scalable Evaluation of 3D Human Motion and Musculoskeletal Dynamics. (A.) Traditional lab-based motion capture provides valuable, high-fidelity biomechanical assessments, but it is costly and time-consuming. Clinical assessments of function, such as timed functional tests, fail to capture the nuances of full-body biomechanics. OpenCap Monocular addresses the need for fast, scalable, and accurate tools to quantify whole-body motion. This software enables 3D biomechanical assessments in large-scale, ecologically valid studies and supports integration into routine clinical practice. (B.) OpenCap Monocular enables 3D assessment of kinematics and kinetics with a single smartphone. The pipeline is freely available through our mobile and web applications and secure cloud processing infrastructure.
  • Figure 2: OpenCap Monocular Algorithm. OpenCap Monocular estimates 3D global kinematics and kinetics from a single, static smartphone video. (1) Computer vision models, ViTPose [vitpose] and WHAM [WHAM], estimate 2D keypoints and an initial 3D human global pose, represented by a sequence of SMPL model parameters [loper]. (2) This initial pose sequence (top, red skeleton) often contains physical inaccuracies like translational drift and foot-floor penetration. To correct this, we apply a pose-refinement optimization that minimizes reprojection error, foot sliding/penetration, and excessive joint velocity. The output is a more physically plausible, optimized pose sequence (bottom, green skeleton). (3) A set of virtual skin markers is extracted from the vertices of the refined SMPL mesh and (4) tracked with OpenSim Inverse Kinematics [opensim] to obtain 3D joint kinematics. (5) Physics-based and machine learning algorithms are used to estimate kinetics (e.g., ground reaction and muscle forces) from the monocular kinematics, without the need for force plates [tan2025gaitdynamics, opencap, Miller2025].
  • Figure 3: Kinematic Accuracy. The mean (bar) and standard deviation (error bar) of mean absolute errors (MAE) in kinematics across activities (STS stands for sit-to-stand), compared to marker-based motion capture. * indicates $p < 0.05$. Compared to the computer vision baseline model, OpenCap Monocular demonstrated (A) 48% lower errors across 18 rotational degrees of freedom ($p = 0.036$) and (B) 69% lower errors across three pelvic translational degrees of freedom ($p < 0.001$), averaged across activities.
  • Figure 4: Impact of Pose Refinement on Translational Drift. The mean (line) and standard deviation (shading) of pelvis translational drift (Euclidean distance between the estimated pelvis position and marker-based motion capture) over five sit-to-stand repetitions. All pelvis origins were aligned at the initial time point. OpenCap Monocular drifted an order of magnitude less than the computer vision plus inverse kinematics baseline (CV+IK) but still more than the two-camera OpenCap approach, which can compute depth analytically. Representative skeletal kinematics are shown during the first and fifth repetitions for marker-based motion capture (white), OpenCap Monocular (blue), and CV+IK (red).
  • Figure 5: Ground Reaction Force Accuracy. The mean (bar) and standard deviation (error bar) of mean absolute errors (MAE) in ground reaction forces during walking compared to force plates. OpenCap Monocular kinematics coupled with the GaitDynamics [tan2025gaitdynamics] machine learning (ML) model estimated ground reaction forces more accurately (vertical: $p=0.002$; mediolateral: $p=0.002$; anteroposterior: $p=0.065$) than the baseline computer vision model (CV+IK) and GaitDynamics (* indicates $p < 0.05$). We also compare to forces derived from two-camera OpenCap kinematics with either physics-based simulation [opencap] or GaitDynamics (ML). Interestingly, OpenCap Monocular + ML yielded slightly lower vertical force errors than either two-camera approach, despite using only one camera, potentially due to improved vertical center-of-mass kinematics from OpenCap Monocular's pose refinement step.
  • ...and 2 more figures
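The pose-refinement step described in the Figure 2 caption combines a reprojection term with foot-sliding/penetration and joint-velocity penalties. The sketch below illustrates the general shape of such a composite objective; the function names, weights, array shapes, frame rate, and pinhole-projection details are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def reprojection_error(points3d, keypoints2d, K):
    """Mean pixel distance between projected 3D joints and detected 2D keypoints.

    points3d: (T, J, 3) joint positions in camera coordinates.
    keypoints2d: (T, J, 2) detected 2D keypoints in pixels.
    K: (3, 3) camera intrinsics (pinhole model).
    """
    proj = points3d @ K.T                 # project into the image plane
    proj = proj[..., :2] / proj[..., 2:3]  # perspective divide
    return np.linalg.norm(proj - keypoints2d, axis=-1).mean()

def foot_penalty(foot_y, foot_xz, contact, floor_y=0.0, dt=1 / 30):
    """Penalize feet penetrating the floor and sliding while in contact.

    foot_y: (T,) vertical foot position; foot_xz: (T, 2) horizontal position;
    contact: (T,) binary contact labels.
    """
    penetration = np.clip(floor_y - foot_y, 0.0, None).mean()
    slide_speed = np.linalg.norm(np.diff(foot_xz, axis=0) / dt, axis=-1)
    sliding = (slide_speed * contact[1:]).mean()
    return penetration + sliding

def velocity_penalty(points3d, dt=1 / 30):
    """Penalize implausibly fast joint motion (smoothness prior)."""
    vel = np.linalg.norm(np.diff(points3d, axis=0) / dt, axis=-1)
    return vel.mean()

def refinement_objective(points3d, keypoints2d, K, foot_y, foot_xz, contact,
                         w_reproj=1.0, w_foot=10.0, w_vel=0.1):
    """Weighted sum of the three terms named in the Figure 2 caption."""
    return (w_reproj * reprojection_error(points3d, keypoints2d, K)
            + w_foot * foot_penalty(foot_y, foot_xz, contact)
            + w_vel * velocity_penalty(points3d))
```

In practice an objective like this would be minimized over the SMPL pose and translation parameters with a gradient-based optimizer; the weights here are placeholders.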
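Figure 4's drift metric is the Euclidean distance between the estimated and marker-based pelvis trajectories after aligning both at the initial time point. A minimal sketch of that computation, assuming trajectories are stored as (T, 3) arrays (a simplifying assumption; the paper does not specify its data layout):

```python
import numpy as np

def pelvis_drift(est, ref):
    """Translational drift of an estimated pelvis trajectory.

    est, ref: (T, 3) pelvis positions (estimate and marker-based reference).
    Both are aligned at the first frame, so drift at t = 0 is zero by
    construction; the returned (T,) array is the Euclidean distance at
    each frame thereafter.
    """
    est_aligned = est - est[0]
    ref_aligned = ref - ref[0]
    return np.linalg.norm(est_aligned - ref_aligned, axis=-1)
```

For example, an estimate that drifts 1 cm per frame along one axis relative to the reference yields a linearly growing drift curve, which is the pattern Figure 4 reports for the CV+IK baseline.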