MonoMSK: Monocular 3D Musculoskeletal Dynamics Estimation
Farnoosh Koleini, Hongfei Xue, Ahmed Helmy, Pu Wang
TL;DR
MonoMSK tackles the challenge of recovering biomechanically realistic 3D human motion from monocular video by marrying learning-based inverse dynamics with a differentiable musculoskeletal forward simulator. The system uses a five-component pipeline (HMR-based virtual markers, Inverse Kinematics Transformer, Inverse Dynamics Transformer, differentiable Forward Dynamics, and Anatomical Forward Kinematic layers) within a physics-regulated loop, augmented by forward–inverse consistency losses. Ground-truth dynamics are obtained via optimal-control simulations in biomechanics engines to supervise the learned torques and contact forces, and gradients flow through the ODE solver for end-to-end training. Across BML-MoVi, BEDLAM, and OpenCap, MonoMSK achieves state-of-the-art kinematic accuracy and, for the first time, precise monocular kinetics estimation, enabling scalable, biomechanics-aware motion analysis for clinical, sports, and human–robot interaction applications.
Abstract
Reconstructing biomechanically realistic 3D human motion, which means recovering both kinematics (motion) and kinetics (forces and torques), remains a critical challenge. Marker-based systems are lab-bound and slow, while popular monocular methods rely on oversimplified, anatomically inaccurate body models (e.g., SMPL) and ignore physics, which fundamentally limits their biomechanical fidelity. In this work, we introduce MonoMSK, a hybrid framework that bridges data-driven learning and physics-based simulation for biomechanically realistic 3D human motion estimation from monocular video. MonoMSK jointly recovers kinematics and kinetics through an anatomically accurate musculoskeletal model. By integrating transformer-based inverse dynamics with differentiable forward kinematics and dynamics layers governed by ODE-based simulation, MonoMSK establishes a physics-regulated inverse-forward loop that enforces biomechanical causality and physical plausibility. A novel forward-inverse consistency loss further aligns motion reconstruction with the underlying kinetic reasoning. Experiments on BML-MoVi, BEDLAM, and OpenCap show that MonoMSK significantly outperforms state-of-the-art methods in kinematic accuracy while enabling, for the first time, precise monocular kinetics estimation.
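To make the idea of an inverse-forward loop with a consistency loss concrete, the following is a minimal sketch on a toy system, not the MonoMSK implementation. It assumes a single point-mass pendulum in place of the musculoskeletal model, finite differences in place of the learned inverse-dynamics network, and a semi-implicit Euler step in place of the paper's ODE solver. All names and constants (`inverse_dynamics`, `forward_step`, `rollout_loss`, `MASS`, `LENGTH`, `DT`) are hypothetical and chosen only for illustration.

```python
import math

# Toy stand-in for a musculoskeletal model: a single point-mass pendulum.
# All constants and function names here are illustrative assumptions.
MASS, LENGTH, GRAVITY = 1.0, 0.5, 9.81
INERTIA = MASS * LENGTH ** 2
DT = 0.01  # time step in seconds


def gravity_torque(q):
    """Restoring torque m*g*l*sin(q) of the pendulum at angle q (radians)."""
    return MASS * GRAVITY * LENGTH * math.sin(q)


def inverse_dynamics(q, qdd):
    """Torque that produces angular acceleration qdd at angle q."""
    return INERTIA * qdd + gravity_torque(q)


def forward_step(q, qd, tau, dt):
    """One semi-implicit Euler step of I*qdd = tau - m*g*l*sin(q)."""
    qdd = (tau - gravity_torque(q)) / INERTIA
    qd_next = qd + dt * qdd
    return q + dt * qd_next, qd_next


def estimate_torques(q_obs, dt):
    """Inverse pass: finite-difference accelerations, then inverse dynamics.

    Returns torques for the interior samples 1 .. len(q_obs) - 2.
    """
    taus = []
    for t in range(1, len(q_obs) - 1):
        qdd = (q_obs[t + 1] - 2.0 * q_obs[t] + q_obs[t - 1]) / dt ** 2
        taus.append(inverse_dynamics(q_obs[t], qdd))
    return taus


def rollout_loss(q_obs, taus, dt):
    """Forward pass: simulate with the torques, compare against q_obs."""
    q, qd = q_obs[1], (q_obs[1] - q_obs[0]) / dt  # initial state from data
    squared_errors = []
    for t, tau in enumerate(taus, start=1):
        q, qd = forward_step(q, qd, tau, dt)
        squared_errors.append((q - q_obs[t + 1]) ** 2)
    return sum(squared_errors) / len(squared_errors)


# Synthetic "observed" joint angle: half a second of a 1 Hz swing.
q_obs = [0.3 * math.sin(2.0 * math.pi * k * DT) for k in range(51)]
taus = estimate_torques(q_obs, DT)

loss_consistent = rollout_loss(q_obs, taus, DT)
loss_perturbed = rollout_loss(q_obs, [0.8 * tau for tau in taus], DT)
print(f"consistent torques: {loss_consistent:.3e}")
print(f"torques scaled by 0.8: {loss_perturbed:.3e}")
```

When the torques agree with the observed motion, re-simulating them reproduces the trajectory and the loss is close to zero. Torques that are wrong, here scaled by 0.8, produce a rollout that drifts away from the observation and a clearly larger loss. In a differentiable setting the same rollout would be written in an automatic-differentiation framework, so that this loss can send gradients back to whatever produced the torques.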
