MonoMSK: Monocular 3D Musculoskeletal Dynamics Estimation

Farnoosh Koleini, Hongfei Xue, Ahmed Helmy, Pu Wang

TL;DR

MonoMSK tackles the challenge of recovering biomechanically realistic 3D human motion from monocular video by marrying learning-based inverse dynamics with a differentiable musculoskeletal forward simulator. The system uses a five-component pipeline (HMR-based virtual markers, Inverse Kinematics Transformer, Inverse Dynamics Transformer, differentiable Forward Dynamics, and Anatomical Forward Kinematic layers) within a physics-regulated loop, augmented by forward–inverse consistency losses. Ground-truth dynamics are obtained via optimal-control simulations in biomechanics engines to supervise the learned torques and contact forces, and gradients flow through the ODE solver for end-to-end training. Across BML-MoVi, BEDLAM, and OpenCap, MonoMSK achieves state-of-the-art kinematic accuracy and, for the first time, precise monocular kinetics estimation, enabling scalable, biomechanics-aware motion analysis for clinical, sports, and human–robot interaction applications.

Abstract

Reconstructing biomechanically realistic 3D human motion, recovering both kinematics (motion) and kinetics (forces), is a critical challenge. While marker-based systems are lab-bound and slow, popular monocular methods use oversimplified, anatomically inaccurate models (e.g., SMPL) and ignore physics, fundamentally limiting their biomechanical fidelity. In this work, we introduce MonoMSK, a hybrid framework that bridges data-driven learning and physics-based simulation for biomechanically realistic 3D human motion estimation from monocular video. MonoMSK jointly recovers kinematics (motion) and kinetics (forces and torques) through an anatomically accurate musculoskeletal model. By integrating transformer-based inverse dynamics with differentiable forward kinematics and dynamics layers governed by ODE-based simulation, MonoMSK establishes a physics-regulated inverse-forward loop that enforces biomechanical causality and physical plausibility. A novel forward-inverse consistency loss further aligns motion reconstruction with the underlying kinetic reasoning. Experiments on BML-MoVi, BEDLAM, and OpenCap show that MonoMSK significantly outperforms state-of-the-art methods in kinematic accuracy, while for the first time enabling precise monocular kinetics estimation.
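
To make the inverse-forward loop concrete, the following is a generic sketch in standard rigid-body notation. It is an illustration under stated assumptions, not the paper's exact formulation: the mass matrix $M$, Coriolis term $C$, gravity term $g$, contact Jacobian $J_c$, and the squared-error form of the consistency term are assumed here, while $\mathbf{q}$, $\boldsymbol{\tau}$, and $\boldsymbol{\lambda}$ follow the paper's figure captions.

$$
M(\mathbf{q})\,\ddot{\mathbf{q}} + C(\mathbf{q},\dot{\mathbf{q}})\,\dot{\mathbf{q}} + g(\mathbf{q}) = \boldsymbol{\tau} + J_c(\mathbf{q})^{\top}\boldsymbol{\lambda}
$$

Inverse dynamics estimates $(\boldsymbol{\tau}, \boldsymbol{\lambda})$ from an observed motion. Forward dynamics solves the same equation for the acceleration and integrates it over time:

$$
\ddot{\mathbf{q}} = M(\mathbf{q})^{-1}\left(\boldsymbol{\tau} + J_c(\mathbf{q})^{\top}\boldsymbol{\lambda} - C(\mathbf{q},\dot{\mathbf{q}})\,\dot{\mathbf{q}} - g(\mathbf{q})\right)
$$

One plausible consistency term then penalizes the gap between the forward-simulated states $\hat{\mathbf{q}}_{t+1}$ and the kinematically estimated states $\mathbf{q}_{t+1}$ over $T$ steps:

$$
\mathcal{L}_{\text{cons}} = \frac{1}{T}\sum_{t=1}^{T}\left\lVert \hat{\mathbf{q}}_{t+1} - \mathbf{q}_{t+1} \right\rVert_2^2
$$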

Paper Structure

This paper contains 16 sections, 19 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: MonoMSK is a framework for physically grounded 3D human motion estimation from monocular videos. It couples an Inverse Dynamics Transformer (IDT), which infers joint torques and ground reaction forces, with a differentiable Forward Dynamics (FD) ODE solver, which integrates these forces over time to produce biomechanically consistent motion.
  • Figure 2: Overview of the MonoMSK pipeline. A monocular video is processed by a pretrained Human Mesh Recovery (HMR) model to obtain 3D meshes and virtual markers. The Inverse Kinematics Transformer (IKT) converts these markers into anatomically constrained musculoskeletal joint states $\mathbf{q}$. The Inverse Dynamics Transformer (IDT) infers the latent dynamic quantities: internal torques $\boldsymbol{\tau}$ and external ground reaction forces $\boldsymbol{\lambda}$. A differentiable Forward Dynamics (FD) ODE solver integrates these forces through the Euler–Lagrange MSK dynamics to produce physically coherent future motion.
  • Figure 3: Musculoskeletal (MSK) body model with anatomically precise joint positions, bone orientations, and muscle geometry (red). Pink spheres indicate virtual model markers attached to bone segments for accurate biomechanical tracking. The zoomed region illustrates detailed muscle–joint structure around the knee.
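
The inverse-forward loop described in the Figure 1 and Figure 2 captions can be illustrated on a toy system. The sketch below is not the MonoMSK implementation. It swaps the musculoskeletal model for a damped single pendulum, the transformers for a closed-form inverse dynamics function, and the differentiable ODE solver for a plain semi-implicit Euler rollout in NumPy. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical toy parameters for a damped single pendulum (not from the paper).
MASS, LENGTH, GRAVITY, DAMPING = 1.0, 1.0, 9.81, 0.1
INERTIA = MASS * LENGTH**2


def inverse_dynamics(q, qd, qdd):
    """Torque needed to produce acceleration qdd at state (q, qd)."""
    return INERTIA * qdd + DAMPING * qd + MASS * GRAVITY * LENGTH * np.sin(q)


def forward_dynamics(q, qd, tau):
    """Acceleration produced by torque tau at state (q, qd)."""
    return (tau - DAMPING * qd - MASS * GRAVITY * LENGTH * np.sin(q)) / INERTIA


def rollout(q0, qd0, taus, dt):
    """Integrate the forward dynamics with semi-implicit Euler steps."""
    q, qd = q0, qd0
    trajectory = [q]
    for tau in taus:
        qdd = forward_dynamics(q, qd, tau)
        qd = qd + dt * qdd  # update velocity first
        q = q + dt * qd     # then position, using the new velocity
        trajectory.append(q)
    return np.array(trajectory)


def consistency_loss(q_ref, q_sim):
    """Mean squared gap between reference and forward-simulated angles."""
    return float(np.mean((q_ref - q_sim) ** 2))


# Reference motion standing in for the kinematic estimate q(t).
dt = 0.001
t = np.arange(0.0, 1.0, dt)
omega = 2.0 * np.pi
q_ref = 0.5 * np.sin(omega * t)
qd_ref = 0.5 * omega * np.cos(omega * t)
qdd_ref = -0.5 * omega**2 * np.sin(omega * t)

# Inverse step: recover the torques that explain the reference motion.
taus = inverse_dynamics(q_ref, qd_ref, qdd_ref)

# Forward step: re-simulate the motion from those torques.
q_sim = rollout(q_ref[0], qd_ref[0], taus[:-1], dt)
loss = consistency_loss(q_ref, q_sim)

# Torques that do not match the motion give a larger consistency loss.
q_bad = rollout(q_ref[0], qd_ref[0], 0.8 * taus[:-1], dt)
loss_bad = consistency_loss(q_ref, q_bad)

print(f"consistent torques: {loss:.3e}, mismatched torques: {loss_bad:.3e}")
```

In this toy setting the loss is small when the torques agree with the motion and grows when they do not, which is the signal a forward-inverse consistency term would use. In a learned pipeline the same rollout would be written in an autodiff framework so that gradients can reach the torque predictor.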