BioPose: Biomechanically-accurate 3D Pose Estimation from Monocular Videos

Farnoosh Koleini, Muhammad Usama Saleem, Pu Wang, Hongfei Xue, Ahmed Helmy, Abbey Fenwick

TL;DR

BioPose presents a cohesive framework for biomechanically accurate 3D pose estimation from monocular video by marrying detailed mesh recovery with physics-informed pose regression. It introduces MQ-HMR to produce precise 3D meshes and virtual markers, NeurIK to infer biomechanically valid poses under an OpenSim-based skeleton, and a 2D pose-informed refinement that aligns 3D outputs with 2D cues during inference. The paper demonstrates that MQ-HMR achieves superior mesh reconstruction and that NeurIK delivers improved biomechanical accuracy across multiple datasets, with ablations revealing the importance of tokenization, multi-scale features, and 2D guidance. Collectively, BioPose narrows the gap between marker-based biomechanics and monocular vision, enabling practical biomechanical analysis for clinical, sports, and robotics applications.

Abstract

Recent advancements in 3D human pose estimation from single-camera images and videos have relied on parametric models, like SMPL. However, these models oversimplify anatomical structures, limiting their accuracy in capturing true joint locations and movements, which reduces their applicability in biomechanics, healthcare, and robotics. Biomechanically accurate pose estimation, on the other hand, typically requires costly marker-based motion capture systems and optimization techniques in specialized labs. To bridge this gap, we propose BioPose, a novel learning-based framework for predicting biomechanically accurate 3D human pose directly from monocular videos. BioPose includes three key components: a Multi-Query Human Mesh Recovery model (MQ-HMR), a Neural Inverse Kinematics (NeurIK) model, and a 2D-informed pose refinement technique. MQ-HMR leverages a multi-query deformable transformer to extract multi-scale fine-grained image features, enabling precise human mesh recovery. NeurIK treats the mesh vertices as virtual markers, applying a spatial-temporal network to regress biomechanically accurate 3D poses under anatomical constraints. To further improve 3D pose estimations, a 2D-informed refinement step optimizes the query tokens during inference by aligning the 3D structure with 2D pose observations. Experiments on benchmark datasets demonstrate that BioPose significantly outperforms state-of-the-art methods. Project website: https://m-usamasaleem.github.io/publication/BioPose/BioPose.html
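
The 2D-informed refinement described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a hypothetical frozen linear decoder (`W`) standing in for the mesh and pose network, a weak-perspective camera, and plain gradient descent on a confidence-weighted reprojection loss. All names and hyperparameters below are illustrative assumptions. The only idea taken from the text is that the tokens are updated at inference time so the projected 3D joints agree with the observed 2D keypoints.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_JOINTS, TOKEN_DIM = 17, 32  # illustrative sizes, not from the paper

# Hypothetical stand-in for the frozen decoder: a fixed linear map from
# a token vector to flattened 3D joint coordinates.
W = rng.normal(scale=0.1, size=(NUM_JOINTS * 3, TOKEN_DIM))


def decode(tokens):
    """Map a (TOKEN_DIM,) token vector to (NUM_JOINTS, 3) joints."""
    return (W @ tokens).reshape(NUM_JOINTS, 3)


def project(joints_3d, scale=1.0):
    """Weak-perspective projection (assumed): drop depth, apply a scale."""
    return scale * joints_3d[:, :2]


def reprojection_loss(tokens, keypoints_2d, confidence, scale=1.0):
    """Confidence-weighted squared error between projected and observed 2D joints."""
    residual = project(decode(tokens), scale) - keypoints_2d
    return float(np.sum(confidence[:, None] * residual ** 2))


def loss_gradient(tokens, keypoints_2d, confidence, scale=1.0):
    """Analytic gradient of reprojection_loss with respect to the tokens."""
    residual = project(decode(tokens), scale) - keypoints_2d
    grad_joints = np.zeros((NUM_JOINTS, 3))
    grad_joints[:, :2] = 2.0 * scale * confidence[:, None] * residual
    return W.T @ grad_joints.reshape(-1)


def refine_tokens(tokens, keypoints_2d, confidence, steps=100, lr=0.05):
    """Gradient descent on the tokens only; the decoder stays frozen."""
    tokens = tokens.copy()
    for _ in range(steps):
        tokens -= lr * loss_gradient(tokens, keypoints_2d, confidence)
    return tokens


# Synthetic usage: noisy 2D detections generated from "true" tokens.
true_tokens = rng.normal(size=TOKEN_DIM)
keypoints_2d = project(decode(true_tokens)) + rng.normal(scale=0.01, size=(NUM_JOINTS, 2))
confidence = rng.uniform(0.5, 1.0, size=NUM_JOINTS)

initial_tokens = true_tokens + rng.normal(scale=0.5, size=TOKEN_DIM)
refined_tokens = refine_tokens(initial_tokens, keypoints_2d, confidence)

loss_before = reprojection_loss(initial_tokens, keypoints_2d, confidence)
loss_after = reprojection_loss(refined_tokens, keypoints_2d, confidence)
```

In this toy setting the 2D keypoints constrain only the image-plane coordinates, so depth is left to whatever the decoder encodes. A real system would replace the linear decoder with the trained network and compute gradients with automatic differentiation.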

Paper Structure (36 sections, 8 equations, 8 figures, 10 tables)

Figures (8)

  • Figure 1: BioPose is a framework for biomechanically accurate 3D pose estimation from monocular videos. It combines Multi-Query Human Mesh Recovery (MQ-HMR), which leverages multi-scale image features for precise 3D mesh reconstruction, with Neural Inverse Kinematics (NeurIK), which enforces biomechanical constraints to produce anatomically valid 3D poses.
  • Figure 1: Comparison of state-of-the-art methods, HMR2.0 [goel2023humans] and TokenHMR [dwivedi2024tokenhmr], which use vision transformers for 3D human mesh recovery from a single image. Red circles highlight errors in these methods when dealing with complex or ambiguous poses. In contrast, our MQ-HMR method addresses these challenges by incorporating a multi-query deformable transformer, leveraging multi-scale feature maps and a deformable attention mechanism to deliver more accurate and anatomically consistent pose estimations, even in difficult scenarios.
  • Figure 2: Left: The biomechanical skeleton model captures anatomical detail, with accurate joint locations and bone orientations. Right: The SMPL body model has a deformable 3D body surface with only an approximate skeleton geometry, leading to inaccurate joint locations and bone orientations.
  • Figure 2: Qualitative results of our approach on challenging poses from the LSP dataset [johnson2011learning].
  • Figure 3: Overview of BioPose, comprising two key components: (1) the MQ-HMR model, which leverages a multi-query deformable transformer decoder to extract multi-scale image features from a vision transformer, enabling precise 3D mesh recovery, and (2) the NeurIK model, which uses the mesh vertices as virtual markers and applies a spatio-temporal network to infer biomechanically accurate 3D poses while maintaining anatomical constraints.
  • ...and 3 more figures