SKEL-CF: Coarse-to-Fine Biomechanical Skeleton and Surface Mesh Recovery
Da Li, Jiping Jin, Xuanlong Yu, Wei Liu, Xiaodong Cun, Kai Chen, Rui Fan, Jiangang Kong, Xi Shen
TL;DR
SKEL-CF introduces a coarse-to-fine transformer framework for estimating anatomically constrained SKEL parameters from a single image, addressing the limited biomechanical realism of SMPL-based 3D human mesh recovery. By constructing the 4DHuman-SKEL training set and incorporating an explicit camera model, SKEL-CF achieves state-of-the-art performance among SKEL-based methods and remains competitive with leading SMPL-based approaches, especially on the challenging MOYO dataset. The paper reports substantial quantitative gains (lower MPJPE and PA-MPJPE) and improved visual fidelity of both skeletal and surface reconstructions, supported by ablations and per-layer attention analyses. This work helps bridge computer vision and biomechanics by delivering a scalable, anatomically faithful pipeline for motion analysis and biomechanical applications.
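MPJPE and PA-MPJPE, the two metrics cited in this summary, can be sketched as follows. This is a minimal NumPy illustration of the standard definitions. It assumes predicted and ground-truth joints are given as (J, 3) arrays in the same units. The exact joint set, root alignment, and units used in the paper's evaluation are not specified here. PA-MPJPE first aligns the prediction to the ground truth with the least-squares similarity transform (scale, rotation, translation) via orthogonal Procrustes, then measures MPJPE. Many benchmarks also subtract a root joint before computing MPJPE; this sketch omits that step. The 24-joint toy data is arbitrary.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error: mean Euclidean distance over joints.

    pred, gt: (J, 3) arrays in the same units. No root alignment is applied.
    """
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred, gt):
    """Align pred to gt with the least-squares similarity transform
    (scale, rotation, translation): orthogonal Procrustes with scaling."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g               # center both point sets
    u, s, vt = np.linalg.svd(p.T @ g)           # SVD of the 3x3 cross-covariance
    d = np.sign(np.linalg.det(u @ vt))          # -1 would indicate a reflection
    D = np.diag([1.0, 1.0, d])
    r = u @ D @ vt                              # proper rotation, det(r) = +1
    scale = np.sum(s * np.diag(D)) / np.sum(p ** 2)
    return scale * p @ r + mu_g


def pa_mpjpe(pred, gt):
    """MPJPE measured after Procrustes alignment (PA-MPJPE)."""
    return mpjpe(procrustes_align(pred, gt), gt)


# Toy usage: 24 random joints (an arbitrary choice, not the paper's joint set).
rng = np.random.default_rng(0)
gt = rng.normal(size=(24, 3))

theta = np.pi / 2
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
shifted = gt + np.array([3.0, 4.0, 0.0])                  # pure translation, length 5
transformed = 2.0 * gt @ rz.T + np.array([1.0, -2.0, 0.5])  # scale + rotation + shift

print(mpjpe(shifted, gt))         # ≈ 5.0
print(pa_mpjpe(transformed, gt))  # ≈ 0: the similarity transform is factored out
```

Because the Procrustes step removes global scale, rotation, and translation, PA-MPJPE isolates articulation error. This is why it is typically lower than MPJPE.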
Abstract
Parametric 3D human models such as SMPL have driven significant advances in human pose and shape estimation, yet their simplified kinematics limit biomechanical realism. The recently proposed SKEL model addresses this limitation by re-rigging SMPL with an anatomically accurate skeleton. However, estimating SKEL parameters directly remains challenging due to limited training data, perspective ambiguities, and the inherent complexity of human articulation. We introduce SKEL-CF, a coarse-to-fine framework for SKEL parameter estimation. SKEL-CF employs a transformer-based encoder-decoder architecture, where the encoder predicts coarse camera and SKEL parameters, and the decoder progressively refines them in successive layers. To ensure anatomically consistent supervision, we convert the existing SMPL-based dataset 4DHuman into a SKEL-aligned version, 4DHuman-SKEL, providing high-quality training data for SKEL estimation. In addition, to mitigate depth and scale ambiguities, we explicitly incorporate camera modeling into the SKEL-CF pipeline and demonstrate its importance across diverse viewpoints. Extensive experiments validate the effectiveness of the proposed design. On the challenging MOYO dataset, SKEL-CF achieves 85.0 MPJPE / 51.4 PA-MPJPE, significantly outperforming the previous SKEL-based state-of-the-art HSMR (104.5 / 79.6). These results establish SKEL-CF as a scalable and anatomically faithful framework for human motion analysis, bridging the gap between computer vision and biomechanics. Our implementation is available on the project page: https://pokerman8.github.io/SKEL-CF/.
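The coarse-to-fine scheme described in the abstract can be sketched as an iterative update: the encoder outputs coarse camera and SKEL parameters, and successive decoder layers refine them. This is a hedged illustration, not the authors' implementation. Several pieces are assumptions:

- the additive residual form of each refinement step;
- a weak-perspective camera (s, tx, ty), converted to a pinhole translation with tz = 2f / (s · crop_size) as in some HMR-style pipelines;
- the helper names `weak_to_full_perspective`, `project`, `coarse_to_fine`, and `toy_layer`.

Real decoder layers would be transformer blocks. Here, `toy_layer` is a hypothetical stand-in that closes half of the remaining gap to a fixed target.

```python
import numpy as np


def weak_to_full_perspective(cam, focal_length, crop_size):
    """Map a weak-perspective camera (s, tx, ty) to a pinhole translation.

    ASSUMPTION: HMR-style convention tz = 2 f / (s * crop_size); the exact
    camera parameterization used by SKEL-CF is not given in the abstract.
    """
    s, tx, ty = cam
    tz = 2.0 * focal_length / (crop_size * s + 1e-9)  # epsilon avoids divide-by-zero
    return np.array([tx, ty, tz])


def project(points, cam_t, focal_length):
    """Pinhole projection of (J, 3) points after translating them by cam_t.

    The principal point is placed at the origin for simplicity.
    """
    p = points + cam_t
    return focal_length * p[:, :2] / p[:, 2:3]


def coarse_to_fine(coarse_params, coarse_cam, decoder_layers):
    """Refine the encoder's coarse outputs with a stack of hypothetical layers.

    ASSUMPTION: each layer returns additive residuals (d_params, d_cam).
    Returns the final estimates plus the estimate after every stage.
    """
    params = np.asarray(coarse_params, dtype=float).copy()
    cam = np.asarray(coarse_cam, dtype=float).copy()
    history = [(params.copy(), cam.copy())]  # stage 0 = coarse encoder output
    for layer in decoder_layers:
        d_params, d_cam = layer(params, cam)
        params = params + d_params
        cam = cam + d_cam
        history.append((params.copy(), cam.copy()))
    return params, cam, history


# Usage with a hypothetical stand-in for a decoder layer: it closes half of
# the remaining gap to a fixed target, so the error halves at every stage.
target_params = np.ones(10)                 # toy parameter vector; size is arbitrary
target_cam = np.array([0.9, 0.1, -0.2])


def toy_layer(params, cam):
    return 0.5 * (target_params - params), 0.5 * (target_cam - cam)


params, cam, history = coarse_to_fine(np.zeros(10), np.array([1.0, 0.0, 0.0]), [toy_layer] * 4)
cam_t = weak_to_full_perspective(cam, focal_length=5000.0, crop_size=256)
```

Predicting residuals lets later layers make progressively smaller corrections. Keeping the camera in the refinement loop means depth (tz) is re-estimated together with the body parameters instead of being fixed by the first guess.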
