KITRO: Refining Human Mesh by 2D Clues and Kinematic-tree Rotation
Fengyuan Yang, Kerui Gu, Angela Yao
TL;DR
KITRO tackles depth ambiguity and gradient conflicts in monocular 3D human mesh refinement. It models depth explicitly with a closed-form bone-direction calculation and traces bone-level hypotheses along the kinematic tree with a decision tree. For each bone, it computes two depth options from the 2D keypoints, the bone length, and the parent joint's depth. It then selects the most consistent full-body pose by maximizing the product of edge weights derived from cosine similarities to the original HMR pose. The method refines camera, shape, and pose in a plug-and-play manner and yields substantial gains in MPJPE, PA-MPJPE, and PVE on 3DPW and Human3.6M, while preserving or improving the 2D keypoint fit. Refinement is faster and more stable, with improved accuracy for both proximal and distal joints. These findings are supported by ablations, results across different base models, and supplementary analyses.
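Why each bone has exactly two depth options can be seen from a short derivation. It assumes a pinhole camera with focal length f and principal point (c_x, c_y), and a parent joint P whose 3D position is already fixed. This camera model and notation are our own illustration and may differ from the paper's formulation. A child joint observed at pixel (u, v) lies on the back-projected ray r at some depth z, and it must sit at distance L (the bone length) from P. This gives a quadratic in z:

```latex
\begin{aligned}
\mathbf{r} &= \left(\tfrac{u - c_x}{f},\; \tfrac{v - c_y}{f},\; 1\right)^\top, \qquad \mathbf{J} = z\,\mathbf{r},\\
\lVert z\,\mathbf{r} - \mathbf{P} \rVert^2 = L^2
\;&\Longrightarrow\;
(\mathbf{r}^\top\mathbf{r})\,z^2 - 2\,(\mathbf{r}^\top\mathbf{P})\,z + (\mathbf{P}^\top\mathbf{P} - L^2) = 0,\\
z_{\pm} &= \frac{\mathbf{r}^\top\mathbf{P} \pm \sqrt{(\mathbf{r}^\top\mathbf{P})^2 - (\mathbf{r}^\top\mathbf{r})\,(\mathbf{P}^\top\mathbf{P} - L^2)}}{\mathbf{r}^\top\mathbf{r}}.
\end{aligned}
```

The roots z₊ and z₋ place the child farther from or nearer to the camera along the same ray. Both reproject exactly onto the 2D keypoint, so the 2D evidence alone cannot choose between them. When the discriminant is negative, the observed 2D bone is longer than the bone length permits, and some fallback is required.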
Abstract
2D keypoints are commonly used as an additional cue to refine estimated 3D human meshes. Current methods optimize the pose and shape parameters with a reprojection loss on the provided 2D keypoints. While simple and intuitive, this approach has limited effectiveness: the optimal solution is hard to find in the ambiguous parameter space, and the optimization may sacrifice depth accuracy. Additionally, divergent gradients from distal joints complicate the refinement of proximal joints in the kinematic chain and cause it to deviate. To address these issues, we introduce Kinematic-Tree Rotation (KITRO), a novel mesh refinement strategy that explicitly models depth and the human kinematic-tree structure. KITRO treats refinement from a bone-wise perspective. Unlike previous methods, which perform gradient-based optimization, our method calculates bone directions in closed form. Accounting for the 2D pose, the bone length, and the parent joint's depth, the calculation yields two possible directions for each child joint. We then use a decision tree to trace the binary choices for all bones along the kinematic tree of the human skeleton and select the most probable hypothesis. Our experiments across various datasets and baseline models demonstrate that KITRO significantly improves 3D joint estimation accuracy while simultaneously achieving an ideal 2D fit. Our code is available at: https://github.com/MartaYang/KITRO.
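The following Python sketch illustrates the closed-form two-hypothesis calculation and the decision-tree selection together. It makes several assumptions that are not from the paper. It uses a pinhole camera with focal length `focal` and principal point `center`. The root joint's 3D position is taken as already known. It handles a single chain of bones rather than the full kinematic tree. The edge weight maps each cosine similarity to (1 + cos) / 2 so that the product stays non-negative. The names `ray`, `child_depths`, and `trace_chain` are hypothetical and do not come from KITRO's released code.

```python
import numpy as np


def ray(uv, focal, center):
    # Pinhole back-projection (assumed camera model): a joint at depth z seen at
    # pixel (u, v) lies at z * ray(uv), whose third component is 1.
    return np.array([(uv[0] - center[0]) / focal, (uv[1] - center[1]) / focal, 1.0])


def child_depths(parent, uv, bone_len, focal, center):
    # Solve ||z * r - parent|| = bone_len for z, i.e. the quadratic
    # (r.r) z^2 - 2 (r.parent) z + (parent.parent - L^2) = 0.
    r = ray(uv, focal, center)
    a = r @ r
    b = -2.0 * (r @ parent)
    c = parent @ parent - bone_len ** 2
    # Hypothetical fallback: if the 2D bone is too long for the 3D length,
    # clamp the discriminant so both hypotheses collapse to the closest point.
    sq = np.sqrt(max(b * b - 4.0 * a * c, 0.0))
    return [(-b - sq) / (2.0 * a), (-b + sq) / (2.0 * a)]


def trace_chain(root, uvs, bone_lens, hmr_dirs, focal, center):
    """Visit every binary depth choice along one chain (a depth-first walk of
    the decision tree) and return (score, joints) for the hypothesis with the
    largest product of edge weights."""
    best = [-1.0, None]

    def visit(i, parent, joints, score):
        if i == len(uvs):  # leaf: one complete hypothesis for the chain
            if score > best[0]:
                best[0], best[1] = score, list(joints)
            return
        r = ray(uvs[i], focal, center)
        for z in child_depths(parent, uvs[i], bone_lens[i], focal, center):
            child = z * r
            d = child - parent
            cos = d @ hmr_dirs[i] / (np.linalg.norm(d) * np.linalg.norm(hmr_dirs[i]))
            weight = (1.0 + cos) / 2.0  # hypothetical map from cosine to [0, 1]
            joints.append(child)
            visit(i + 1, child, joints, score * weight)
            joints.pop()

    visit(0, np.asarray(root, dtype=float), [], 1.0)
    return best[0], best[1]
```

Each level of the decision tree doubles the number of leaves, so this exhaustive version visits 2^n hypotheses for an n-bone chain. That is cheap for a limb of three or four bones. Because the score is a product, a single bone that strongly disagrees with the HMR bone direction penalizes the whole hypothesis, even if every other bone agrees.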
