3D Human Mesh Estimation from Virtual Markers
Xiaoxuan Ma, Jiajun Su, Chunyu Wang, Wentao Zhu, Yizhou Wang
TL;DR
This work tackles 3D human mesh estimation from monocular images by addressing the loss of body-shape information in skeleton-based intermediate representations. It introduces virtual markers, a set of $K=64$ landmarks learned via archetypal analysis from mocap data. After estimating the 3D marker positions $P$ from volumetric heatmaps and updating the interpolation matrix $A$ with marker confidences, the full mesh is reconstructed as $M = P A$. The model is trained with a combination of losses, $L_{vm}$, $L_{conf}$, and $L_{mesh}$ (the last comprising vertex, pose, normal, and edge terms), and benefits from training on a mix of diverse datasets. Empirically, the method achieves state-of-the-art performance on H3.6M, 3DPW, and SURREAL, reducing shape ambiguities and handling occlusion more robustly than skeleton- or full-vertex-based approaches, with practical implications for mocap from in-the-wild images and realistic avatar generation.
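The interpolation step $M = P A$ can be illustrated with a minimal sketch. The shapes are assumptions made for illustration and are not stated in the summary: $P$ is taken as a $3 \times K$ matrix whose columns are marker positions, and $A$ as a $K \times V$ matrix whose columns hold the weights that blend the $K$ markers into each of the $V$ mesh vertices. The function name `reconstruct_mesh` and the toy values are hypothetical, and the confidence-based update of $A$ is not modeled here. Plain Python lists are used to keep the example free of dependencies.

```python
def reconstruct_mesh(P, A):
    """Compute M = P A with plain nested lists.

    P: 3 x K matrix (assumed layout: rows are x, y, z; columns are markers).
    A: K x V interpolation matrix (assumed layout: column v holds the
       weights that blend the K markers into vertex v).
    Returns M as a 3 x V matrix of vertex coordinates.
    """
    num_markers = len(A)
    if any(len(row) != num_markers for row in P):
        raise ValueError("P must have one column per row of A")
    num_vertices = len(A[0])
    return [
        [sum(row[k] * A[k][v] for k in range(num_markers))
         for v in range(num_vertices)]
        for row in P
    ]


# Toy example: K = 2 markers, V = 3 vertices (hypothetical values).
P = [
    [0.0, 2.0],  # x coordinates of the two markers
    [0.0, 0.0],  # y coordinates
    [1.0, 1.0],  # z coordinates
]
A = [
    [1.0, 0.5, 0.0],  # weight of marker 0 for each vertex
    [0.0, 0.5, 1.0],  # weight of marker 1 for each vertex
]
M = reconstruct_mesh(P, A)
```

In this toy setup each column of `A` sums to one, so every vertex is a convex combination of the markers: the middle vertex lands halfway between the two markers. Whether the learned $A$ satisfies that constraint exactly is an assumption of the sketch.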
Abstract
Inspired by the success of volumetric 3D pose estimation, some recent human mesh estimators estimate 3D skeletons as intermediate representations, from which the dense 3D meshes are regressed by exploiting the mesh topology. However, body shape information is lost in extracting skeletons, leading to mediocre performance. Advanced motion capture systems solve the problem by placing dense physical markers on the body surface, which allows realistic meshes to be extracted from their non-rigid motions. However, they cannot be applied to wild images without markers. In this work, we present an intermediate representation, named virtual markers, which learns 64 landmark keypoints on the body surface based on large-scale mocap data in a generative style, mimicking the effects of physical markers. The virtual markers can be accurately detected from wild images and can reconstruct the intact meshes with realistic shapes by simple interpolation. Our approach outperforms the state-of-the-art methods on three datasets. In particular, it surpasses the existing methods by a notable margin on the SURREAL dataset, which has diverse body shapes. Code is available at https://github.com/ShirleyMaxx/VirtualMarker
